[proofplan]
We prove the discrete formula; the kernel formula is the same argument with sums replaced by integration against regular conditional laws. The proof expands the post-intervention outcome law by mediator values, identifies the post-intervention mediator distribution using the absence of back-door paths from $A$ to $M$, and then identifies the outcome term by two applications of do-calculus. Finally, we expand the single-mediator intervention law over the observational treatment distribution and use the front-door blocking condition from $M$ to $Y$ given $A$ to obtain the displayed functional of the observational law.
[/proofplan]
[step:Expand the intervention law over mediator values]
Fix $a\in\mathcal A$ and $B\in\mathcal E_{\mathcal Y}$. For each $m\in\mathcal M$, let
\begin{align*}
p_a^M(m):=\mathbb P(M=m\mid do(A=a))
\end{align*}
denote the interventional mass function of $M$ under $do(A=a)$. By the [law of total probability](/theorems/1113) under the interventional law $\mathbb P(\,\cdot\mid do(A=a))$,
\begin{align*}
\mathbb P(Y\in B\mid do(A=a))=\sum_{m\in\mathcal M}\mathbb P(Y\in B\mid do(A=a),M=m)p_a^M(m).
\end{align*}
The conditional probabilities in the sum are interpreted only for mediator values with $p_a^M(m)>0$; terms with $p_a^M(m)=0$ contribute zero.
[/step]
[step:Identify the mediator distribution under treatment intervention]
For each $m\in\mathcal M$, the second front-door clause says that there is no unblocked back-door path from $A$ to $M$. Equivalently, in the graph relevant to exchanging intervention on $A$ with observation of $A$, the do-calculus Rule Two separation condition holds for $M$ and $A$. By [Pearl's Rule Two](/theorems/9679) [citetheorem:9679], justified for the present structural causal model by soundness of do-calculus [citetheorem:9683],
\begin{align*}
\mathbb P(M=m\mid do(A=a))=\mathbb P(M=m\mid A=a).
\end{align*}
The positivity hypothesis ensures that the observational [conditional probability](/page/Conditional%20Probability) on the right is defined whenever it appears in the formula.
[/step]
[step:Replace conditioning on the mediator under $do(A=a)$ by intervening on the mediator]
Fix $m\in\mathcal M$ with $\mathbb P(M=m\mid do(A=a))>0$. In the graph with incoming arrows into $A$ removed and outgoing arrows from $M$ deleted, the first front-door clause removes directed influence from $A$ to $Y$ after conditioning on $M$, while the third front-door clause supplies the blocking of back-door paths from $M$ to $Y$ once $A$ is fixed. Thus the Rule Two graphical condition holds for exchanging observation of $M=m$ with intervention $do(M=m)$ in the law already conditioned on $do(A=a)$. Pearl's Rule Two [citetheorem:9679], together with do-calculus soundness [citetheorem:9683], gives
\begin{align*}
\mathbb P(Y\in B\mid do(A=a),M=m)=\mathbb P(Y\in B\mid do(A=a),do(M=m)).
\end{align*}
[guided]
We are trying to identify the factor $\mathbb P(Y\in B\mid do(A=a),M=m)$, but this factor still mixes an intervention on $A$ with an observation of the mediator. The front-door strategy turns the observed mediator value into an intervention on the mediator, because the later step can identify the effect of directly setting $M=m$.
The relevant rule is Pearl's Rule Two [citetheorem:9679]. In the present notation, Rule Two permits the replacement of observing $M=m$ by intervening to set $M=m$ when the required separation holds in the graph where incoming arrows into already intervened variables are removed and outgoing arrows from the variable being exchanged are deleted. Here the already intervened variable is $A$, and the variable being exchanged is $M$.
We verify the graphical condition from the front-door assumptions. First, every directed path from $A$ to $Y$ is intercepted by $M$, so once the outgoing arrows from $M$ are deleted, no directed causal route through $M$ can carry information from the observed value of $M$ to $Y$ inside the $do(A=a)$ graph. Second, every back-door path from $M$ to $Y$ is blocked by $A$, and $A$ is fixed in the conditional regime. Therefore the required d-separation condition for Rule Two is satisfied in the relevant mutilated graph. Since the theorem assumes graph-surgery semantics and do-calculus soundness, Soundness of Do-Calculus [citetheorem:9683] allows the graphical separation to be translated into the probabilistic equality
\begin{align*}
\mathbb P(Y\in B\mid do(A=a),M=m)=\mathbb P(Y\in B\mid do(A=a),do(M=m)).
\end{align*}
[/guided]
[/step]
[step:Delete the treatment intervention after the mediator has been set]
In the graph with incoming arrows into both $A$ and $M$ removed, the first front-door clause implies that every directed path from $A$ to $Y$ is intercepted by $M$, and the intervention $do(M=m)$ cuts the incoming arrows into $M$. Hence, after conditioning on the intervention $do(M=m)$, $Y$ is d-separated from $A$ in the graph required by Pearl's Rule Three. By Pearl's Rule Three [citetheorem:9680] and do-calculus soundness [citetheorem:9683],
\begin{align*}
\mathbb P(Y\in B\mid do(A=a),do(M=m))=\mathbb P(Y\in B\mid do(M=m)).
\end{align*}
Combining this with the previous step gives
\begin{align*}
\mathbb P(Y\in B\mid do(A=a),M=m)=\mathbb P(Y\in B\mid do(M=m)).
\end{align*}
[/step]
[step:Identify the single-mediator intervention law by averaging over treatment]
For fixed $m\in\mathcal M$, define
\begin{align*}
q_m(B):=\mathbb P(Y\in B\mid do(M=m)).
\end{align*}
By the law of total probability under $\mathbb P(\,\cdot\mid do(M=m))$,
\begin{align*}
q_m(B)=\sum_{a'\in\mathcal A}\mathbb P(Y\in B\mid do(M=m),A=a')\mathbb P(A=a'\mid do(M=m)).
\end{align*}
Because $G$ is acyclic, there is no directed path from $M$ back to $A$. The graph surgery for $do(M=m)$ changes only the structural mechanism for $M$ and does not change the structural mechanisms upstream of $M$. Therefore the marginal law of $A$ is unchanged:
\begin{align*}
\mathbb P(A=a'\mid do(M=m))=\mathbb P(A=a').
\end{align*}
For each $a'\in\mathcal A$ with $\mathbb P(A=a')>0$, the third front-door clause says that $A$ blocks every back-door path from $M$ to $Y$. Hence the Rule Two condition holds for exchanging $do(M=m)$ with the observation $M=m$ inside the stratum $A=a'$. Pearl's Rule Two [citetheorem:9679], justified by soundness [citetheorem:9683], gives
\begin{align*}
\mathbb P(Y\in B\mid do(M=m),A=a')=\mathbb P(Y\in B\mid M=m,A=a').
\end{align*}
Substituting these two identities into the total-probability expansion yields
\begin{align*}
\mathbb P(Y\in B\mid do(M=m))=\sum_{a'\in\mathcal A}\mathbb P(Y\in B\mid M=m,A=a')\mathbb P(A=a').
\end{align*}
[/step]
[step:Substitute the identified factors into the mediator expansion]
From the mediator expansion and the two identifications above,
\begin{align*}
\mathbb P(Y\in B\mid do(A=a))=\sum_{m\in\mathcal M}\mathbb P(M=m\mid A=a)\mathbb P(Y\in B\mid do(M=m)).
\end{align*}
Replacing the single-mediator intervention term by the formula obtained in the previous step gives
\begin{align*}
\mathbb P(Y\in B\mid do(A=a))=\sum_{m\in\mathcal M}\mathbb P(M=m\mid A=a)\sum_{a'\in\mathcal A}\mathbb P(Y\in B\mid M=m,A=a')\mathbb P(A=a').
\end{align*}
This is exactly the claimed front-door functional of the observational law. In the standard-Borel case, the same proof is read with $K_M(\,\cdot\mid a)$ and $K_Y(\,\cdot\mid m,a')$ in place of the displayed conditional mass functions, and the sums are replaced by integration with respect to those kernels and the observational law $\mathbb P_A$.
[/step]