[proofplan]
Fix the value $x$ and regard the intervention $do(X=x)$ as producing a new structural causal model whose graph is $G_{\overline X}$. The key distinction is between the components of $Z$ that are ancestors of the conditioned variables $W$ after $do(X=x)$ and those that are not; only the latter form $Z(W)$ and have their incoming arrows deleted in the graphical test. The [truncated factorization formula](/theorems/9677) identifies the post-intervention laws with products of the same structural kernels except at intervened nodes, and the assumed d-separation gives the conditional independence needed to remove the assigned value $z$ from the conditional kernel of $Y$ given $W=w$. This proves equality of the two regular conditional probabilities on every measurable event for which the chosen versions are defined.
[/proofplan]
[step:Pass to the post-$X$ intervention model]
Fix an intervention value $x\in\mathcal X_X$ of $X$. Let $M_x$ denote the structural causal model obtained from $M$ by replacing the structural assignments for the variables in $X$ by the constants $x$. Its causal DAG is
\begin{align*}
H := G_{\overline X}.
\end{align*}
For any further intervention value $z\in\mathcal X_Z$ of $Z$, the law $P_{x,z}$ is the interventional law of $M_x$ under the additional intervention $do(Z=z)$, while $P_x$ is the observational law of $M_x$.
Define
\begin{align*}
Z_0 := Z(W) = Z \setminus \operatorname{An}_{H}(W)
\end{align*}
and
\begin{align*}
Z_1 := Z \cap \operatorname{An}_{H}(W).
\end{align*}
Thus $Z=Z_0\cup Z_1$ and $Z_0\cap Z_1=\varnothing$. Write $z_0\in\mathcal X_{Z_0}$ and $z_1\in\mathcal X_{Z_1}$ for the restrictions of the intervention value $z\in\mathcal X_Z$ to the coordinates $Z_0$ and $Z_1$. The graph in the hypothesis is
\begin{align*}
H_{\overline{Z_0}} = G_{\overline X\,\overline{Z(W)}}.
\end{align*}
The assumption is therefore that $Y$ and $Z$ are d-separated by $X\cup W$ in $H_{\overline{Z_0}}$. Since $X$ is fixed in $M_x$, this is equivalently the d-separation statement that, in the post-$X$ graph with incoming arrows into $Z_0$ deleted, $Y$ is separated from $Z$ after conditioning on $W$ and on the fixed treatment coordinates $X=x$.
[/step]
[step:Use the truncated factorization to express the two laws]
Let $V=\{V_1,\dots,V_n\}$ be ordered topologically in $G$, and let $(\mathcal X_{V_i},\mathcal E_{V_i})$ be the standard Borel state space of $V_i$. For each node $V_i$, let
\begin{align*}
K_i: \mathcal X_{\operatorname{pa}_G(V_i)} \to \mathcal P(\mathcal X_{V_i})
\end{align*}
denote the regular [conditional probability](/page/Conditional%20Probability) kernel induced by the structural assignment of $V_i$ in $M$, where $\mathcal P(\mathcal X_{V_i})$ is the space of probability measures on $(\mathcal X_{V_i},\mathcal E_{V_i})$. The state spaces are standard Borel and the exogenous variables in the Markovian model are jointly independent, so the observational law factorizes according to these kernels.
By the truncated factorization formula [citetheorem:9677], the post-$X$ law $P_x$ is obtained by deleting the kernels for the nodes in $X$ and evaluating all children of $X$ at the fixed value $x$. Similarly, $P_{x,z}$ is obtained from the same factorization by additionally deleting the kernels for the nodes in $Z$ and evaluating their children at the assigned value $z$. Thus $P_x$ and $P_{x,z}$ differ only through the replacement of the natural mechanisms for the variables in $Z$ by constants.
[guided]
We now translate the intervention notation into ordinary kernels. This is the point of the truncated factorization formula [citetheorem:9677]. For each topologically ordered node $V_i$, let
\begin{align*}
K_i: \mathcal X_{\operatorname{pa}_G(V_i)} \to \mathcal P(\mathcal X_{V_i})
\end{align*}
be the regular conditional probability kernel for the node $V_i$ given its parents in the original causal DAG. The hypotheses required for the truncated factorization formula are present: $G$ is a finite causal DAG, $M$ is compatible with $G$, the exogenous variables are mutually independent, and the relevant regular conditional kernels exist on standard Borel state spaces. Hence the joint law generated by $M$ factorizes as a product of the nodewise kernels $K_i$.
Under the intervention $do(X=x)$, the structural equations for the nodes in $X$ are replaced by constants. The truncated factorization formula [citetheorem:9677] says that the product keeps the kernels for all non-intervened nodes, removes the kernels for the intervened nodes in $X$, and inserts the fixed value $x$ into the parent arguments of children of $X$. Therefore the law obtained is $P_x$ and its graph is $G_{\overline X}$.
Under the further intervention $do(Z=z)$, the same operation is performed for the nodes in $Z$. Thus $P_{x,z}$ is obtained from the factorization for $P_x$ by removing the natural kernels for $Z$ and inserting the assigned value $z$. With the decomposition $Z=Z_0\cup Z_1$, this means that the variables in $Z_0$ are set to $z_0$ and the variables in $Z_1$ are set to $z_1$, where $z_0$ and $z_1$ are the restrictions of $z$ to the coordinates $Z_0$ and $Z_1$. The remaining work is therefore to justify that this replacement of the $Z$-mechanisms does not change the conditional law of $Y$ inside the stratum $W=w$.
[/guided]
[/step]
[step:Apply soundness of do-calculus to the rule-three separation]
Consider the mutilated graph
\begin{align*}
H' := H_{\overline{Z_0}} = G_{\overline X\,\overline{Z(W)}}.
\end{align*}
The hypothesis is exactly the graphical premise for Pearl's third do-calculus rule in this graph: $Y$ and $Z$ are d-separated by $X\cup W$ in $H'$. By the soundness of do-calculus [citetheorem:9683], applied to the graph-surgery semantics of $M$, the corresponding deletion-of-actions transformation is valid for the regular conditional kernels on their common domains of definition. Therefore, for every measurable set $A\in\mathcal E_Y$ and every pair $(z,w)$ at which the displayed kernels are defined,
\begin{align*}
P_{x,z}(Y\in A\mid W=w)=P_x(Y\in A\mid W=w).
\end{align*}
This is the asserted eventwise equality of conditional probability kernels.
[guided]
The previous attempted argument tried to replace interventions by ordinary conditioning one coordinate at a time. That is not a valid standalone move: conditioning on a naturally generated variable and intervening to set that variable are different operations. The theorem statement therefore invokes the precise soundness assumption needed here, namely the soundness theorem for Pearl's do-calculus under structural graph-surgery semantics.
We verify that its graphical premise is the present premise. After fixing $x\in\mathcal X_X$, the graph for the post-$X$ intervention is
\begin{align*}
H := G_{\overline X}.
\end{align*}
The set of action variables whose incoming arrows are deleted for Rule Three is
\begin{align*}
Z_0 := Z(W)=Z\setminus \operatorname{An}_{H}(W).
\end{align*}
Thus the relevant mutilated graph is
\begin{align*}
H_{\overline{Z_0}}=G_{\overline X\,\overline{Z(W)}}.
\end{align*}
The theorem assumes exactly that $Y$ and $Z$ are d-separated by $X\cup W$ in this graph. Since $M$ is assumed to satisfy the usual graph-surgery semantics, the truncated factorization formula for the relevant interventions, and the do-calculus soundness theorem on the chosen regular conditional probability kernels, [citetheorem:9683] gives the deletion-of-actions identity
\begin{align*}
P_{x,z}(Y\in A\mid W=w)=P_x(Y\in A\mid W=w)
\end{align*}
for every measurable $A\in\mathcal E_Y$ and every value pair $(z,w)$ where both regular conditional probabilities are defined. This proves the substantive intervention step without confusing intervention with observation.
[/guided]
[/step]
[step:Read the event equality as the point-mass or density equality]
The preceding step proves equality of the conditional probability kernels on every measurable event $A$ in the state space of $Y$. If the outcome law is discrete and $A=\{y\}$, this gives
\begin{align*}
P_{x,z}(Y=y\mid W=w)=P_x(Y=y\mid W=w).
\end{align*}
If the conditional laws admit densities with respect to a common dominating measure, equality of the conditional probability kernels implies equality of the corresponding densities for almost every outcome value $y$ with respect to that dominating measure. This is the stated shorthand
\begin{align*}
P_{x,z}(y\mid w)=P_x(y\mid w),
\end{align*}
whenever both sides are defined. The theorem follows.
[/step]