[proofplan]
The proof unpacks the modularity of structural causal models. First, we identify each observational factor with the mechanism of the corresponding node. Under the intervention $do(X=x_I)$, the factors for intervened nodes are replaced by point masses at the intervention values, while the non-intervened factors are unchanged except that any intervened parent value is fixed at $x_I$. Removing the point-mass factors gives the density or mass function on $V\setminus X$, and marginalizing over the variables of $V\setminus X$ that are not in $Y$ gives the final formula.
[/proofplan]
[step:Represent the observational law by nodewise structural factors]
For each index $i\in\{1,\dots,n\}$, let
\begin{align*}
p_i:E_i\times\prod_{j\in\operatorname{pa}(i)}E_j\to[0,\infty)
\end{align*}
denote the conditional density or mass function of $V_i$ given its parent variables $V_{\operatorname{pa}(i)}$ with respect to $\mu_i$. By hypothesis, the observational joint density or mass function $p:\prod_{i=1}^nE_i\to[0,\infty)$ satisfies
\begin{align*}
p(v_1,\dots,v_n)=\prod_{i=1}^n p_i(v_i\mid v_{\operatorname{pa}(i)}).
\end{align*}
Compatibility of the structural causal model with $G$ means that the structural mechanism for $V_i$ depends only on the variables indexed by $\operatorname{pa}(i)$ and on its own exogenous input. By the theorem statement, the displayed factor $p_i(v_i\mid v_{\operatorname{pa}(i)})$ is the density or mass function of that node's structural transition kernel, and the observational factorization is the product of these nodewise structural kernels in topological order.
[/step]
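As a concrete and purely illustrative check of this representation, the Python sketch below encodes a hypothetical three-node binary model with counting measure on each $E_i=\{0,1\}$. The graph $Z\to X$, $Z\to Y$, $X\to Y$, all probabilities, and the names `ORDER`, `PARENTS`, `P_ONE`, `factor` and `joint` are assumptions, not part of the theorem. The joint mass function is built as the product of nodewise factors, and exact `Fraction` arithmetic confirms that it sums to $1$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical binary model (not from the text): topological order Z, X, Y
# with edges Z -> X, Z -> Y, X -> Y. Counting measure on {0, 1} for each node.
ORDER = ("Z", "X", "Y")
PARENTS = {"Z": (), "X": ("Z",), "Y": ("X", "Z")}

# P(node = 1 | parent values), keyed by the tuple of parent values in PARENTS order.
P_ONE = {
    "Z": {(): F(3, 10)},
    "X": {(0,): F(2, 10), (1,): F(8, 10)},
    "Y": {(0, 0): F(1, 10), (1, 0): F(5, 10), (0, 1): F(4, 10), (1, 1): F(9, 10)},
}


def factor(node, value, assignment):
    """p_i(v_i | v_pa(i)): mass of `value` for `node` given its parents in `assignment`."""
    p1 = P_ONE[node][tuple(assignment[j] for j in PARENTS[node])]
    return p1 if value == 1 else 1 - p1


def joint(assignment):
    """Observational mass function: the product of the nodewise factors."""
    result = F(1)
    for node in ORDER:
        result *= factor(node, assignment[node], assignment)
    return result


total = sum(joint(dict(zip(ORDER, vals))) for vals in product((0, 1), repeat=3))
print(total)  # 1
```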
[step:Replace intervened mechanisms by point masses]Let $I=\{i:V_i\in X\}$ and let $I^c=\{1,\dots,n\}\setminus I$. Under $do(X=x_I)$, the mechanism for each $V_i$ with $i\in I$ is replaced by the constant assignment $V_i=x_i$. Equivalently, the conditional factor for such an index is replaced by the Dirac kernel $\delta_{x_i}$ on $(E_i,\mathcal E_i)$, defined by $\delta_{x_i}(A)=1$ if $x_i\in A$ and $\delta_{x_i}(A)=0$ if $x_i\notin A$ for every $A\in\mathcal E_i$.
For each $i\notin I$, modularity of the structural causal model says that the mechanism for $V_i$ is unchanged. Therefore its conditional density or mass function remains $p_i(v_i\mid v_{\operatorname{pa}(i)})$, with the convention that whenever a parent index $j\in\operatorname{pa}(i)$ also belongs to $I$, the parent value $v_j$ is fixed to the assigned value $x_j$.
Hence the joint post-intervention density or mass function on the variables $V_{I^c}$ is
\begin{align*}
p_{x_I}(v_{I^c})=\prod_{i\notin I}p_i(v_i\mid v_{\operatorname{pa}(i)})\big|_{v_I=x_I}.
\end{align*}[/step]
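The following Python sketch, again on a hypothetical binary model with counting measure, compares two ways of writing the post-intervention mass under $do(X=1)$. The first keeps the intervened factor as the Dirac mass $\delta_{x_i}(\{v_i\})$. The second drops that factor and fixes $v_I=x_I$ in the surviving factors. The helper names `intervened_joint` and `truncated` are illustrative assumptions. On assignments with $v_I=x_I$ the two products agree, and elsewhere the Dirac factor makes the full product vanish, so removing the degenerate factors loses no information.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical binary model (illustrative): order Z, X, Y; edges Z->X, Z->Y, X->Y.
ORDER = ("Z", "X", "Y")
PARENTS = {"Z": (), "X": ("Z",), "Y": ("X", "Z")}
P_ONE = {  # P(node = 1 | parent values)
    "Z": {(): F(3, 10)},
    "X": {(0,): F(2, 10), (1,): F(8, 10)},
    "Y": {(0, 0): F(1, 10), (1, 0): F(5, 10), (0, 1): F(4, 10), (1, 1): F(9, 10)},
}


def factor(node, value, assignment):
    p1 = P_ONE[node][tuple(assignment[j] for j in PARENTS[node])]
    return p1 if value == 1 else 1 - p1


def intervened_joint(assignment, do):
    """Full product after do(): each intervened factor becomes a Dirac mass."""
    result = F(1)
    for node in ORDER:
        if node in do:
            result *= 1 if assignment[node] == do[node] else 0  # delta_{x_i}({v_i})
        else:
            result *= factor(node, assignment[node], assignment)  # unchanged mechanism
    return result


def truncated(assignment, do):
    """Truncated product over non-intervened nodes, with v_I fixed at x_I."""
    fixed = {**assignment, **do}
    result = F(1)
    for node in ORDER:
        if node not in do:
            result *= factor(node, fixed[node], fixed)
    return result


do = {"X": 1}
mismatches = []
for z, x, y in product((0, 1), repeat=3):
    a = {"Z": z, "X": x, "Y": y}
    # Off the intervention value the Dirac factor kills the term; on it, the
    # full product should coincide with the truncated product.
    expected = truncated(a, do) if x == do["X"] else 0
    if intervened_joint(a, do) != expected:
        mismatches.append(a)
print(mismatches)  # []
```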
[guided]The intervention $do(X=x_I)$ changes mechanisms; it does not merely condition on an event. The index set $I=\{i:V_i\in X\}$ records exactly which node mechanisms are replaced. For an intervened node $V_i$ with $i\in I$, the original factor $p_i(v_i\mid v_{\operatorname{pa}(i)})$ is no longer used, because the intervention forces $V_i$ to equal $x_i$ regardless of the values of its parents.
For a non-intervened node $V_i$ with $i\notin I$, the modularity assumption says that its structural mechanism is left unchanged, so the same conditional factor $p_i(v_i\mid v_{\operatorname{pa}(i)})$ appears after the intervention. The only subtlety is notational: some parents of $V_i$ may themselves be intervened variables. If $j\in\operatorname{pa}(i)\cap I$, then the value of the parent $V_j$ supplied to the factor $p_i$ is not a coordinate of $v_{I^c}$; it is the fixed intervention value $x_j$.
Thus each surviving factor is the original observational factor evaluated after substituting $v_j=x_j$ for every intervened parent. Multiplying over all non-intervened nodes gives
\begin{align*}
p_{x_I}(v_{I^c})=\prod_{i\notin I}p_i(v_i\mid v_{\operatorname{pa}(i)})\big|_{v_I=x_I}.
\end{align*}
The factors for indices in $I$ are absent because those variables are no longer random coordinates in the distribution of $V\setminus X$; they have been fixed by the intervention.[/guided]
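To illustrate the difference between intervening and conditioning, the sketch below uses a hypothetical binary model in which $Z$ is a common parent of $X$ and $Y$; the numbers and function names are assumptions chosen for illustration. The interventional quantity $p_{x}(y)$ drops the factor $p(x\mid z)$ and sums the truncated product over $z$, while the conditional $p(y\mid x)$ keeps that factor and renormalizes. The two generally differ.

```python
from fractions import Fraction as F

# Hypothetical binary model: Z -> X, Z -> Y, X -> Y (Z confounds X and Y).
p_z1 = F(3, 10)
p_x1_given_z = {0: F(2, 10), 1: F(8, 10)}
p_y1_given_xz = {(0, 0): F(1, 10), (1, 0): F(5, 10), (0, 1): F(4, 10), (1, 1): F(9, 10)}


def p_z(z):
    return p_z1 if z == 1 else 1 - p_z1


def p_x(x, z):
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]


def p_y(y, x, z):
    return p_y1_given_xz[(x, z)] if y == 1 else 1 - p_y1_given_xz[(x, z)]


def interventional(y, x):
    """p_x(y): sum over z of the truncated product p(z) p(y | x, z); X's factor is dropped."""
    return sum(p_z(z) * p_y(y, x, z) for z in (0, 1))


def conditional(y, x):
    """p(y | x): ordinary conditioning keeps X's factor p(x | z) and renormalizes."""
    num = sum(p_z(z) * p_x(x, z) * p_y(y, x, z) for z in (0, 1))
    den = sum(p_z(z) * p_x(x, z) for z in (0, 1))
    return num / den


print(interventional(1, 1))  # 31/50
print(conditional(1, 1))     # 143/190
```

Here conditioning on $X=1$ also shifts the distribution of the confounder $Z$ (from $P(Z=1)=3/10$ to $P(Z=1\mid X=1)=12/19$), whereas the intervention leaves the factor for $Z$ untouched.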
[step:Identify the truncated product as the distribution of the remaining variables]
Define the truncated product map
\begin{align*}
q_{x_I}:\prod_{i\notin I}E_i\to[0,\infty)
\end{align*}
by
\begin{align*}
q_{x_I}(v_{I^c})=\prod_{i\notin I}p_i(v_i\mid v_{\operatorname{pa}(i)})\big|_{v_I=x_I}.
\end{align*}
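For each $i\notin I$, the factor $p_i(\cdot\mid v_{\operatorname{pa}(i)})$ is a conditional density or mass function with respect to $\mu_i$, after the parent values in $I$ have been fixed at $x_I$. The theorem statement assumes these structural kernels are defined and normalized for the parent values produced by this substitution, including intervention values that may have observational probability zero. Since the variables are indexed in a topological order and the intervened model is recursive, we may integrate or sum the nonnegative product over the coordinates in $I^c$ one at a time in reverse topological order; Tonelli's theorem justifies this iterated integration. At each stage the last remaining coordinate $v_k$ appears only in its own factor $p_k(v_k\mid v_{\operatorname{pa}(k)})$, because the only other factors that could involve $v_k$ belong to non-intervened children of $k$, which come later in the topological order and have already been integrated out. Integrating this factor against $\mu_k$ gives $1$ and leaves a product of the same form over the remaining indices. Hence $q_{x_I}$ has total integral or total mass equal to $1$ with respect to $\mu_{I^c}=\bigotimes_{i\notin I}\mu_i$.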
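Moreover, the intervened structural model generates the variables $V_i$ with $i\notin I$ one at a time in topological order, each from its unchanged kernel $p_i(\cdot\mid v_{\operatorname{pa}(i)})\big|_{v_I=x_I}$ given the previously generated values. Thus $q_{x_I}$ is the density or mass function of the law generated by the intervened structural model on $\prod_{i\notin I}E_i$. The factors for $i\in I$ have been truncated because the corresponding variables are fixed by the intervention and are not integrated against $\mu_i$ in the post-intervention law on $V_{I^c}$. Therefore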
\begin{align*}
p_{x_I}(v_{I^c})=q_{x_I}(v_{I^c})=\prod_{i\notin I}p_i(v_i\mid v_{\operatorname{pa}(i)})\big|_{v_I=x_I}.
\end{align*}
This proves the displayed truncated factorization formula for the joint distribution of $V\setminus X$.
[/step]
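The identification of $q_{x_I}$ with the law of the intervened model can be checked by brute force in a small discrete case. The sketch below assumes a hypothetical structural model with independent exogenous inputs $U_Z,U_Y$, uniform on $\{0,\dots,9\}$, and threshold mechanisms; all names and numbers are illustrative, and the mechanism for $X$ is omitted because the intervention replaces it. It enumerates the exogenous inputs under $do(X=1)$, compares the resulting exact law of $(Z,Y)$ with the truncated product $p_Z(z)\,p_Y(y\mid 1,z)$, and then sums the truncated product in reverse topological order ($Y$ first, then $Z$).

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

# Hypothetical SCM (illustrative only): U_Z and U_Y are independent and uniform
# on {0, ..., 9}. X's own mechanism is omitted because do(X = x) replaces it.
TH_Y = {(0, 0): 1, (1, 0): 5, (0, 1): 4, (1, 1): 9}  # Y = 1 iff U_Y < TH_Y[(X, Z)]


def f_z(u_z):
    """Mechanism for Z: Z = 1 iff U_Z < 3, so P(Z = 1) = 3/10."""
    return int(u_z < 3)


def f_y(x, z, u_y):
    """Mechanism for Y, depending only on its parents X, Z and its input U_Y."""
    return int(u_y < TH_Y[(x, z)])


def intervened_law(x_fixed):
    """Exact law of (Z, Y) under do(X = x_fixed), by enumerating (U_Z, U_Y)."""
    law = defaultdict(F)
    for u_z, u_y in product(range(10), repeat=2):
        z = f_z(u_z)
        y = f_y(x_fixed, z, u_y)  # the parent X is fixed at x_fixed
        law[(z, y)] += F(1, 100)
    return dict(law)


def truncated_product(z, y, x_fixed):
    """q(z, y) = p_Z(z) * p_Y(y | x_fixed, z), with kernels read off the mechanisms."""
    pz1 = F(3, 10)
    py1 = F(TH_Y[(x_fixed, z)], 10)
    return (pz1 if z == 1 else 1 - pz1) * (py1 if y == 1 else 1 - py1)


law = intervened_law(1)
formula = {(z, y): truncated_product(z, y, 1) for z in (0, 1) for y in (0, 1)}
print(law == formula)  # True

# Reverse topological order: summing out Y first removes its normalized factor,
# leaving sum_z p_Z(z) = 1.
total = sum(sum(truncated_product(z, y, 1) for y in (0, 1)) for z in (0, 1))
print(total)  # 1
```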
[step:Marginalize the truncated product to obtain the distribution of $Y$]
Let $S=\{i:V_i\in Y\}\subseteq I^c$. Define the product measure on the variables in $I^c\setminus S$ by
\begin{align*}
\mu_{I^c\setminus S}=\bigotimes_{j\in I^c\setminus S}\mu_j.
\end{align*}
By the definition of a marginal density or mass function, the interventional marginal of $V_S$ is obtained by integrating the joint post-intervention density over the coordinates indexed by $I^c\setminus S$. Hence
\begin{align*}
p_{x_I}(v_S)=\int_{\prod_{j\in I^c\setminus S}E_j}p_{x_I}(v_{I^c})\,d\mu_{I^c\setminus S}(v_{I^c\setminus S}).
\end{align*}
Substituting the truncated product formula already proved gives
\begin{align*}
p_{x_I}(v_S)=\int_{\prod_{j\in I^c\setminus S}E_j}\prod_{i\notin I}p_i(v_i\mid v_{\operatorname{pa}(i)})\big|_{v_I=x_I}\,d\mu_{I^c\setminus S}(v_{I^c\setminus S}).
\end{align*}
This is exactly the stated formula for the interventional marginal density or mass function $p_{x_I}(v_S)$ of $Y$, with sums replacing the integral when the relevant dominating measures are counting measures. The proof is complete.
[/step]
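For discrete variables with counting measures, the integral becomes a sum over the coordinates indexed by $I^c\setminus S$. The sketch below uses a hypothetical four-node binary model $Z\to X\to M\to Y$ with $Z\to Y$; the graph, the probabilities, and helper names such as `interventional_marginal` are assumptions. It computes $p_{x_I}(v_S)$ by summing the truncated product over every non-intervened, non-target coordinate.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical binary model in topological order Z, X, M, Y with edges
# Z->X, X->M, M->Y, Z->Y (illustrative numbers only).
ORDER = ("Z", "X", "M", "Y")
PARENTS = {"Z": (), "X": ("Z",), "M": ("X",), "Y": ("M", "Z")}
P_ONE = {  # P(node = 1 | parent values), keyed by the tuple of parent values
    "Z": {(): F(3, 10)},
    "X": {(0,): F(2, 10), (1,): F(8, 10)},
    "M": {(0,): F(1, 4), (1,): F(3, 4)},
    "Y": {(0, 0): F(1, 10), (1, 0): F(5, 10), (0, 1): F(4, 10), (1, 1): F(9, 10)},
}


def factor(node, v):
    """p_i(v_i | v_pa(i)) evaluated at the full assignment v."""
    p1 = P_ONE[node][tuple(v[j] for j in PARENTS[node])]
    return p1 if v[node] == 1 else 1 - p1


def interventional_marginal(target, do):
    """p_{x_I}(v_S): sum the truncated product over the coordinates in I^c minus S."""
    free = [n for n in ORDER if n not in do]       # I^c
    summed = [n for n in free if n not in target]  # I^c \ S
    result = F(0)
    for vals in product((0, 1), repeat=len(summed)):
        v = {**target, **do, **dict(zip(summed, vals))}
        term = F(1)
        for node in free:  # factors for i not in I only; parents in I read x_I
            term *= factor(node, v)
        result += term
    return result


print(interventional_marginal({"Y": 1}, {"X": 1}))  # 41/80
```

With $I=\{X\}$ and $S=\{Y\}$, the loop runs over $(z,m)$ and evaluates $\sum_{z,m}p(z)\,p(m\mid x=1)\,p(y=1\mid m,z)$.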