[proofplan]
The statement is a closure soundness result: once each displayed do-calculus rule instance is known to be a semantic equality of regular conditional kernels, any finite formal derivation using those instances and valid probability identities remains semantically valid. The proof first fixes the almost-everywhere equality relation for regular conditional kernels, then records that each allowed rule instance is one of the semantically valid equalities assumed in the statement. The main argument is an induction on the length of the derivation, with finite unions of null exceptional sets preserving the kernel-equality convention. The latent-variable assertion is proved by the same induction after the statement's mixed-graph semantic-validity hypothesis has already absorbed the nontrivial latent-projection and marginalization facts.
[/proofplan]
custom_env
admin
[step:Fix the kernel equality convention used throughout the derivation]For any $A\subset V$, let
\begin{align*}
\pi_A:E_V\to E_A
\end{align*}
denote the coordinate projection. If $Q$ is a probability law on $(E_V,\mathcal E_V)$ and $B,C\subset V$, a regular conditional law of $\pi_B$ given $\pi_C$ under $Q$ is a Markov kernel
\begin{align*}
K_{B\mid C}^Q:E_C\times \mathcal E_B\to [0,1]
\end{align*}
satisfying, for every $A_B\in\mathcal E_B$ and $A_C\in\mathcal E_C$,
\begin{align*}
Q(\pi_B\in A_B,\pi_C\in A_C)=\int_{A_C}K_{B\mid C}^Q(c,A_B)\,dQ_C(c),
\end{align*}
where $Q_C:=Q\circ \pi_C^{-1}$ is the marginal law of $\pi_C$.
When we write
\begin{align*}
K_{B\mid C}^Q=K_{B\mid C}^{Q'}
\end{align*}
relative to a specified conditioning law $\mu$ on $(E_C,\mathcal E_C)$, we mean that there exists $N\in\mathcal E_C$ with $\mu(N)=0$ such that
\begin{align*}
K_{B\mid C}^Q(c,A_B)=K_{B\mid C}^{Q'}(c,A_B)
\end{align*}
for every $c\in E_C\setminus N$ and every $A_B\in\mathcal E_B$. This is the [equivalence relation](/page/Equivalence%20Relation) used in every step below.[/step]
custom_env
admin
[guided]The statement compares conditional distributions, so the first task is to make precise what the word "equal" means. For a subset $A\subset V$, define the coordinate projection
\begin{align*}
\pi_A:E_V\to E_A
\end{align*}
by sending a full configuration of all variables to its coordinates indexed by $A$. If $Q$ is any observational or interventional probability law on $E_V$, then a regular conditional law of $\pi_B$ given $\pi_C$ under $Q$ is a Markov kernel
\begin{align*}
K_{B\mid C}^Q:E_C\times \mathcal E_B\to [0,1]
\end{align*}
such that, for every measurable outcome event $A_B\in\mathcal E_B$ and every measurable conditioning event $A_C\in\mathcal E_C$,
\begin{align*}
Q(\pi_B\in A_B,\pi_C\in A_C)=\int_{A_C}K_{B\mid C}^Q(c,A_B)\,dQ_C(c),
\end{align*}
where $Q_C:=Q\circ\pi_C^{-1}$ is the law of the conditioning variable.
The point of this definition is that regular conditional kernels are unique only up to null sets in the conditioning variable. Therefore, when two kernels are written as equal, we do not require literal pointwise equality at every conditioning value. Relative to a specified law $\mu$ on $E_C$, the equality
\begin{align*}
K_{B\mid C}^Q=K_{B\mid C}^{Q'}
\end{align*}
means that there is a measurable null set $N\in\mathcal E_C$ with $\mu(N)=0$ such that
\begin{align*}
K_{B\mid C}^Q(c,A_B)=K_{B\mid C}^{Q'}(c,A_B)
\end{align*}
for every $c\in E_C\setminus N$ and every measurable set $A_B\in\mathcal E_B$. This is exactly the almost-everywhere convention in the theorem statement, and it is the convention preserved by all subsequent rule applications and probability identities.[/guided]
custom_env
admin
[step:Record the semantic rule instances available in the DAG case]Fix pairwise disjoint subsets $X,Y,Z,W\subset V$ and a value $x\in E_X$. The repaired statement assumes, rather than proves inside this closure theorem, that the three displayed do-calculus schemata are semantically valid in the model $M$ under the fixed regular-conditional-kernel convention. Therefore, whenever the d-separation premise of Rule One holds in $G_{\overline X}$, the equality
\begin{align*}
P_x(Y\in\cdot\mid Z,W)=P_x(Y\in\cdot\mid W)
\end{align*}
is an allowed semantic equality for $P_x$-almost every value of $(Z,W)$. Whenever the d-separation premise of Rule Two holds in $G_{\overline X,\underline Z}$, the equality
\begin{align*}
P_{x,z}(Y\in\cdot\mid W)=P_x(Y\in\cdot\mid Z=z,W)
\end{align*}
is an allowed semantic equality for $P_x$-almost every value of $(z,W)$. Whenever the d-separation premise of Rule Three holds in $G_{\overline X,\overline{Z(W)}}$, where $Z(W):=Z\setminus\operatorname{An}_{G_{\overline X}}(W)$, the equality
\begin{align*}
P_{x,z}(Y\in\cdot\mid W)=P_x(Y\in\cdot\mid W)
\end{align*}
is an allowed semantic equality for $P_x$-almost every value of $(z,W)$.
Thus every DAG rule instance that may appear in a formal derivation is already a semantic equality of regular conditional kernels in the precise almost-everywhere sense fixed in the preceding step.[/step]
custom_env
admin
[guided]This step is deliberately modest. The theorem is not proving the individual graphical do-calculus rules from the structural equations; it is proving that formal derivations built from semantically valid rule instances are sound. Accordingly, the statement assumes that each of the three displayed rule schemata is semantically valid in $M$ under the same regular-conditional-kernel convention fixed above.
Let $X,Y,Z,W\subset V$ be pairwise disjoint and let $x\in E_X$. If the Rule One graphical premise holds, namely if $Y$ and $Z$ are d-separated by $X\cup W$ in $G_{\overline X}$, the semantic-validity hypothesis gives the kernel equality
\begin{align*}
P_x(Y\in\cdot\mid Z,W)=P_x(Y\in\cdot\mid W)
\end{align*}
outside a $P_x$-null set of conditioning values of $(Z,W)$. This is exactly an equality in the equivalence relation defined in the first step.
If the Rule Two graphical premise holds, namely if $Y$ and $Z$ are d-separated by $X\cup W$ in $G_{\overline X,\underline Z}$, the same semantic-validity hypothesis gives
\begin{align*}
P_{x,z}(Y\in\cdot\mid W)=P_x(Y\in\cdot\mid Z=z,W)
\end{align*}
for $P_x$-almost every value of $(z,W)$. Again, the null law is the one specified by the rule, so the equality fits the convention used in the derivation.
For Rule Three, define the subset of action variables
\begin{align*}
Z(W):=Z\setminus\operatorname{An}_{G_{\overline X}}(W).
\end{align*}
If $Y$ and $Z$ are d-separated by $X\cup W$ in $G_{\overline X,\overline{Z(W)}}$, the assumed semantic validity of Rule Three gives
\begin{align*}
P_{x,z}(Y\in\cdot\mid W)=P_x(Y\in\cdot\mid W)
\end{align*}
for $P_x$-almost every value of $(z,W)$. Therefore every rule application allowed in the DAG derivation is a genuine semantic equality before the induction begins.[/guided]
custom_env
admin
[step:Prove soundness for finite DAG derivations by induction]
Let
\begin{align*}
\mathcal D=(E_1,\dots,E_n)
\end{align*}
be a formal do-calculus derivation, where each $E_i$ is an equality between observational or interventional regular conditional kernels. We prove by induction on $i\in\{1,\dots,n\}$ that $E_i$ is semantically valid in $M$ under the stated almost-everywhere convention.
For $i=1$, the equality $E_1$ is either an instance of one of the three schemata or a semantically valid probability identity. In the first case, semantic validity follows from the preceding step. In the second case, semantic validity is part of the definition of the allowed probability identities.
Assume that $1\le i<n$ and that $E_1,\dots,E_i$ are semantically valid. The equality $E_{i+1}$ is again either an instance of one of the three schemata or an allowed probability identity applied to previously established kernel equalities. In the first case, the preceding step gives semantic validity. In the second case, the identity is valid for regular conditional kernels modulo the same almost-everywhere convention, so applying it to semantically valid premises produces a semantically valid conclusion, with the exceptional conditioning set enlarged by a finite union of null sets. A finite union of null sets is null for the relevant conditioning law. Hence $E_{i+1}$ is semantically valid.
By induction, every equality in $\mathcal D$ is semantically valid. In particular, the final equality obtained by the derivation is semantically valid in $M$.
[/step]
custom_env
admin
[step:Record the semantic rule instances available in the mixed-graph case]Now let $H$ be a finite DAG on $V\cup L$, with $L$ latent, and let $M$ be a Markovian latent-variable structural causal model compatible with $H$. Let $G^*$ be the latent projection of $H$ onto $V$. For $A\subset V$, let
\begin{align*}
\rho_A:E_{V\cup L}\to E_A
\end{align*}
denote the coordinate projection from the full endogenous state space to the observed coordinates in $A$.
The repaired statement assumes that, under the mutilation-before-projection convention, every mixed-graph instance of the three do-calculus schemata is semantically valid for the observed interventional kernels. This hypothesis is exactly where the nontrivial facts about latent projection, the equivalence between the relevant m-separation and full-DAG separation statements, and the compatibility of observed regular conditional kernels with marginalization are used. Consequently, every mixed-graph rule instance that may appear in a formal derivation is already an equality of observed regular conditional kernels modulo the specified almost-everywhere convention.[/step]
custom_env
admin
[guided]The latent-variable part uses the same closure argument as the fully observed part, but it needs a separate semantic-validity hypothesis because latent projection is not a formal probability identity. Let
\begin{align*}
\rho_A:E_{V\cup L}\to E_A
\end{align*}
be the coordinate projection to the observed coordinates indexed by $A\subset V$. Observed interventional laws are obtained by applying the intervention in the full model on $V\cup L$ and then pushing the resulting law forward to $E_V$.
The statement assumes more than the existence of the latent projection $G^*$. It assumes that, whenever a mixed-graph rule is applied, the graph surgery is interpreted by first mutilating the full DAG $H$ and then projecting to $V$, and that each resulting mixed-graph rule instance is semantically valid for the observed regular conditional kernels. This assumption absorbs the nontrivial theorem relating m-separation in the latent projection to separation in the corresponding mutilated full DAG. It also absorbs the version-sensitive issue that observed regular conditional kernels must be chosen compatibly with the pushforward observed laws.
Therefore no additional latent-[projection theorem](/theorems/1985) or marginalization theorem is being smuggled into this proof. Once the mixed-graph rule instance has been declared semantically valid under the statement's convention, it enters the formal derivation in exactly the same way as a DAG rule instance: as an equality of regular conditional kernels outside a null set for the conditioning law specified by the rule.[/guided]
custom_env
admin
[step:Conclude soundness for mixed-graph formal derivations]
Let
\begin{align*}
\mathcal D^*=(F_1,\dots,F_m)
\end{align*}
be a formal derivation using the mixed-graph schemata and semantically valid probability identities. By the preceding step and the mixed-graph semantic-validity hypothesis in the statement, each rule instance in $\mathcal D^*$ is semantically valid for the observed interventional kernels of the latent-variable model. By hypothesis, each allowed probability identity is semantically valid for regular conditional kernels modulo the same almost-everywhere convention.
Repeating the induction from the fully observed case, with observed kernels in place of full kernels, shows that $F_1,\dots,F_m$ are all semantically valid. Therefore the final equality obtained by the mixed-graph derivation is semantically valid in every Markovian latent-variable structural causal model compatible with $H$, under the stated kernel-equality convention. This proves both the fully observed and latent-variable assertions of the theorem.
[/step]