Pearl's Rule One — Statement & Proof

Pearl's Rule One (Theorem # 9678)

Theorem


Discussion

Proof

[proofplan]
The proof is the formal content of deleting an observation that is conditionally irrelevant after intervention. Under $do(X=x)$, the interventional law $P_x$ is Markov with respect to the mutilated graph $G_{\overline X}$, so the assumed d-separation gives conditional independence of $Y$ and $Z$ given $X$ and $W$. Since the intervention fixes $X=x$ with $P_x$-probability one, conditioning on $X=x$ adds no further restriction under $P_x$. The positivity assumptions ensure that every [conditional probability](/page/Conditional%20Probability) appearing in the algebra is defined.
[/proofplan]

[step:Convert d-separation in the mutilated graph into conditional independence under $P_x$]
For an assignment $a$ to a set of variables $A \subset V$, write $\{A=a\}$ for the event that every variable in $A$ takes its assigned value. Fix assignments $x,y,z,w$ satisfying
\begin{align*}
P_x(Z=z,W=w)>0
\end{align*}
and
\begin{align*}
P_x(W=w)>0.
\end{align*}
By hypothesis, $P_x$ is Markov with respect to $G_{\overline X}$. Since $Y$ and $Z$ are d-separated by $X \cup W$ in $G_{\overline X}$, the soundness of d-separation for Markov laws, applied to the graph $G_{\overline X}$, the probability law $P_x$, and the three pairwise disjoint vertex sets $Y$, $Z$, and $X\cup W$, gives
\begin{align*}
Y \perp\!\!\!\perp Z \mid X,W
\end{align*}
under $P_x$. Thus, whenever the conditioning events have positive $P_x$-probability,
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=P_x(Y=y \mid X=x,W=w).
\end{align*}
[/step]

[step:Remove the intervened variables from the conditioning events]
Because $P_x(X=x)=1$, intersecting any event with $\{X=x\}$ does not change its $P_x$-probability. In particular,
\begin{align*}
P_x(X=x,Z=z,W=w)=P_x(Z=z,W=w)>0
\end{align*}
and
\begin{align*}
P_x(X=x,W=w)=P_x(W=w)>0.
\end{align*}
Therefore the conditional probabilities in the previous step are defined. Moreover,
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=\frac{P_x(Y=y,X=x,Z=z,W=w)}{P_x(X=x,Z=z,W=w)}.
\end{align*}
Since $P_x(X=x)=1$, the event $\{X=x\}$ has $P_x$-null complement, so
\begin{align*}
P_x(Y=y,X=x,Z=z,W=w)=P_x(Y=y,Z=z,W=w)
\end{align*}
and
\begin{align*}
P_x(X=x,Z=z,W=w)=P_x(Z=z,W=w).
\end{align*}
Hence
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=P_x(Y=y \mid Z=z,W=w).
\end{align*}
For the conditioning event involving only $W=w$, the definition of conditional probability gives
\begin{align*}
P_x(Y=y \mid X=x,W=w)=\frac{P_x(Y=y,X=x,W=w)}{P_x(X=x,W=w)}.
\end{align*}
Since $P_x(X=x)=1$, the numerator and denominator reduce to
\begin{align*}
P_x(Y=y,X=x,W=w)=P_x(Y=y,W=w)
\end{align*}
and
\begin{align*}
P_x(X=x,W=w)=P_x(W=w).
\end{align*}
Therefore
\begin{align*}
P_x(Y=y \mid X=x,W=w)=P_x(Y=y \mid W=w).
\end{align*}

[guided]
The point of this step is to justify carefully why the graphical conditional independence, which is stated with $X$ in the conditioning set, implies the theorem's displayed equality without $X$ in the conditioning set. Under the intervention $do(X=x)$, the interventional law $P_x$ is concentrated on the event $\{X=x\}$:
\begin{align*}
P_x(X=x)=1.
\end{align*}
Consequently, for any event $E$ determined by the remaining variables, intersecting with $\{X=x\}$ does not change its probability:
\begin{align*}
P_x(E \cap \{X=x\})=P_x(E).
\end{align*}
Indeed, the difference between $E$ and $E \cap \{X=x\}$ is contained in $\{X\ne x\}$, which has $P_x$-probability zero.

Apply this observation first to $E=\{Z=z,W=w\}$. The positivity hypothesis gives
\begin{align*}
P_x(Z=z,W=w)>0,
\end{align*}
and therefore
\begin{align*}
P_x(X=x,Z=z,W=w)=P_x(Z=z,W=w)>0.
\end{align*}
Thus $P_x(Y=y \mid X=x,Z=z,W=w)$ is defined. Its definition is
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=\frac{P_x(Y=y,X=x,Z=z,W=w)}{P_x(X=x,Z=z,W=w)}.
\end{align*}
Using again that $P_x(X=x)=1$, the numerator and denominator reduce to
\begin{align*}
P_x(Y=y,X=x,Z=z,W=w)=P_x(Y=y,Z=z,W=w)
\end{align*}
and
\begin{align*}
P_x(X=x,Z=z,W=w)=P_x(Z=z,W=w).
\end{align*}
Therefore
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=P_x(Y=y \mid Z=z,W=w).
\end{align*}

Now apply the same null-complement observation to the event $\{W=w\}$. Since
\begin{align*}
P_x(W=w)>0,
\end{align*}
we also have
\begin{align*}
P_x(X=x,W=w)=P_x(W=w)>0.
\end{align*}
Hence
\begin{align*}
P_x(Y=y \mid X=x,W=w)=\frac{P_x(Y=y,X=x,W=w)}{P_x(X=x,W=w)}.
\end{align*}
The intervention degeneracy gives
\begin{align*}
P_x(Y=y,X=x,W=w)=P_x(Y=y,W=w)
\end{align*}
and
\begin{align*}
P_x(X=x,W=w)=P_x(W=w),
\end{align*}
so
\begin{align*}
P_x(Y=y \mid X=x,W=w)=P_x(Y=y \mid W=w).
\end{align*}
This is the precise algebraic reason that conditioning on the intervened value $X=x$ can be removed under the law $P_x$.
[/guided]
[/step]

[step:Combine conditional independence with intervention degeneracy]
From conditional independence under $P_x$,
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=P_x(Y=y \mid X=x,W=w).
\end{align*}
The identities from the previous step replace the left-hand side by $P_x(Y=y \mid Z=z,W=w)$ and the right-hand side by $P_x(Y=y \mid W=w)$. Therefore
\begin{align*}
P_x(Y=y \mid Z=z,W=w)=P_x(Y=y \mid W=w).
\end{align*}
This is Pearl's Rule One for the assignments $x,y,z,w$, and the proof is complete.
[/step]
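
The conclusion of the proof can be sanity-checked numerically on a small example. The sketch below is not part of the source: it assumes a hypothetical binary model with edges $W \to Z$, $W \to Y$, $Z \to X$, $X \to Y$ and made-up probability tables, and the names `P_W`, `P_Z_given_W`, `P_Y1_given_XW`, `interventional_joint`, `cond_y`, and `rule_one_holds` are illustrative only. In $G_{\overline X}$ the edge $Z \to X$ is deleted, so $Y$ and $Z$ are d-separated by $X \cup W$. The code builds $P_x$ by the truncated factorization and compares $P_x(Y=y \mid Z=z,W=w)$ with $P_x(Y=y \mid W=w)$ in exact rational arithmetic.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model (an assumption for illustration, not from the source):
# edges W -> Z, W -> Y, Z -> X, X -> Y, all variables binary.
# Removing the edge into X gives the mutilated graph, in which
# Y and Z are d-separated by {X, W}.
P_W = {0: F(3, 10), 1: F(7, 10)}
P_Z_given_W = {0: {0: F(4, 5), 1: F(1, 5)},   # indexed as [w][z]
               1: {0: F(1, 4), 1: F(3, 4)}}
P_Y1_given_XW = {(0, 0): F(1, 10), (0, 1): F(1, 2),   # P(Y=1 | X=x, W=w)
                 (1, 0): F(3, 5), (1, 1): F(9, 10)}


def interventional_joint(x):
    """Return P_x(W=w, Z=z, Y=y) via the truncated factorization.

    X is fixed to x with probability one, so it is omitted from the keys.
    """
    joint = {}
    for w, z, y in product((0, 1), repeat=3):
        p_y1 = P_Y1_given_XW[(x, w)]
        p_y = p_y1 if y == 1 else 1 - p_y1
        joint[(w, z, y)] = P_W[w] * P_Z_given_W[w][z] * p_y
    return joint


def cond_y(joint, y, w, z=None):
    """P_x(Y=y | W=w) if z is None, else P_x(Y=y | Z=z, W=w)."""
    def in_condition(key):
        return key[0] == w and (z is None or key[1] == z)

    denominator = sum(p for key, p in joint.items() if in_condition(key))
    numerator = sum(p for key, p in joint.items()
                    if in_condition(key) and key[2] == y)
    return numerator / denominator  # positive here: all tables are strictly positive


def rule_one_holds(x):
    """Check P_x(y | z, w) == P_x(y | w) for every binary assignment."""
    joint = interventional_joint(x)
    return all(cond_y(joint, y, w, z) == cond_y(joint, y, w)
               for w, z, y in product((0, 1), repeat=3))
```

Because every entry of the assumed tables is strictly positive, the positivity hypotheses hold for all assignments, and exact fractions avoid any floating-point tolerance in the equality check. A check on one model illustrates the statement but does not replace the proof.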
