[proofplan]
The proof formalizes the deletion of an observation that is conditionally irrelevant after intervention. Under $do(X=x)$, the interventional law $P_x$ is Markov with respect to the mutilated graph $G_{\overline X}$, so the assumed d-separation gives conditional independence of $Y$ and $Z$ given $X$ and $W$. Since the intervention fixes $X=x$ with $P_x$-probability one, additionally conditioning on $X=x$ changes no $P_x$-conditional probability. The positivity assumptions ensure that every [conditional probability](/page/Conditional%20Probability) appearing in the algebra is defined.
[/proofplan]
[step:Convert d-separation in the mutilated graph into conditional independence under $P_x$]
For an assignment $a$ to a set of variables $A \subset V$, write $\{A=a\}$ for the event that every variable in $A$ takes its assigned value. Fix assignments $x,y,z,w$ satisfying
\begin{align*}
P_x(Z=z,W=w)>0
\end{align*}
and
\begin{align*}
P_x(W=w)>0.
\end{align*}
By hypothesis, $P_x$ is Markov with respect to $G_{\overline X}$. Since $Y$ and $Z$ are d-separated by $X \cup W$ in $G_{\overline X}$, the soundness of d-separation for Markov laws, applied to the graph $G_{\overline X}$, the probability law $P_x$, and the three pairwise disjoint vertex sets $Y$, $Z$, and $X\cup W$, gives
\begin{align*}
Y \perp\!\!\!\perp Z \mid X,W
\end{align*}
under $P_x$.
Thus, whenever the conditioning events have positive $P_x$-probability, which the next step verifies for the fixed assignments,
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=P_x(Y=y \mid X=x,W=w).
\end{align*}
[/step]
[step:Remove the intervened variables from the conditioning events]
Because $P_x(X=x)=1$, intersecting any event with $\{X=x\}$ does not change its $P_x$-probability. In particular,
\begin{align*}
P_x(X=x,Z=z,W=w)=P_x(Z=z,W=w)>0
\end{align*}
and
\begin{align*}
P_x(X=x,W=w)=P_x(W=w)>0.
\end{align*}
Therefore the conditional probabilities in the previous step are defined.
Moreover,
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=\frac{P_x(Y=y,X=x,Z=z,W=w)}{P_x(X=x,Z=z,W=w)}.
\end{align*}
Since $P_x(X=x)=1$, the event $\{X=x\}$ has $P_x$-null complement, so
\begin{align*}
P_x(Y=y,X=x,Z=z,W=w)=P_x(Y=y,Z=z,W=w)
\end{align*}
and
\begin{align*}
P_x(X=x,Z=z,W=w)=P_x(Z=z,W=w).
\end{align*}
Hence
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=P_x(Y=y \mid Z=z,W=w).
\end{align*}
For the conditioning event $\{X=x,W=w\}$, the definition of conditional probability gives
\begin{align*}
P_x(Y=y \mid X=x,W=w)=\frac{P_x(Y=y,X=x,W=w)}{P_x(X=x,W=w)}.
\end{align*}
Since $P_x(X=x)=1$, the numerator and denominator reduce to
\begin{align*}
P_x(Y=y,X=x,W=w)=P_x(Y=y,W=w)
\end{align*}
and
\begin{align*}
P_x(X=x,W=w)=P_x(W=w).
\end{align*}
Therefore
\begin{align*}
P_x(Y=y \mid X=x,W=w)=P_x(Y=y \mid W=w).
\end{align*}
[guided]
The point of this step is to justify carefully why the graphical conditional independence, which is stated with $X$ in the conditioning set, implies the theorem's displayed equality without $X$ in the conditioning set. Under the intervention $do(X=x)$, the interventional law $P_x$ is concentrated on the event $\{X=x\}$:
\begin{align*}
P_x(X=x)=1.
\end{align*}
Consequently, for any event $E$ determined by the remaining variables, intersecting with $\{X=x\}$ does not change its probability:
\begin{align*}
P_x(E \cap \{X=x\})=P_x(E).
\end{align*}
Indeed, the difference between $E$ and $E \cap \{X=x\}$ is contained in $\{X\ne x\}$, which has $P_x$-probability zero.
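Spelled out, this is finite additivity applied to the partition of $E$ by $\{X=x\}$ and its complement, together with monotonicity of $P_x$:
\begin{align*}
P_x(E)=P_x(E\cap\{X=x\})+P_x(E\cap\{X\ne x\}), \qquad 0\le P_x(E\cap\{X\ne x\})\le P_x(X\ne x)=0.
\end{align*}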
Apply this observation first to $E=\{Z=z,W=w\}$. The positivity hypothesis gives
\begin{align*}
P_x(Z=z,W=w)>0,
\end{align*}
and therefore
\begin{align*}
P_x(X=x,Z=z,W=w)=P_x(Z=z,W=w)>0.
\end{align*}
Thus $P_x(Y=y \mid X=x,Z=z,W=w)$ is defined. Its definition is
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=\frac{P_x(Y=y,X=x,Z=z,W=w)}{P_x(X=x,Z=z,W=w)}.
\end{align*}
Using again that $P_x(X=x)=1$, the numerator and denominator reduce to
\begin{align*}
P_x(Y=y,X=x,Z=z,W=w)=P_x(Y=y,Z=z,W=w)
\end{align*}
and
\begin{align*}
P_x(X=x,Z=z,W=w)=P_x(Z=z,W=w).
\end{align*}
Therefore
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=P_x(Y=y \mid Z=z,W=w).
\end{align*}
Now apply the same null-complement observation to the event $\{W=w\}$. Since
\begin{align*}
P_x(W=w)>0,
\end{align*}
we also have
\begin{align*}
P_x(X=x,W=w)=P_x(W=w)>0.
\end{align*}
Hence $P_x(Y=y \mid X=x,W=w)$ is defined, and by definition
\begin{align*}
P_x(Y=y \mid X=x,W=w)=\frac{P_x(Y=y,X=x,W=w)}{P_x(X=x,W=w)}.
\end{align*}
The intervention degeneracy gives
\begin{align*}
P_x(Y=y,X=x,W=w)=P_x(Y=y,W=w)
\end{align*}
and
\begin{align*}
P_x(X=x,W=w)=P_x(W=w),
\end{align*}
so
\begin{align*}
P_x(Y=y \mid X=x,W=w)=P_x(Y=y \mid W=w).
\end{align*}
This is the precise algebraic reason that conditioning on the intervened value $X=x$ can be removed under the law $P_x$.
[/guided]
[/step]
[step:Combine conditional independence with intervention degeneracy]
From conditional independence under $P_x$,
\begin{align*}
P_x(Y=y \mid X=x,Z=z,W=w)=P_x(Y=y \mid X=x,W=w).
\end{align*}
The identities from the previous step replace the left-hand side by $P_x(Y=y \mid Z=z,W=w)$ and the right-hand side by $P_x(Y=y \mid W=w)$. Therefore
\begin{align*}
P_x(Y=y \mid Z=z,W=w)=P_x(Y=y \mid W=w).
\end{align*}
This is Pearl's Rule One for the assignments $x,y,z,w$. Since these assignments were arbitrary subject to the two positivity conditions, the proof is complete.
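As an illustration only, and not as part of the proof, consider a hypothetical four-variable model with edges $Z\to X$, $X\to Y$, and $W\to Y$. Assume for this sketch that $P_x$ is given by the truncated factorization of the observational law $P$, and write $p(y\mid x,w)$ for the kernel of $Y$ given its parents; neither assumption is used in the proof above, which relies only on the Markov property of $P_x$. Deleting the edge into $X$ leaves $Z$ isolated in $G_{\overline X}$, so $Y$ and $Z$ are d-separated by $X\cup W$. Whenever $P(Z=z)\,P(W=w)>0$, a direct computation gives
\begin{align*}
P_x(Y=y,Z=z,W=w)&=P(Z=z)\,P(W=w)\,p(y\mid x,w),\\
P_x(Y=y\mid Z=z,W=w)&=\frac{P(Z=z)\,P(W=w)\,p(y\mid x,w)}{P(Z=z)\,P(W=w)}=p(y\mid x,w),\\
P_x(Y=y\mid W=w)&=\frac{P(W=w)\,p(y\mid x,w)}{P(W=w)}=p(y\mid x,w).
\end{align*}
Both conditional probabilities equal $p(y\mid x,w)$, in agreement with the displayed conclusion.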
[/step]