Soundness of D-Separation for Directed Acyclic Graphs (Theorem # 9669)

Theorem

Proof

[proofplan] We reduce the assertion to the ancestral graph on the variables relevant to the query, because variables outside the ancestors of $A\cup B\cup Z$ cannot affect the conditional law of $X_A$ and $X_B$ given $X_Z$. The DAG factorization on this ancestral set can be regrouped into a clique factorization over the moralized ancestral graph. By the [moralized ancestral graph criterion](/theorems/9670), d-separation in the original DAG becomes ordinary graph separation in this undirected moral graph. Finally, the clique factorization and undirected separation imply conditional independence by splitting the density into one factor involving $A$ and $Z$ and another involving $B$ and $Z$. [/proofplan] [step:Restrict the factorization to the ancestors of the queried variables] Let $S\subset V$ denote the ancestral set of $A\cup B\cup Z$ in $G$, meaning that $S$ consists of all vertices $v\in V$ from which there is a directed path to at least one vertex in $A\cup B\cup Z$, together with $A\cup B\cup Z$ itself. Let $G_S$ be the induced subgraph of $G$ on $S$. Since $S$ is ancestral, every parent in $G$ of a vertex in $S$ also belongs to $S$. Therefore, for every $v\in S$, \begin{align*} \operatorname{pa}_{G_S}(v)=\operatorname{pa}_G(v). \end{align*} For every $T\subset V$, define the product state space $\mathcal X_T:=\prod_{u\in T}\mathcal X_u$ and the product dominating measure $\mu_T:=\bigotimes_{u\in T}\mu_u$. Let $x_S=(x_v)_{v\in S}\in\mathcal X_S$, and let $p_S:\mathcal X_S\to[0,\infty)$ denote the marginal mass function or density of $X_S$ with respect to $\mu_S$. Since every factor $p_v$ is nonnegative, Tonelli's theorem justifies computing the marginal density by iterated integration or summation over $\mathcal X_{V\setminus S}$ in any fixed coordinate order. 
Integrating or summing the full factorization over the variables in $V\setminus S$ gives \begin{align*} p_S(x_S)=\prod_{v\in S}p_v(x_v\mid x_{\operatorname{pa}_G(v)}) \end{align*} for $\mu_S$-almost every $x_S\in\mathcal X_S$; after modifying $p_S$ on a $\mu_S$-null set, we use this displayed product as the chosen version of the marginal density. Indeed, because $G$ is finite and acyclic, we may choose a topological ordering of $V$ in which every parent precedes its children. In the iterated integration over $V\setminus S$, take the variables outside $S$ in reverse topological order. If $w\in V\setminus S$, then no child of $w$ lies in $S$; otherwise $w$ would be an ancestor of a vertex in $S$, hence an ancestor of $A\cup B\cup Z$, contradicting $w\notin S$. Every child of $w$ therefore lies in $V\setminus S$ and follows $w$ in the topological ordering, so it has already been eliminated together with its conditional factor. Thus the only remaining factor containing $x_w$ at the moment $x_w$ is eliminated is its own conditional factor $p_w(x_w\mid x_{\operatorname{pa}_G(w)})$, whose integral or sum over $x_w$ with respect to $\mu_w$ equals $1$. Repeating this elimination leaves exactly the displayed product over $S$. [guided] The reason for passing to ancestors is that conditioning on $Z$ can open a path at a collider only when the collider lies in $Z$ or is an ancestor of a vertex in $Z$. Therefore the correct reduced graph is not merely the subgraph on $A\cup B\cup Z$, but the induced subgraph on these sets together with all of their ancestors. Define $S\subset V$ to be the ancestral set of $A\cup B\cup Z$: a vertex $v\in V$ lies in $S$ precisely when $v\in A\cup B\cup Z$ or there is a directed path from $v$ to some vertex in $A\cup B\cup Z$. Let $G_S$ be the induced directed graph on $S$. Since $S$ is ancestral, if $v\in S$ and $u\to v$ is an edge of $G$, then $u\in S$. Hence the parent set of $v$ does not change after passing to the induced ancestral graph: \begin{align*} \operatorname{pa}_{G_S}(v)=\operatorname{pa}_G(v). \end{align*} For every $T\subset V$, define $\mathcal X_T:=\prod_{u\in T}\mathcal X_u$ and $\mu_T:=\bigotimes_{u\in T}\mu_u$. 
Let $p_S:\mathcal X_S\to[0,\infty)$ be the marginal mass function or density of $X_S$ with respect to $\mu_S$. We claim that $p_S$ still factorizes according to the induced ancestral DAG: \begin{align*} p_S(x_S)=\prod_{v\in S}p_v(x_v\mid x_{\operatorname{pa}_G(v)}). \end{align*} To justify this, fix a topological order of $G$, which exists because $G$ is finite and acyclic. Starting from the full factorization \begin{align*} p(x_V)=\prod_{v\in V}p_v(x_v\mid x_{\operatorname{pa}_G(v)}), \end{align*} we integrate or sum over all coordinates in $V\setminus S$ in reverse topological order. This iterated elimination is justified by Tonelli's theorem, because the integrand is the nonnegative product of conditional mass functions or densities. When eliminating a vertex $w\in V\setminus S$, no child of $w$ lies in $S$: if a child belonged to $S$, then $w$ would be an ancestor of a vertex in $S$, and hence an ancestor of $A\cup B\cup Z$, so $w$ would have to belong to $S$. Every child of $w$ therefore lies outside $S$ and comes after $w$ in the topological order, so it has already been eliminated and its conditional factor has already been integrated to $1$. Thus, at the time $x_w$ is eliminated, the only remaining factor that depends on $x_w$ is \begin{align*} p_w(x_w\mid x_{\operatorname{pa}_G(w)}). \end{align*} Its integral or sum over $x_w$ with respect to $\mu_w$ is $1$ for the relevant conditioning values because it is a conditional mass function or density. Eliminating every vertex outside $S$ therefore leaves precisely the product of conditional factors indexed by $S$, giving the displayed factorization as the chosen $\mu_S$-almost-everywhere version of the marginal density. [/guided] [/step] [step:Moralize the ancestral graph and regroup the density over its cliques] Let $H$ be the moralized ancestral graph obtained from $G_S$ by replacing every directed edge by an undirected edge and by connecting every pair of distinct parents of a common child in $S$. For each $v\in S$, define \begin{align*} C_v:=\{v\}\cup \operatorname{pa}_{G_S}(v). 
\end{align*} The set $C_v$ is a clique in $H$: the vertex $v$ is adjacent in $H$ to each of its parents, and any two distinct parents of $v$ are adjacent by moralization. For each $v\in S$, define the nonnegative function $\psi_v:\mathcal X_{C_v}\to [0,\infty)$ by \begin{align*} \psi_v(x_{C_v})=p_v(x_v\mid x_{\operatorname{pa}_{G_S}(v)}). \end{align*} Then the marginal density or mass function on $S$ has the clique factorization \begin{align*} p_S(x_S)=\prod_{v\in S}\psi_v(x_{C_v}). \end{align*} Thus $p_S$ factorizes over cliques of the undirected graph $H$. [/step] [step:Translate d-separation into separation in the moralized ancestral graph] The hypotheses of the [Moralized Ancestral Graph Criterion][citetheorem:9670] are satisfied here: $G$ is a finite directed acyclic graph, $A,B,Z\subset V$ are pairwise disjoint, $S$ is exactly the ancestral set of $A\cup B\cup Z$, and $H$ is exactly the graph obtained by moralizing the induced ancestral DAG $G_S$. That criterion states, for these data, that $A$ is d-separated from $B$ by $Z$ in $G$ if and only if $Z$ separates $A$ from $B$ in $H$, meaning that every undirected path in $H$ from a vertex of $A$ to a vertex of $B$ meets $Z$. Since the hypothesis gives $A\perp_G B\mid Z$, the forward implication of the criterion yields that every undirected path in $H$ from a vertex of $A$ to a vertex of $B$ meets $Z$. Write $H\setminus Z$ for the induced subgraph of $H$ on $S\setminus Z$. Let $S_A\subset S\setminus Z$ be the set of vertices connected in $H\setminus Z$ to at least one vertex of $A$, and let \begin{align*} S_B:=S\setminus (S_A\cup Z). \end{align*} Since $A$ is disjoint from $Z$, every vertex of $A$ is connected to itself by the trivial path, so $A\subset S_A$. The separation just obtained implies $B\subset S_B$: a vertex of $B$ does not lie in $Z$ and cannot be reached from $A$ by a path in $H\setminus Z$. Moreover, there is no edge of $H$ joining a vertex of $S_A$ to a vertex of $S_B$, because appending such an edge to a path in $H\setminus Z$ from $A$ to its endpoint in $S_A$ would give a path in $H\setminus Z$ from $A$ to that vertex of $S_B$, which would then belong to $S_A$. [/step] [step:Split the clique factorization across the separating set] Every clique $C$ of $H$ is contained in either $S_A\cup Z$ or $S_B\cup Z$. 
Indeed, if a clique contained one vertex in $S_A$ and one vertex in $S_B$, those two vertices would be adjacent in $H$, contradicting the absence of edges between $S_A$ and $S_B$. Define nonnegative functions \begin{align*} \alpha:\prod_{u\in S_A\cup Z}\mathcal X_u\to [0,\infty) \end{align*} and \begin{align*} \beta:\prod_{u\in S_B\cup Z}\mathcal X_u\to [0,\infty) \end{align*} by assigning each factor $\psi_v$ to $\alpha$ when $C_v\subset S_A\cup Z$ and to $\beta$ otherwise, that is, \begin{align*} \alpha(x_{S_A},x_Z):=\prod_{v\in S:\,C_v\subset S_A\cup Z}\psi_v(x_{C_v}),\qquad \beta(x_{S_B},x_Z):=\prod_{v\in S:\,C_v\not\subset S_A\cup Z}\psi_v(x_{C_v}), \end{align*} with an empty product equal to $1$. Every $C_v$ that is not contained in $S_A\cup Z$ is contained in $S_B\cup Z$, so $\beta$ depends only on $(x_{S_B},x_Z)$; a factor with $C_v\subset Z$ could be assigned to either side and is assigned to $\alpha$ by this convention. The clique factorization therefore becomes \begin{align*} p_S(x_S)=\alpha(x_{S_A},x_Z)\beta(x_{S_B},x_Z). \end{align*} We now compute the conditional density or mass function of $(X_{S_A},X_{S_B})$ given $X_Z=x_Z$ at any value $x_Z\in\mathcal X_Z$ for which the normalizing denominator is positive. Define the normalization maps $M_A:\mathcal X_Z\to[0,\infty]$ and $M_B:\mathcal X_Z\to[0,\infty]$ by \begin{align*} M_A(x_Z):=\int_{\mathcal X_{S_A}} \alpha(x_{S_A},x_Z)\,d\mu_{S_A}(x_{S_A}) \end{align*} and \begin{align*} M_B(x_Z):=\int_{\mathcal X_{S_B}} \beta(x_{S_B},x_Z)\,d\mu_{S_B}(x_{S_B}). \end{align*} Here $\mu_{S_A}$ and $\mu_{S_B}$ are the product dominating measures defined above, with sums replacing integrals in the discrete case. Let $p_Z:\mathcal X_Z\to[0,\infty)$ denote the marginal density or mass function of $X_Z$ with respect to $\mu_Z$. By Tonelli's theorem applied to the nonnegative function $p_S$, integrating out the coordinates in $\mathcal X_{S_A}$ and $\mathcal X_{S_B}$ gives \begin{align*} p_Z(x_Z)=M_A(x_Z)M_B(x_Z) \end{align*} for $\mu_Z$-almost every $x_Z\in\mathcal X_Z$. Let $\mathbb P_{X_Z}$ denote the law of $X_Z$, that is, the pushforward measure $\mathbb P\circ X_Z^{-1}$ on $(\mathcal X_Z,\mathcal E_Z)$. Let $N\subset\mathcal X_Z$ be the measurable set on which this identity fails or on which $p_Z(x_Z)=0$; then $\mathbb P_{X_Z}(N)=\mathbb P(X_Z\in N)=0$, because $\mathbb P_{X_Z}$ has density $p_Z$ with respect to $\mu_Z$, so it assigns measure zero both to $\mu_Z$-null sets and to the set where $p_Z$ vanishes. 
For $x_Z\in\mathcal X_Z\setminus N$ we have $M_A(x_Z)M_B(x_Z)=p_Z(x_Z)\in(0,\infty)$, so both $M_A(x_Z)$ and $M_B(x_Z)$ are positive and finite. For such $x_Z$, define versions of the conditional density or mass functions by \begin{align*} p_{S_A,S_B\mid Z}(x_{S_A},x_{S_B}\mid x_Z):=\frac{\alpha(x_{S_A},x_Z)\beta(x_{S_B},x_Z)}{M_A(x_Z)M_B(x_Z)} \end{align*} for $(x_{S_A},x_{S_B})\in\mathcal X_{S_A}\times\mathcal X_{S_B}$, \begin{align*} p_{S_A\mid Z}(x_{S_A}\mid x_Z):=\frac{\alpha(x_{S_A},x_Z)}{M_A(x_Z)} \end{align*} for $x_{S_A}\in\mathcal X_{S_A}$, and \begin{align*} p_{S_B\mid Z}(x_{S_B}\mid x_Z):=\frac{\beta(x_{S_B},x_Z)}{M_B(x_Z)} \end{align*} for $x_{S_B}\in\mathcal X_{S_B}$. The first expression equals $p_S(x_S)/p_Z(x_Z)$, and integrating it over $\mathcal X_{S_B}$ or over $\mathcal X_{S_A}$ yields the second or the third expression, so these are indeed versions of the joint and marginal conditional densities. For $x_Z\in N$, where a displayed denominator may fail to be positive and finite, define the corresponding conditional densities arbitrarily as any probability density or mass function on the relevant coordinate space. This arbitrary choice does not affect conditional independence, because $N$ is $\mathbb P_{X_Z}$-null and conditional independence given $X_Z$ is an almost-sure statement with respect to the law of $X_Z$. For every $x_Z\in\mathcal X_Z\setminus N$, the displayed definitions give \begin{align*} p_{S_A,S_B\mid Z}(x_{S_A},x_{S_B}\mid x_Z)=p_{S_A\mid Z}(x_{S_A}\mid x_Z)p_{S_B\mid Z}(x_{S_B}\mid x_Z). \end{align*} Hence the factorization of conditional densities holds for $\mathbb P_{X_Z}$-almost every conditioning value, which is precisely \begin{align*} X_{S_A}\perp\!\!\!\perp X_{S_B}\mid X_Z. \end{align*} [/step] [step:Pass from the separated blocks to the requested subfamilies] We have $A\subset S_A$, because $A$ is disjoint from $Z$ and every vertex of $A$ is connected to itself in $H\setminus Z$, and $B\subset S_B$ by the separation established above. Conditional independence of the larger random subfamilies implies conditional independence of their subfamilies: applying measurable coordinate projections from $X_{S_A}$ to $X_A$ and from $X_{S_B}$ to $X_B$ preserves conditional independence given $X_Z$. Hence \begin{align*} X_A\perp\!\!\!\perp X_B\mid X_Z. \end{align*} This is the desired soundness of d-separation under the DAG Markov factorization. [/step]
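
The steps of the proof can be checked numerically on a small discrete example. The following is a minimal sketch, not part of the original proof: it assumes binary variables, a hypothetical five-vertex DAG, and randomly drawn conditional probability tables, and all function names (`ancestral_set`, `moral_graph`, `separated`, `random_joint`, `marginal`, `cond_indep`) are illustrative choices. It builds the ancestral set $S$, forms the moral graph $H$ by linking all pairs inside each $C_v$, tests separation by $Z$, and compares the result with a brute-force conditional independence check on the joint mass function.

```python
import itertools
import random

def ancestral_set(parents, targets):
    """Targets plus every vertex with a directed path into them (the set S)."""
    seen, stack = set(targets), list(targets)
    while stack:
        for u in parents[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen

def moral_graph(parents, S):
    """Adjacency sets of H: every pair inside C_v = {v} + pa(v) is joined."""
    adj = {v: set() for v in S}
    for v in S:
        clique = (v,) + tuple(parents[v])  # parents lie in S since S is ancestral
        for u, w in itertools.combinations(clique, 2):
            adj[u].add(w)
            adj[w].add(u)
    return adj

def separated(parents, A, B, Z):
    """True when every path from A to B in H meets Z (A, B, Z pairwise disjoint)."""
    A, B, Z = set(A), set(B), set(Z)
    adj = moral_graph(parents, ancestral_set(parents, A | B | Z))
    reach, stack = set(A), list(A)  # reach grows into the block S_A
    while stack:
        for u in adj[stack.pop()]:
            if u not in Z and u not in reach:
                reach.add(u)
                stack.append(u)
    return not (reach & B)

def random_joint(parents, order, rng):
    """Joint pmf of binary variables as a product of random positive CPTs."""
    cpt = {}
    for v in order:
        for pa_vals in itertools.product((0, 1), repeat=len(parents[v])):
            cpt[v, pa_vals] = rng.uniform(0.1, 0.9)  # P(X_v = 1 | parents)
    joint = {}
    for x in itertools.product((0, 1), repeat=len(order)):
        val = dict(zip(order, x))
        p = 1.0
        for v in order:
            q = cpt[v, tuple(val[u] for u in parents[v])]
            p *= q if val[v] == 1 else 1.0 - q
        joint[x] = p
    return joint

def marginal(joint, order, keep):
    """Sum the joint pmf over all coordinates not listed in `keep`."""
    idx = [order.index(v) for v in keep]
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_indep(joint, order, A, B, Z, tol=1e-12):
    """Check p(a,b,z) p(z) = p(a,z) p(b,z) for all values, up to `tol`."""
    A, B, Z = sorted(A), sorted(B), sorted(Z)
    pabz = marginal(joint, order, A + B + Z)
    paz, pbz = marginal(joint, order, A + Z), marginal(joint, order, B + Z)
    pz = marginal(joint, order, Z)
    for key, p in pabz.items():
        a, b, z = key[:len(A)], key[len(A):len(A) + len(B)], key[len(A) + len(B):]
        if abs(p * pz[z] - paz[a + z] * pbz[b + z]) > tol:
            return False
    return True

# Hypothetical toy DAG: a -> c <- b, c -> d, b -> e (vertex: tuple of parents).
parents = {"a": (), "b": (), "c": ("a", "b"), "d": ("c",), "e": ("b",)}
order = ["a", "b", "c", "d", "e"]  # a topological order of the toy DAG
joint = random_joint(parents, order, random.Random(0))
queries = [({"a"}, {"b"}, set()), ({"d"}, {"e"}, {"b"}),
           ({"a"}, {"b"}, {"d"}), ({"a"}, {"e"}, {"c"})]
for A, B, Z in queries:
    sep = separated(parents, A, B, Z)
    # Soundness: separation in H must imply conditional independence.
    assert not sep or cond_indep(joint, order, A, B, Z)
    print(sorted(A), sorted(B), sorted(Z), "separated:", sep)
```

In this toy DAG the first two queries are separated in the moral graph, while the last two are not, because moralization joins the co-parents `a` and `b` once their common child `c` enters the ancestral set. The check only exercises the soundness direction: it asserts conditional independence whenever separation holds and makes no claim when separation fails.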
