Global Markov Property for Directed Acyclic Graphs (Theorem # 9667)

Proof

[proofplan]
We first restrict the factorization to the ancestral set of $A\cup B\cup C$ by integrating out non-ancestors in reverse topological order. The standard [moralized ancestral graph criterion](/theorems/9670) converts the assumed d-separation into ordinary separation in an undirected graph, which partitions the remaining unconditioned vertices into an $A$-side and a $B$-side. The ancestral density then factors into one non-negative function of the $A$-side and $C$, and another non-negative function of the $B$-side and $C$. Normalizing this product after fixing $X_C$ gives a product conditional density, and marginalizing the two sides yields $X_A\perp\!\!\!\perp X_B\mid X_C$.
[/proofplan]

[step:Restrict the product density to the ancestors of $A\cup B\cup C$]
Let $S=\operatorname{An}_G(A\cup B\cup C)$. Because $G$ is finite and acyclic, it admits a topological ordering: repeatedly choose a vertex with no incoming edge in the remaining finite acyclic graph, delete it, and continue until all vertices have been chosen. Use the reverse of such an ordering on $V\setminus S$.

The set $S$ is ancestral: if $j\in S$ and $i\in\operatorname{pa}(j)$, then a directed path from $j$ to $A\cup B\cup C$, preceded by the edge $i\to j$, shows $i\in S$. Hence every factor indexed by $j\in S$ depends only on coordinates in $S$.

Let $p_S:E_S\to[0,\infty]$ denote the marginal density of $X_S$ with respect to $\mu_S$. Tonelli's theorem applies to the non-negative measurable product density $p$. If $j\notin S$, then no descendant of $j$ lies in $A\cup B\cup C$. Thus, after all non-ancestral descendants of $j$ have been integrated out, the only remaining factor involving $x_j$ is $p_j(x_j\mid x_{\operatorname{pa}(j)})$. The normalization hypothesis gives
\begin{align*}
\int_{E_j}p_j(x_j\mid x_{\operatorname{pa}(j)})\,d\mu_j(x_j)=1.
\end{align*}
Iterating over $V\setminus S$ in reverse topological order gives
\begin{align*}
p_S(x_S)=\prod_{j\in S}p_j(x_j\mid x_{\operatorname{pa}(j)})
\end{align*}
for $\mu_S$-almost every $x_S\in E_S$.

[guided]
We want to remove variables that cannot affect the conditional distribution of $X_A$ and $X_B$ once $X_C$ is fixed. Define
\begin{align*}
S=\operatorname{An}_G(A\cup B\cup C).
\end{align*}
By definition, a vertex belongs to $S$ exactly when it has a directed path to some vertex in $A\cup B\cup C$, allowing a path of length zero.

The first useful property of $S$ is that it is ancestral. If $j\in S$ and $i\in\operatorname{pa}(j)$, then there is a directed path from $j$ to $A\cup B\cup C$; adding the edge $i\to j$ at the beginning gives a directed path from $i$ to $A\cup B\cup C$, so $i\in S$. Therefore factors with indices in $S$ have all their parent variables inside $S$.

We next justify the order of integration. A finite directed acyclic graph has a topological ordering. Indeed, every non-empty finite acyclic directed graph has a vertex with no incoming edge; otherwise, following incoming edges backward indefinitely would, by finiteness, eventually repeat a vertex and form a directed cycle. Choose such a vertex, delete it, and repeat. This produces an ordering in which every directed edge points forward. We integrate the vertices of $V\setminus S$ in the reverse of such an ordering.

Let $p_S:E_S\to[0,\infty]$ be the marginal density of $X_S$ with respect to $\mu_S$. The joint density representative is the non-negative [measurable function](/page/Measurable%20Function)
\begin{align*}
x\mapsto \prod_{j=1}^{n}p_j(x_j\mid x_{\operatorname{pa}(j)}).
\end{align*}
Since the integrand is non-negative and each $\mu_j$ is $\sigma$-finite, Tonelli's theorem permits iterated integration over the coordinates in any chosen order.

Consider a vertex $j\notin S$ at the moment it is integrated out.
Because the order is reverse topological, all descendants of $j$ in $V\setminus S$ have already been integrated out. Also no descendant of $j$ can lie in $S$, for then $j$ would be an ancestor of $A\cup B\cup C$ and would belong to $S$. Hence every child factor depending on $x_j$ has already disappeared. The only remaining factor involving $x_j$ is the local kernel $p_j(x_j\mid x_{\operatorname{pa}(j)})$. The theorem assumes normalization for this exact kernel version, so
\begin{align*}
\int_{E_j}p_j(x_j\mid x_{\operatorname{pa}(j)})\,d\mu_j(x_j)=1
\end{align*}
for every parent value. Therefore integrating the $j$-coordinate deletes exactly the factor indexed by $j$. Repeating this argument for all vertices in $V\setminus S$ leaves precisely
\begin{align*}
p_S(x_S)=\prod_{j\in S}p_j(x_j\mid x_{\operatorname{pa}(j)})
\end{align*}
for $\mu_S$-almost every $x_S\in E_S$.
[/guided]
[/step]

[step:Use the moralized ancestral graph criterion to split the unconditioned vertices]
Let $G_S$ denote the directed subgraph of $G$ induced by $S$. Let $H$ be the undirected graph on $S$ obtained from $G_S$ by adding an undirected edge between each pair of distinct parents of a common child in $S$ and then replacing every directed edge by an undirected edge. This graph is the moralized ancestral graph for $A,B,C$.

We use the standard moralized ancestral graph criterion in the following form: for a directed acyclic graph and pairwise disjoint vertex sets $A,B,C$, the set $C$ d-separates $A$ and $B$ in the directed graph if and only if $C$ separates $A$ and $B$ in the undirected moralized graph on $\operatorname{An}_G(A\cup B\cup C)$. Its hypotheses are exactly satisfied here: $G$ is a directed acyclic graph, $A,B,C$ are pairwise disjoint, and $S=\operatorname{An}_G(A\cup B\cup C)$. Hence every path in $H$ from $A$ to $B$ meets $C$.

Let $U\subset S\setminus C$ be the union of all connected components of $H\setminus C$ that meet $A$, and define $W=(S\setminus C)\setminus U$.
Then $A\subset U$ and $B\subset W$. There is no edge of $H$ joining $U$ to $W$, because such an edge would put its endpoints in the same [connected component](/page/Connected%20Component) of $H\setminus C$. Consequently, if $j\in U$, then every parent of $j$ lying in $S\setminus C$ belongs to $U$; if $j\in W$, then every parent of $j$ lying in $S\setminus C$ belongs to $W$.

[guided]
The directed path condition is hard to use directly in an integral factorization. The standard way to turn it into an algebraic separation is to pass to the moralized ancestral graph. Define $G_S$ to be the directed subgraph of $G$ induced by $S$. Define $H$ to be the undirected graph on $S$ obtained in two operations: first add an undirected edge between any two distinct parents of a common child in $S$, and then erase all arrowheads from the directed edges of $G_S$.

We apply the standard moralized ancestral graph criterion. The exact form needed is this: for a directed acyclic graph and pairwise disjoint sets $A,B,C$, d-separation of $A$ and $B$ by $C$ in the directed graph is equivalent to ordinary undirected separation of $A$ and $B$ by $C$ in the moralized graph on $\operatorname{An}_G(A\cup B\cup C)$.

The hypotheses are verified as follows. The graph $G$ is a directed acyclic graph by the theorem statement. The sets $A,B,C$ are pairwise disjoint by the theorem statement. The vertex set used to build $H$ is exactly $S=\operatorname{An}_G(A\cup B\cup C)$. Therefore the criterion applies. Since d-separation is assumed, every undirected path in $H$ from a vertex of $A$ to a vertex of $B$ meets $C$.

Now remove $C$ from $H$. Let $U$ be the union of those connected components of $H\setminus C$ that meet $A$, and define
\begin{align*}
W=(S\setminus C)\setminus U.
\end{align*}
Then $A\subset U$ by construction. Also $B\subset W$: if a vertex of $B$ belonged to $U$, then $H\setminus C$ would contain a path from $A$ to $B$, contradicting separation by $C$.
No edge of $H$ joins $U$ to $W$, because an edge in $H\setminus C$ between such vertices would merge their connected components.

This graph statement controls parent variables. If $j\in U$ and $i\in\operatorname{pa}(j)\cap(S\setminus C)$, then the edge $i\to j$ in $G_S$ becomes an undirected edge between $i$ and $j$ in $H$. Since no edge of $H$ joins $U$ to $W$, the vertex $i$ cannot lie in $W$. Because $i\in S\setminus C$, it follows that $i\in U$. The same argument with $U$ and $W$ interchanged proves that parents in $S\setminus C$ of vertices in $W$ remain in $W$.
[/guided]
[/step]

[step:Regroup the ancestral density into two factors sharing only $C$]
Define [measurable functions](/page/Measurable%20Functions) $\Phi_U:E_U\times E_C\to[0,\infty]$ and $\Phi_W:E_W\times E_C\to[0,\infty]$ by
\begin{align*}
\Phi_U(x_U,x_C)=\prod_{j\in U}p_j(x_j\mid x_{\operatorname{pa}(j)})
\end{align*}
and
\begin{align*}
\Phi_W(x_W,x_C)=\prod_{j\in W}p_j(x_j\mid x_{\operatorname{pa}(j)}).
\end{align*}
These functions have the stated domains because parents in $S\setminus C$ do not cross between $U$ and $W$.

Define subsets of $C$ by
\begin{align*}
C_U=\{j\in C:\operatorname{pa}(j)\cap U\neq\varnothing\},\qquad C_W=\{j\in C:\operatorname{pa}(j)\cap W\neq\varnothing\}
\end{align*}
and $C_0=C\setminus(C_U\cup C_W)$. The sets $C_U$ and $C_W$ are disjoint. If $j\in C_U\cap C_W$, then some $u\in U$ and $w\in W$ are distinct parents of the common child $j\in S$, so moralization adds an edge between $u$ and $w$ in $H$, contradicting the absence of edges between $U$ and $W$.

Define $\Phi_{C,U}:E_U\times E_C\to[0,\infty]$, $\Phi_{C,W}:E_W\times E_C\to[0,\infty]$, and $\Phi_{C,0}:E_C\to[0,\infty]$ by
\begin{align*}
\Phi_{C,U}(x_U,x_C)=\prod_{j\in C_U}p_j(x_j\mid x_{\operatorname{pa}(j)}),\qquad \Phi_{C,W}(x_W,x_C)=\prod_{j\in C_W}p_j(x_j\mid x_{\operatorname{pa}(j)})
\end{align*}
and
\begin{align*}
\Phi_{C,0}(x_C)=\prod_{j\in C_0}p_j(x_j\mid x_{\operatorname{pa}(j)}).
\end{align*}
Empty products are interpreted as $1$.

Set $r_U:E_U\times E_C\to[0,\infty]$ and $r_W:E_W\times E_C\to[0,\infty]$ by
\begin{align*}
r_U(x_U,x_C)=\Phi_U(x_U,x_C)\Phi_{C,U}(x_U,x_C)\Phi_{C,0}(x_C)
\end{align*}
and
\begin{align*}
r_W(x_W,x_C)=\Phi_W(x_W,x_C)\Phi_{C,W}(x_W,x_C).
\end{align*}
Combining the factors in the formula for $p_S$ gives
\begin{align*}
p_S(x_S)=r_U(x_U,x_C)r_W(x_W,x_C)
\end{align*}
for $\mu_S$-almost every $x_S\in E_S$.
[/step]

[step:Normalize the product factorization after conditioning on $X_C$]
Let $p_C:E_C\to[0,\infty]$ be the marginal density of $X_C$ with respect to $\mu_C$. By Tonelli's theorem applied to the non-negative function $(x_U,x_W,x_C)\mapsto r_U(x_U,x_C)r_W(x_W,x_C)$, for $\mu_C$-almost every $x_C\in E_C$ we have
\begin{align*}
p_C(x_C)=R_U(x_C)R_W(x_C),
\end{align*}
where $R_U:E_C\to[0,\infty]$ and $R_W:E_C\to[0,\infty]$ are defined by
\begin{align*}
R_U(x_C)=\int_{E_U}r_U(x_U,x_C)\,d\mu_U(x_U)
\end{align*}
and
\begin{align*}
R_W(x_C)=\int_{E_W}r_W(x_W,x_C)\,d\mu_W(x_W).
\end{align*}

The set where $0<p_C(x_C)<\infty$ has full $\mathbb P_{X_C}$-measure. On this set, $p_C(x_C)=R_U(x_C)R_W(x_C)$ implies that both $R_U(x_C)$ and $R_W(x_C)$ are positive and finite. Define conditional density kernels $q_U:E_U\times E_C\to[0,\infty]$ and $q_W:E_W\times E_C\to[0,\infty]$ on this set by
\begin{align*}
q_U(x_U\mid x_C)=\frac{r_U(x_U,x_C)}{R_U(x_C)}
\end{align*}
and
\begin{align*}
q_W(x_W\mid x_C)=\frac{r_W(x_W,x_C)}{R_W(x_C)}.
\end{align*}
Each is normalized with respect to its corresponding product measure. Thus, for $\mathbb P_{X_C}$-almost every $x_C$, the conditional density of $(X_U,X_W)$ given $X_C=x_C$ induced by the joint density is
\begin{align*}
p_{U,W\mid C}(x_U,x_W\mid x_C)=q_U(x_U\mid x_C)q_W(x_W\mid x_C)
\end{align*}
for $\mu_U\otimes\mu_W$-almost every $(x_U,x_W)\in E_U\times E_W$. Hence $X_U\perp\!\!\!\perp X_W\mid X_C$.

[guided]
The previous step gave an algebraic product.
Conditioning now turns that product into conditional independence. Let $p_C:E_C\to[0,\infty]$ be the marginal density of $X_C$ with respect to $\mu_C$. Since $p_S=r_Ur_W$ as a density representative, the marginal density is obtained by integrating out the $U$ and $W$ coordinates. Tonelli's theorem applies because the integrand is non-negative and measurable. Therefore, for $\mu_C$-almost every $x_C$,
\begin{align*}
p_C(x_C)=\int_{E_U\times E_W}r_U(x_U,x_C)r_W(x_W,x_C)\,d(\mu_U\otimes\mu_W)(x_U,x_W).
\end{align*}
Tonelli's theorem also separates the product integral into the product of the two one-sided integrals. Define
\begin{align*}
R_U(x_C)=\int_{E_U}r_U(x_U,x_C)\,d\mu_U(x_U)
\end{align*}
and
\begin{align*}
R_W(x_C)=\int_{E_W}r_W(x_W,x_C)\,d\mu_W(x_W).
\end{align*}
Then
\begin{align*}
p_C(x_C)=R_U(x_C)R_W(x_C)
\end{align*}
for $\mu_C$-almost every $x_C$.

We only need conditional kernels for $\mathbb P_{X_C}$-almost every value of $x_C$. Since $\mathbb P_{X_C}$ has density $p_C$ with respect to $\mu_C$, the set where $p_C=0$ has $\mathbb P_{X_C}$-measure zero. The set where $p_C=\infty$ has $\mu_C$-measure zero because $p_C$ integrates to $1$ with respect to $\mu_C$, and it therefore also has $\mathbb P_{X_C}$-measure zero because $\mathbb P_{X_C}$ is absolutely continuous with respect to $\mu_C$. Hence $0<p_C(x_C)<\infty$ for $\mathbb P_{X_C}$-almost every $x_C$. For such $x_C$, the equality $p_C(x_C)=R_U(x_C)R_W(x_C)$ forces both $R_U(x_C)$ and $R_W(x_C)$ to be positive and finite.

Define
\begin{align*}
q_U(x_U\mid x_C)=\frac{r_U(x_U,x_C)}{R_U(x_C)}
\end{align*}
and
\begin{align*}
q_W(x_W\mid x_C)=\frac{r_W(x_W,x_C)}{R_W(x_C)}.
\end{align*}
The normalization identities defining $R_U$ and $R_W$ show that $q_U(\cdot\mid x_C)$ is a probability density with respect to $\mu_U$ and $q_W(\cdot\mid x_C)$ is a probability density with respect to $\mu_W$. Dividing the joint density by the marginal density gives, for $\mu_U\otimes\mu_W$-almost every $(x_U,x_W)$,
\begin{align*}
p_{U,W\mid C}(x_U,x_W\mid x_C)=\frac{r_U(x_U,x_C)r_W(x_W,x_C)}{R_U(x_C)R_W(x_C)}.
\end{align*}
Thus
\begin{align*}
p_{U,W\mid C}(x_U,x_W\mid x_C)=q_U(x_U\mid x_C)q_W(x_W\mid x_C).
\end{align*}
This product form is exactly the conditional independence statement $X_U\perp\!\!\!\perp X_W\mid X_C$ for the conditional kernels induced by the density.
[/guided]
[/step]

[step:Marginalize the separated sides to obtain independence of $X_A$ and $X_B$]
Because $A\subset U$ and $B\subset W$, conditional independence of $X_U$ and $X_W$ given $X_C$ implies conditional independence of the subcollections $X_A$ and $X_B$ given $X_C$. Explicitly, let $D_A\in\mathcal E_A$ and $D_B\in\mathcal E_B$. For $\mathbb P_{X_C}$-almost every $x_C$, integrate the conditional product density over the coordinates in $U\setminus A$ and $W\setminus B$ with respect to $\mu_{U\setminus A}$ and $\mu_{W\setminus B}$. Tonelli's theorem applies because the conditional densities are non-negative. The result is
\begin{align*}
\mathbb P(X_A\in D_A,\ X_B\in D_B\mid X_C=x_C)=\mathbb P(X_A\in D_A\mid X_C=x_C)\mathbb P(X_B\in D_B\mid X_C=x_C).
\end{align*}
This is precisely $X_A\perp\!\!\!\perp X_B\mid X_C$.
[/step]
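
The graph constructions used in the proof ($S$, $H$, $U$, $W$) and the resulting conditional independence can be checked numerically on a small discrete example. The following is only an illustrative sketch, not part of the proof: it assumes binary variables with counting measure, and the five-vertex DAG, the function names, and the kernel parameters are all made up for the example. It builds the ancestral set and the moralized graph, splits $S\setminus C$ into the two sides, and measures how far the joint pmf is from satisfying $p(a,b,c)\,p(c)=p(a,c)\,p(b,c)$.

```python
from itertools import product


def ancestral_set(parents, targets):
    """Vertices with a directed path of length >= 0 into `targets`."""
    seen, stack = set(targets), list(targets)
    while stack:
        v = stack.pop()
        for p in parents[v]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def moral_graph(parents, S):
    """Undirected adjacency on the ancestral set S: marry co-parents, drop arrowheads."""
    adj = {v: set() for v in S}
    for child in S:
        pa = parents[child]  # all parents lie in S because S is ancestral
        for p in pa:
            adj[p].add(child)
            adj[child].add(p)
            adj[p].update(q for q in pa if q != p)
    return adj


def split_sides(parents, A, B, C):
    """Return (U, W) as in the proof, or None if C does not separate A from B in H."""
    A, B, C = set(A), set(B), set(C)
    S = ancestral_set(parents, A | B | C)
    adj = moral_graph(parents, S)
    U, stack = set(A), list(A)
    while stack:  # flood fill from A inside H \ C
        v = stack.pop()
        for w in adj[v] - C - U:
            U.add(w)
            stack.append(w)
    return None if U & B else (U, S - C - U)


def joint_table(parents, prob_one):
    """Joint pmf of binary variables as the product of the local kernels."""
    order = sorted(parents)
    joint = {}
    for x in product((0, 1), repeat=len(order)):
        val = dict(zip(order, x))
        p = 1.0
        for j in order:
            q = prob_one(j, [val[i] for i in parents[j]])
            p *= q if val[j] == 1 else 1.0 - q
        joint[x] = p
    return order, joint


def marginal(joint, order, keep):
    """Sum the joint pmf over every coordinate not listed in `keep`."""
    idx = [order.index(v) for v in keep]
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def ci_gap(joint, order, A, B, C):
    """Largest value of |p(a,b,c) p(c) - p(a,c) p(b,c)| over all states."""
    A, B, C = sorted(A), sorted(B), sorted(C)
    p_abc = marginal(joint, order, A + B + C)
    p_ac = marginal(joint, order, A + C)
    p_bc = marginal(joint, order, B + C)
    p_c = marginal(joint, order, C)
    gap = 0.0
    for key, p in p_abc.items():
        a, b, c = key[:len(A)], key[len(A):len(A) + len(B)], key[len(A) + len(B):]
        gap = max(gap, abs(p * p_c[c] - p_ac[a + c] * p_bc[b + c]))
    return gap


# Hypothetical DAG: a -> c -> b, c -> d, and a collider a -> e <- b.
parents = {"a": (), "b": ("c",), "c": ("a",), "d": ("c",), "e": ("a", "b")}
base = {"a": 0.3, "b": 0.6, "c": 0.2, "d": 0.5, "e": 0.1}  # made-up kernel parameters


def prob_one(j, parent_values):
    """Made-up kernel P(X_j = 1 | parents); values stay strictly inside (0, 1)."""
    return base[j] + 0.15 * sum(parent_values)


order, joint = joint_table(parents, prob_one)
print(split_sides(parents, {"a"}, {"b"}, {"c"}))       # sides U and W when C = {c}
print(split_sides(parents, {"a"}, {"b"}, {"c", "e"}))  # conditioning on the collider e
print(ci_gap(joint, order, {"a"}, {"b"}, {"c"}))
print(ci_gap(joint, order, {"a"}, {"b"}, {"c", "e"}))
```

With $C=\{c\}$ the non-ancestors $d$ and $e$ drop out of $S$, the moralized graph separates $a$ from $b$, and the gap should vanish up to floating-point rounding. Adding the collider $e$ to $C$ brings it into $S$, moralization marries $a$ and $b$, and the separation (and, for these particular kernels, the conditional independence) fails.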
