[proofplan]
We first relabel the vertices along a topological order, which is legitimate because every finite directed acyclic graph has such an order. The density chain rule writes the joint density as a product of one-step conditional densities given all earlier variables; the local Markov property then removes all earlier non-parent variables from each conditioning set. For the converse, we start from the product factorization and compute the conditional density of $X_j$ given its parents and its remaining non-descendants; integrating out the descendants in reverse topological order, every descendant factor normalizes to one, leaving a conditional law that depends only on $X_{\operatorname{pa}(j)}$.
[/proofplan]
[step:Relabel the vertices in a topological order]By [citetheorem:9665], the finite DAG $G$ admits a topological order. Relabel the vertices according to such an order. Thus, whenever $i\to j$ is an edge of $G$, we have $i<j$.
For each $j\in V$, define the earlier-vertex set
\begin{align*}
H_j:=\{1,\dots,j-1\}.
\end{align*}
Since the order is topological, every parent of $j$ is earlier, so
\begin{align*}
\operatorname{pa}(j)\subset H_j.
\end{align*}
Also, no earlier vertex can be a descendant of $j$, because a directed path from $j$ to an earlier vertex would contradict the topological order. Hence
\begin{align*}
H_j\setminus \operatorname{pa}(j)\subset \operatorname{nd}(j)\setminus \operatorname{pa}(j).
\end{align*}[/step]
[guided]The only purpose of this step is to put the graph into an order compatible with the arrows. By [citetheorem:9665], every finite DAG has a topological order. After relabeling the vertices in that order, every directed edge points from a smaller label to a larger label.
For a fixed vertex $j$, define
\begin{align*}
H_j:=\{1,\dots,j-1\}.
\end{align*}
This is the set of variables that occur before $X_j$ in the chain rule. Because all arrows into $j$ must come from earlier vertices, we have
\begin{align*}
\operatorname{pa}(j)\subset H_j.
\end{align*}
Moreover, if some $i\in H_j$ were a descendant of $j$, then there would be a directed path from $j$ to $i$. Along a topological order, directed paths strictly increase labels, so this would force $j<i$, contradicting $i<j$. Therefore every earlier non-parent is a non-descendant non-parent:
\begin{align*}
H_j\setminus \operatorname{pa}(j)\subset \operatorname{nd}(j)\setminus \operatorname{pa}(j).
\end{align*}
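As a small illustration (the graph is a hypothetical example, not part of the theorem, and $\operatorname{nd}(j)$ is taken to exclude $j$ itself), consider the diamond DAG on $V=\{1,2,3,4\}$ with edges $1\to2$, $1\to3$, $2\to4$, $3\to4$, whose labeling is already topological. For $j=2$ and $j=4$ one finds
\begin{align*}
H_2&=\{1\}, & \operatorname{pa}(2)&=\{1\}, & \operatorname{nd}(2)&=\{1,3\}, & H_2\setminus\operatorname{pa}(2)&=\varnothing\subset\{3\}=\operatorname{nd}(2)\setminus\operatorname{pa}(2),\\
H_4&=\{1,2,3\}, & \operatorname{pa}(4)&=\{2,3\}, & \operatorname{nd}(4)&=\{1,2,3\}, & H_4\setminus\operatorname{pa}(4)&=\{1\}=\operatorname{nd}(4)\setminus\operatorname{pa}(4).
\end{align*}
So the containment can be strict: vertex $3$ is a non-descendant of $2$ that happens to come later in the order.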
This containment is the graph-theoretic bridge between the analytic chain rule and the local Markov property.[/guided]
[step:Use the chain rule and local Markov property to obtain the product factorization]
For each $j\in V$, let $p(x_j\mid x_{H_j})$ denote a version of the conditional density or mass function of $X_j$ given $X_{H_j}$. The ordinary chain rule for densities or mass functions gives
\begin{align*}
p(x_1,\dots,x_n)=\prod_{j=1}^n p(x_j\mid x_{H_j})
\end{align*}
for $\mu$-almost every $(x_1,\dots,x_n)$, with the convention that $p(x_1\mid x_{H_1})=p(x_1)$ because $H_1=\varnothing$.
Fix $j\in V$. Since
\begin{align*}
H_j\setminus\operatorname{pa}(j)\subset \operatorname{nd}(j)\setminus\operatorname{pa}(j),
\end{align*}
the local Markov property implies
\begin{align*}
X_j\perp\!\!\!\perp X_{H_j\setminus\operatorname{pa}(j)}\mid X_{\operatorname{pa}(j)}.
\end{align*}
Therefore the conditional density of $X_j$ given $X_{H_j}$ may be chosen to equal the conditional density of $X_j$ given $X_{\operatorname{pa}(j)}$:
\begin{align*}
p(x_j\mid x_{H_j})=p(x_j\mid x_{\operatorname{pa}(j)})
\end{align*}
for almost every conditioning value $x_{H_j}$ with respect to the law of $X_{H_j}$. Substituting these versions into the chain-rule factorization yields
\begin{align*}
p(x_1,\dots,x_n)=\prod_{j=1}^n p(x_j\mid x_{\operatorname{pa}(j)})
\end{align*}
for $\mu$-almost every $(x_1,\dots,x_n)$.
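For illustration, take the hypothetical diamond DAG on $V=\{1,2,3,4\}$ with edges $1\to2$, $1\to3$, $2\to4$, $3\to4$, which is already topologically labelled. The chain rule gives
\begin{align*}
p(x_1,x_2,x_3,x_4)=p(x_1)\,p(x_2\mid x_1)\,p(x_3\mid x_1,x_2)\,p(x_4\mid x_1,x_2,x_3).
\end{align*}
For $j=1,2$ the set $H_j\setminus\operatorname{pa}(j)$ is empty, so nothing has to be removed. For $j=3$ and $j=4$ the local Markov property supplies
\begin{align*}
X_3\perp\!\!\!\perp X_2\mid X_1,\qquad X_4\perp\!\!\!\perp X_1\mid (X_2,X_3),
\end{align*}
and the factorization becomes
\begin{align*}
p(x_1,x_2,x_3,x_4)=p(x_1)\,p(x_2\mid x_1)\,p(x_3\mid x_1)\,p(x_4\mid x_2,x_3).
\end{align*}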
[/step]
[step:Compute the conditional law under the factorization]Assume conversely that
\begin{align*}
p(x_1,\dots,x_n)=\prod_{k=1}^n q_k(x_k\mid x_{\operatorname{pa}(k)})
\end{align*}
for $\mu$-almost every $x=(x_1,\dots,x_n)$, where each $q_k(x_k\mid x_{\operatorname{pa}(k)})$ is a version of the conditional density or mass function of $X_k$ given $X_{\operatorname{pa}(k)}$.
Fix $j\in V$. Define
\begin{align*}
P_j:=\operatorname{pa}(j),\qquad N_j:=\operatorname{nd}(j)\setminus\operatorname{pa}(j),\qquad D_j:=V\setminus(\{j\}\cup P_j\cup N_j).
\end{align*}
Since $P_j\cup N_j=\operatorname{nd}(j)$, the set $D_j$ is exactly the set of descendants of $j$. Fix values $x_{P_j}$ and $x_{N_j}$ at which the marginal density of $(X_{P_j},X_{N_j})$ is positive, so that the conditional density of $X_j$ given $(X_{P_j},X_{N_j})$ is defined. For such values, this conditional density is proportional, as a function of $x_j$, to
\begin{align*}
\int_{\prod_{k\in D_j}S_k} \prod_{k=1}^n q_k(x_k\mid x_{\operatorname{pa}(k)})\,d\left(\bigotimes_{k\in D_j}\mu_k\right)(x_{D_j}).
\end{align*}
If $k\in P_j\cup N_j$, then $k$ is a non-descendant of $j$, so no parent of $k$ lies in $\{j\}\cup D_j$: any edge from $j$ or from a descendant of $j$ into $k$ would make $k$ a descendant of $j$. Hence the factors indexed by $P_j\cup N_j$ are constant with respect to $x_j$ and $x_{D_j}$. The factor indexed by $j$ is
\begin{align*}
q_j(x_j\mid x_{P_j}).
\end{align*}
The remaining factors, indexed by $D_j$, integrate to one by repeated integration in reverse topological order: when $x_k$ is integrated, the factors of all children of $k$ have already been integrated out, and each factor $q_k(x_k\mid x_{\operatorname{pa}(k)})$ is a conditional density or mass function in the variable $x_k$ with respect to $\mu_k$.
Thus the conditional density of $X_j$ given $(X_{P_j},X_{N_j})$ is
\begin{align*}
q_j(x_j\mid x_{P_j})
\end{align*}
for almost every conditioning value. This conditional law depends on $x_{P_j}$ and not on $x_{N_j}$.[/step]
[guided]We now prove the converse by looking directly at conditional densities. Assume the factorization
\begin{align*}
p(x_1,\dots,x_n)=\prod_{k=1}^n q_k(x_k\mid x_{\operatorname{pa}(k)})
\end{align*}
holds for $\mu$-almost every $x$, where each $q_k$ is a conditional density or mass function for $X_k$ given its parents.
Fix a vertex $j$. We separate the other vertices into three classes:
\begin{align*}
P_j:=\operatorname{pa}(j),\qquad N_j:=\operatorname{nd}(j)\setminus\operatorname{pa}(j),\qquad D_j:=V\setminus(\{j\}\cup P_j\cup N_j).
\end{align*}
Here $P_j$ is the parent set, $N_j$ is the non-parent non-descendant set, and $D_j$ is the descendant set. To prove the local Markov property, we must show that after conditioning on $X_{P_j}$, adding $X_{N_j}$ to the conditioning information does not change the conditional law of $X_j$.
Fix conditioning values $x_{P_j}$ and $x_{N_j}$ at which the relevant regular conditional densities are defined and the conditioning density is positive. At such values, the conditional density of $X_j$ given $(X_{P_j},X_{N_j})$ is obtained from the joint density by integrating out the descendant coordinates $x_{D_j}$. Thus it is proportional in $x_j$ to
\begin{align*}
\int_{\prod_{k\in D_j}S_k} \prod_{k=1}^n q_k(x_k\mid x_{\operatorname{pa}(k)})\,d\left(\bigotimes_{k\in D_j}\mu_k\right)(x_{D_j}).
\end{align*}
Now inspect which factors can involve $x_j$. The factor indexed by $j$ is exactly
\begin{align*}
q_j(x_j\mid x_{P_j}).
\end{align*}
If $k\in P_j\cup N_j$, then $k$ is not a descendant of $j$. No parent of $k$ can be $j$ or a descendant of $j$, because an edge from $j$ or from a descendant of $j$ into $k$ would create a directed path from $j$ to $k$. Therefore the factors indexed by $P_j\cup N_j$ are constant with respect to both $x_j$ and the descendant variables $x_{D_j}$.
It remains to handle the factors indexed by descendants. Because the vertices are labelled in a topological order, we may integrate the descendant variables in reverse topological order, starting with the largest label. When the coordinate $x_k$ of a descendant $k$ is integrated, every child of $k$ is a descendant of $j$ with a larger label, so its factor has already been integrated out. The only remaining factor involving $x_k$ is therefore $q_k(x_k\mid x_{\operatorname{pa}(k)})$, in which the parent values $x_{\operatorname{pa}(k)}$ are held fixed, and $q_k(\cdot\mid x_{\operatorname{pa}(k)})$ is a conditional density or mass function. Hence
\begin{align*}
\int_{S_k} q_k(x_k\mid x_{\operatorname{pa}(k)})\,d\mu_k(x_k)=1
\end{align*}
for the relevant parent values. Repeating this for all descendants shows that the descendant factors contribute the normalizing value one.
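To see the order of integration at work, consider as a hypothetical example the DAG on $V=\{1,\dots,5\}$ with edges $1\to2$, $1\to3$, $2\to4$, $3\to5$, $4\to5$, and take $j=2$. Then $P_2=\{1\}$, $N_2=\{3\}$, $D_2=\{4,5\}$, and the integral above becomes
\begin{align*}
\int_{S_4}\int_{S_5} q_1(x_1)\,q_2(x_2\mid x_1)\,q_3(x_3\mid x_1)\,q_4(x_4\mid x_2)\,q_5(x_5\mid x_3,x_4)\,d\mu_5(x_5)\,d\mu_4(x_4).
\end{align*}
Only $q_5$ involves $x_5$, so the inner integral equals one; after that, only $q_4$ involves $x_4$, so the outer integral equals one as well. The reverse order is what makes each step a plain normalization: $q_5$ also depends on $x_4$, so integrating $x_4$ first would not split off a single factor. What remains is
\begin{align*}
q_1(x_1)\,q_3(x_3\mid x_1)\,q_2(x_2\mid x_1),
\end{align*}
and dividing by its integral over $x_2$, namely $q_1(x_1)\,q_3(x_3\mid x_1)$, leaves $q_2(x_2\mid x_1)$.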
Consequently, after normalization, the conditional density of $X_j$ given $(X_{P_j},X_{N_j})$ is
\begin{align*}
q_j(x_j\mid x_{P_j}).
\end{align*}
This expression depends only on the parent values $x_{P_j}$ and not on the non-descendant non-parent values $x_{N_j}$.[/guided]
[step:Conclude the local Markov property]
The preceding computation shows that, for each $j\in V$, there are versions of the conditional laws satisfying
\begin{align*}
\mathbb P(X_j\in A\mid X_{P_j},X_{N_j})=\mathbb P(X_j\in A\mid X_{P_j})
\end{align*}
for every measurable set $A\in\mathcal S_j$ and for almost every value of $(X_{P_j},X_{N_j})$. By the definition of conditional independence through conditional laws, this is precisely
\begin{align*}
X_j\perp\!\!\!\perp X_{N_j}\mid X_{P_j}.
\end{align*}
Since $P_j=\operatorname{pa}(j)$ and $N_j=\operatorname{nd}(j)\setminus\operatorname{pa}(j)$, the local Markov property holds for every $j\in V$. This proves the converse and completes the proof of the theorem.
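As an illustration, for the hypothetical diamond DAG on $V=\{1,2,3,4\}$ with edges $1\to2$, $1\to3$, $2\to4$, $3\to4$, the sets are $N_1=\varnothing$, $N_2=\{3\}$, $N_3=\{2\}$, $N_4=\{1\}$. The statement for $j=1$ is vacuous, and the remaining three read
\begin{align*}
X_2\perp\!\!\!\perp X_3\mid X_1,\qquad X_3\perp\!\!\!\perp X_2\mid X_1,\qquad X_4\perp\!\!\!\perp X_1\mid (X_2,X_3).
\end{align*}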
[/step]