Hammersley-Clifford Theorem for Positive Finite Random Fields

Theorem

Proof

[proofplan] We prove the two implications separately. A clique factorization implies the local Markov property because, after conditioning on all variables except $X_v$, every factor not involving $v$ cancels and every factor involving $v$ depends only on $x_v$ and the neighbouring variables. Conversely, we take logarithms, choose a reference configuration, and decompose $\log \pi$ into interaction terms indexed by subsets of $V$ using the finite subset-lattice difference formula. The local Markov property forces every interaction term supported on a nonclique to vanish, so only clique interactions remain; exponentiating those terms gives the desired factorization. [/proofplan]
[step:Derive the local conditional law from a clique factorization]
Assume first that there are clique potentials
\begin{align*} \psi_C:\prod_{u \in C} S_u \to [0,\infty) \end{align*}
and a constant $Z>0$ such that
\begin{align*} \pi(x)=\frac{1}{Z}\prod_{C \in \mathcal C(G)} \psi_C(x_C) \end{align*}
for every $x \in \Omega$. Since $\pi(x)>0$ for every $x \in \Omega$, the product on the right is positive for every configuration $x$. In particular, if $C \in \mathcal C(G)$ and $z_C \in \prod_{u \in C}S_u$, then $\psi_C(z_C)>0$: for any extension $z \in \Omega$ of $z_C$, a zero value of $\psi_C(z_C)$ would force the whole product for $z$ to vanish, contradicting $\pi(z)>0$.
Fix $v \in V$. For a configuration $y_{V\setminus\{v\}} \in \prod_{u \in V\setminus\{v\}}S_u$ (which has positive probability because $\pi$ is positive), and for each $t \in S_v$, define $y_t \in \Omega$ by $(y_t)_u = y_u$ for $u \in V\setminus\{v\}$ and $(y_t)_v = t$. The conditional mass function of $X_v$ given $X_{V\setminus\{v\}}=y_{V\setminus\{v\}}$ is
\begin{align*} \mathbb P(X_v=a \mid X_{V\setminus\{v\}}=y_{V\setminus\{v\}}) = \frac{\pi(y_a)}{\sum_{b \in S_v}\pi(y_b)}.
\end{align*}
Substituting the factorization gives
\begin{align*} \mathbb P(X_v=a \mid X_{V\setminus\{v\}}=y_{V\setminus\{v\}})=\frac{\prod_{C \in \mathcal C(G)}\psi_C((y_a)_C)}{\sum_{b \in S_v}\prod_{C \in \mathcal C(G)}\psi_C((y_b)_C)}. \end{align*}
If $v \notin C$, then $(y_a)_C=(y_b)_C$ for every $a,b \in S_v$, so the factor $\psi_C((y_a)_C)$ is independent of the candidate value of $X_v$; being strictly positive, it cancels between numerator and denominator. Therefore
\begin{align*} \mathbb P(X_v=a \mid X_{V\setminus\{v\}}=y_{V\setminus\{v\}})=\frac{\prod_{C \in \mathcal C(G):\, v \in C}\psi_C((y_a)_C)}{\sum_{b \in S_v}\prod_{C \in \mathcal C(G):\, v \in C}\psi_C((y_b)_C)}. \end{align*}
Every clique $C$ containing $v$ satisfies $C \subset \{v\}\cup N(v)$, because each other vertex in $C$ must be adjacent to $v$. Hence the right-hand side depends on $y_{V\setminus\{v\}}$ only through $y_{N(v)}$. Thus the conditional distribution of $X_v$ given $X_{V\setminus\{v\}}$ depends only on $X_{N(v)}$, which is exactly
\begin{align*} X_v \perp X_{V\setminus(\{v\}\cup N(v))}\mid X_{N(v)}. \end{align*}
[guided]
The point of this direction is that clique factors separate into two types: those that see $v$ and those that do not. Fix a vertex $v \in V$ and fix values of all variables except $X_v$. For each possible value $a \in S_v$, let $y_a \in \Omega$ denote the full configuration obtained by inserting $a$ at coordinate $v$ and keeping the other coordinates equal to the fixed vector $y_{V\setminus\{v\}}$. Because the configuration space is finite and $\pi$ is positive, the [conditional probability](/page/Conditional%20Probability) is computed by the usual finite conditional mass formula:
\begin{align*} \mathbb P(X_v=a \mid X_{V\setminus\{v\}}=y_{V\setminus\{v\}}) = \frac{\pi(y_a)}{\sum_{b \in S_v}\pi(y_b)}.
\end{align*}
Using the assumed clique factorization, this becomes
\begin{align*} \mathbb P(X_v=a \mid X_{V\setminus\{v\}}=y_{V\setminus\{v\}})=\frac{\prod_{C \in \mathcal C(G)}\psi_C((y_a)_C)}{\sum_{b \in S_v}\prod_{C \in \mathcal C(G)}\psi_C((y_b)_C)}. \end{align*}
Now separate the product over cliques into cliques containing $v$ and cliques not containing $v$. If $v \notin C$, then the restriction $(y_a)_C$ does not involve the coordinate $v$, so $(y_a)_C=(y_b)_C$ for all $a,b \in S_v$. Thus the factor $\psi_C((y_a)_C)$ is the same in every term of the denominator and also in the numerator. By the positivity observation at the start of the step, these common factors are strictly positive and can be cancelled. We obtain
\begin{align*} \mathbb P(X_v=a \mid X_{V\setminus\{v\}}=y_{V\setminus\{v\}})=\frac{\prod_{C \in \mathcal C(G):\, v \in C}\psi_C((y_a)_C)}{\sum_{b \in S_v}\prod_{C \in \mathcal C(G):\, v \in C}\psi_C((y_b)_C)}. \end{align*}
What variables can appear in the remaining expression? If $C$ is a clique and $v \in C$, then every $u \in C\setminus\{v\}$ is adjacent to $v$. Therefore $C \subset \{v\}\cup N(v)$. Hence each remaining factor depends only on the candidate value $a$ and on the already fixed neighbour values $y_{N(v)}$. It does not depend on the values of vertices outside $\{v\}\cup N(v)$. Therefore the conditional law of $X_v$ given all other variables is a function only of $X_{N(v)}$, which is precisely the local Markov property:
\begin{align*} X_v \perp X_{V\setminus(\{v\}\cup N(v))}\mid X_{N(v)}. \end{align*}
[/guided]
[/step]
[step:Decompose the log mass function into subset interactions]
Assume now that $\pi$ is positive and satisfies the local Markov property. Define
\begin{align*} f:\Omega \to \mathbb R,\qquad x \mapsto \log \pi(x). \end{align*}
Choose a reference configuration $x_0 \in \Omega$.
For each subset $A \subset V$ and each $x_A \in \prod_{v \in A}S_v$, define the mixed interaction
\begin{align*} \Phi_A:\prod_{v \in A}S_v \to \mathbb R \end{align*}
by
\begin{align*} \Phi_A(x_A) := \sum_{B \subset A}(-1)^{|A|-|B|} f(\widetilde{x}_B), \end{align*}
where $\widetilde{x}_B \in \Omega$ is the configuration defined by $(\widetilde{x}_B)_u = x_u$ for $u \in B$ and $(\widetilde{x}_B)_u = (x_0)_u$ for $u \in V\setminus B$. Then for every $x \in \Omega$,
\begin{align*} f(x)=\sum_{A \subset V}\Phi_A(x_A). \end{align*}
Indeed, substituting the definition of $\Phi_A$ and collecting the coefficient of each $f(\widetilde{x}_B)$ gives
\begin{align*} \sum_{A \subset V}\Phi_A(x_A) = \sum_{B \subset V} f(\widetilde{x}_B)\sum_{A:\,B \subset A \subset V}(-1)^{|A|-|B|}. \end{align*}
For fixed $B \subset V$, the inner sum is
\begin{align*} \sum_{D \subset V\setminus B}(-1)^{|D|}. \end{align*}
This equals $1$ when $B=V$ and equals $0$ when $B\ne V$, because the nonempty finite set $V\setminus B$ has equally many even and odd subsets. Therefore only the term $B=V$ remains, and since $\widetilde{x}_V=x$, we obtain
\begin{align*} \sum_{A \subset V}\Phi_A(x_A)=f(x). \end{align*}
[/step]
[step:Translate the local Markov property into a neighbour dependence identity]
Fix a vertex $u \in V$. Let
\begin{align*} R_u:=V\setminus(\{u\}\cup N(u)) \end{align*}
be the set of vertices that are neither $u$ nor neighbours of $u$. The local Markov property says
\begin{align*} X_u \perp X_{R_u}\mid X_{N(u)}. \end{align*}
Because $\pi$ is strictly positive on the finite configuration space, every event determined by a complete assignment of coordinates has positive probability. Thus the local Markov property is equivalent to saying that, for every $n \in \prod_{w \in N(u)}S_w$, every $r \in \prod_{w \in R_u}S_w$, and every $a \in S_u$, the conditional mass
\begin{align*} \mathbb P(X_u=a\mid X_{N(u)}=n,\,X_{R_u}=r) \end{align*}
is independent of $r$.
For $a \in S_u$, $n \in \prod_{w \in N(u)}S_w$, and $r \in \prod_{w \in R_u}S_w$, write $(a,n,r)\in\Omega$ for the unique configuration whose $u$-coordinate is $a$, whose $N(u)$-coordinates are $n$, and whose $R_u$-coordinates are $r$. The finite conditional probability formula gives
\begin{align*} \mathbb P(X_u=a\mid X_{N(u)}=n,\,X_{R_u}=r) = \frac{\pi(a,n,r)}{\sum_{c \in S_u}\pi(c,n,r)}. \end{align*}
Therefore, for all $a,b \in S_u$, all $n \in \prod_{w \in N(u)}S_w$, and all $r,r' \in \prod_{w \in R_u}S_w$, the independence of the conditional mass from $r$ gives
\begin{align*} \frac{\mathbb P(X_u=a\mid X_{N(u)}=n,\,X_{R_u}=r)}{\mathbb P(X_u=b\mid X_{N(u)}=n,\,X_{R_u}=r)} = \frac{\mathbb P(X_u=a\mid X_{N(u)}=n,\,X_{R_u}=r')}{\mathbb P(X_u=b\mid X_{N(u)}=n,\,X_{R_u}=r')}. \end{align*}
Substituting the displayed conditional formula and cancelling the positive normalizing denominators yields
\begin{align*} \frac{\pi(a,n,r)}{\pi(b,n,r)} = \frac{\pi(a,n,r')}{\pi(b,n,r')}. \end{align*}
Taking logarithms gives
\begin{align*} f(a,n,r)-f(b,n,r)=f(a,n,r')-f(b,n,r'). \end{align*}
Thus, for fixed $a,b \in S_u$ and fixed neighbour configuration $n$, the logarithmic difference obtained by changing the $u$-coordinate from $b$ to $a$ is independent of all coordinates in $R_u$.
[/step]
[step:Show that every nonclique interaction vanishes]
Let $A \subset V$ be a nonclique. Then there exist distinct vertices $u,w \in A$ such that $\{u,w\}\notin E$. In particular $w \in R_u$. We prove that $\Phi_A=0$. Fix $x_A \in \prod_{v \in A}S_v$. Partition the subsets $B \subset A$ according to whether they contain $u$. For each $D \subset A\setminus\{u\}$, pair the term $B=D\cup\{u\}$ with the term $B=D$. By the definition of $\Phi_A$,
\begin{align*} \Phi_A(x_A) = \sum_{D \subset A\setminus\{u\}}(-1)^{|A|-|D|-1} \bigl[f(\widetilde{x}_{D\cup\{u\}})-f(\widetilde{x}_D)\bigr].
\end{align*}
For each $D \subset A\setminus\{u\}$, the two configurations $\widetilde{x}_{D\cup\{u\}}$ and $\widetilde{x}_D$ agree on every coordinate except possibly $u$. The difference
\begin{align*} f(\widetilde{x}_{D\cup\{u\}})-f(\widetilde{x}_D) \end{align*}
therefore depends, by the identity from the previous step, only on the two values at $u$ and on the coordinates in $N(u)$, not on the coordinate at the non-neighbour $w \in R_u$, and hence not on whether $w$ belongs to $D$. Pair now each $D \subset A\setminus\{u,w\}$ with $D\cup\{w\}$. The bracketed logarithmic difference is the same for the two paired subsets, while the signs differ by a factor of $-1$. Hence the two contributions cancel. Summing over all such pairs gives
\begin{align*} \Phi_A(x_A)=0. \end{align*}
Since $x_A$ was arbitrary, $\Phi_A$ is identically zero for every nonclique $A$.
[guided]
This is the key step of the converse. The decomposition
\begin{align*} f(x)=\sum_{A \subset V}\Phi_A(x_A) \end{align*}
contains one interaction term for every subset of vertices. To prove clique factorization, we must show that a term indexed by a nonclique cannot survive. Let $A \subset V$ be a nonclique. By definition, there are distinct vertices $u,w \in A$ with no edge between them:
\begin{align*} \{u,w\}\notin E. \end{align*}
Thus $w$ is not a neighbour of $u$, so
\begin{align*} w \in R_u:=V\setminus(\{u\}\cup N(u)). \end{align*}
Fix a configuration $x_A \in \prod_{v \in A}S_v$. For every subset $B\subset A$, let $\widetilde{x}_B\in\Omega$ denote the configuration whose coordinates in $B$ are taken from $x_A$ and whose coordinates outside $B$ are taken from the reference configuration $x_0$. The mixed interaction is
\begin{align*} \Phi_A(x_A) := \sum_{B \subset A}(-1)^{|A|-|B|} f(\widetilde{x}_B). \end{align*}
We first group together the terms that differ only by the presence of $u$.
For every subset $D \subset A\setminus\{u\}$, the paired subsets are $D$ and $D\cup\{u\}$, so their contribution is
\begin{align*} (-1)^{|A|-|D|-1} \bigl[f(\widetilde{x}_{D\cup\{u\}})-f(\widetilde{x}_D)\bigr]. \end{align*}
Therefore
\begin{align*} \Phi_A(x_A) = \sum_{D \subset A\setminus\{u\}}(-1)^{|A|-|D|-1} \bigl[f(\widetilde{x}_{D\cup\{u\}})-f(\widetilde{x}_D)\bigr]. \end{align*}
Now apply the neighbour-dependence identity obtained from the local Markov property. The two configurations $\widetilde{x}_{D\cup\{u\}}$ and $\widetilde{x}_D$ agree everywhere except possibly at the coordinate $u$. Thus the difference
\begin{align*} f(\widetilde{x}_{D\cup\{u\}})-f(\widetilde{x}_D) \end{align*}
is a logarithmic change caused by changing $X_u$ while holding all other coordinates fixed. The local Markov property says that such a logarithmic change may depend on the neighbours of $u$, but it cannot depend on any non-neighbour of $u$. Since $w$ is a non-neighbour of $u$, the displayed difference is unchanged when we switch the $w$-coordinate between its reference value $(x_0)_w$ and its selected value $x_w$.
Now pair every subset $D \subset A\setminus\{u,w\}$ with $D\cup\{w\}$. The bracket
\begin{align*} f(\widetilde{x}_{D\cup\{u\}})-f(\widetilde{x}_D) \end{align*}
is equal to the corresponding bracket
\begin{align*} f(\widetilde{x}_{D\cup\{u,w\}})-f(\widetilde{x}_{D\cup\{w\}}) \end{align*}
because changing whether $w$ is active does not affect the logarithmic change in the $u$-coordinate. But the signs of the two paired contributions are opposite:
\begin{align*} (-1)^{|A|-|D|-1} = - (-1)^{|A|-|D\cup\{w\}|-1}. \end{align*}
Therefore each pair cancels exactly. Since all subsets of $A\setminus\{u\}$ are partitioned into such pairs, the whole alternating sum is zero:
\begin{align*} \Phi_A(x_A)=0. \end{align*}
Because the chosen $x_A$ was arbitrary, the interaction function $\Phi_A$ is identically zero on $\prod_{v \in A}S_v$.
[/guided]
[/step]
[step:Exponentiate the clique interactions to obtain the factorization]
From the previous steps, every nonzero interaction term is supported either on the empty set or on a nonempty clique. Since $\mathcal C(G)$ denotes the nonempty cliques, for every $x \in \Omega$ we have
\begin{align*} \log \pi(x)=f(x)=\Phi_\varnothing+\sum_{C \in \mathcal C(G)}\Phi_C(x_C). \end{align*}
Define, for every $C \in \mathcal C(G)$, the function
\begin{align*} \psi_C:\prod_{v \in C}S_v \to (0,\infty), \qquad z_C \mapsto \exp(\Phi_C(z_C)). \end{align*}
Define the normalizing constant
\begin{align*} Z:=\exp(-\Phi_\varnothing)>0. \end{align*}
Then, for every $x\in\Omega$,
\begin{align*} \pi(x) = \exp(\Phi_\varnothing)\prod_{C \in \mathcal C(G)}\exp(\Phi_C(x_C)) = \frac{1}{Z}\prod_{C \in \mathcal C(G)}\psi_C(x_C). \end{align*}
Thus $\pi$ factorizes over the nonempty cliques of $G$. The functions $\psi_C$ are strictly positive, hence in particular nonnegative. Combining this converse implication with the first step proves the equivalence.
[/step]
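
The converse construction can be checked numerically on a small example. The following is a minimal sketch, not part of the proof: it assumes a hypothetical toy model (the path graph $0 - 1 - 2$ with binary states, reference configuration $x_0=(0,0,0)$, and arbitrarily chosen positive edge and vertex weights, so the local Markov property holds by the first step). It computes the interactions $\Phi_A$ from the alternating-sum definition, then checks that the nonclique interactions vanish and that the clique interactions alone reconstruct $\log \pi$. All names (`pi_unnormalized`, `tilde`, `phi`, and so on) are illustrative choices.

```python
import itertools
import math

# Hypothetical toy model: path graph 0 - 1 - 2 with binary states.
V = (0, 1, 2)
E = {frozenset((0, 1)), frozenset((1, 2))}
S = (0, 1)
X0 = (0, 0, 0)  # reference configuration x_0 (an arbitrary choice)


def is_clique(A):
    """True if every pair of distinct vertices in A is joined by an edge."""
    return all(frozenset(p) in E for p in itertools.combinations(A, 2))


def subsets(A):
    """All subsets of A, as tuples."""
    A = tuple(A)
    return itertools.chain.from_iterable(
        itertools.combinations(A, k) for k in range(len(A) + 1))


def pi_unnormalized(x):
    # Product of positive vertex and edge factors (weights are arbitrary),
    # so the distribution factorizes over cliques of the path graph.
    return math.exp(0.7 * x[0] * x[1] - 1.3 * x[1] * x[2]
                    + 0.4 * x[1] + 0.2 * x[0])


CONFIGS = list(itertools.product(S, repeat=len(V)))
Z = sum(pi_unnormalized(x) for x in CONFIGS)


def f(x):
    """Log mass function f = log pi."""
    return math.log(pi_unnormalized(x) / Z)


def tilde(x, B):
    """Configuration equal to x on B and to the reference X0 off B."""
    return tuple(x[u] if u in B else X0[u] for u in V)


def phi(A, x):
    """Interaction Phi_A(x_A) = sum over B in A of (-1)^(|A|-|B|) f(x~_B)."""
    return sum((-1) ** (len(A) - len(B)) * f(tilde(x, B))
               for B in subsets(A))


def max_nonclique_interaction():
    """Largest |Phi_A(x_A)| over noncliques A and configurations x."""
    return max(abs(phi(A, x))
               for A in subsets(V) if not is_clique(A)
               for x in CONFIGS)


def max_reconstruction_error():
    """Largest gap between f(x) and the sum of clique interactions."""
    return max(abs(f(x) - sum(phi(A, x)
                              for A in subsets(V) if is_clique(A)))
               for x in CONFIGS)


if __name__ == "__main__":
    print("max |Phi_A| over noncliques:", max_nonclique_interaction())
    print("max reconstruction error:   ", max_reconstruction_error())
```

In this toy graph the noncliques are $\{0,2\}$ and $\{0,1,2\}$; both printed quantities should be zero up to floating-point rounding. The empty set and the singletons count as cliques in `is_clique`, matching the convention that $\Phi_\varnothing$ is absorbed into the normalizing constant.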
