Equivalence in Bayesian Networks

$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\an}[1]{\text{an}(#1)} \newcommand{\nd}[1]{\text{nd}(#1)} \newcommand{\can}[1]{\overline{\text{an}}(#1)} \newcommand{\ind}{⫫} \newcommand{\dsep}{\perp_d} \newcommand{\msep}{\perp_m} $

In this lecture, we discuss the classes of graphs that are equivalent under d-separation. We discuss how can two graphs be efficiently tested for equivalence, and how can we represent the class of all equivalent graphs.

Markov Equivalence

Under the semantics of d-separation, an acyclic directed graph can be seen as a compact representation of an independence relation. As we discussed previously, not every independence relation can be represented in such a way (because d-separation satisfy properties that not every distribution satisfies). Pearl advocated independence (and their corresponding graphical representation) as a more fundamental knowledge than probability. In this view, independences may exist without any reference to a probability function. Indeed, the d-separation semantics and its variants can be used to represent independences in uncertainty frameworks such as Dempster-Shafer theory and credal networks (interval-valued probability models). Notably, an independence relation might admit many graphical representations. This is clear from the definition of d-separation in serial and divergent connections. The following discussion aims at characterizing the classes of equivalent structures (those that represent the same independence relation), and studying their consequences on the representation of independence as an isolated concept.

Two acyclic directed graphs $G$ and $H$ are equivalent if their node set is the same, and $$ (\mathcal{X} \dsep \mathcal{Y} \mid \mathcal{Z})_G \Leftrightarrow (\mathcal{X} \dsep \mathcal{Y} \mid \mathcal{Z})_H $$ for every subset of nodes $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$.

Equivalence is also called I-equivalence (from independence-equivalence), and Markov equivalence. The latter is due to the fact that the local and global Markov properties are equivalent, hence, two graphs are equivalent if and only if the set of Markov properties of one of the graphs is satisfied by the other one. The three digraphs in the figure below are equivalent (try to see why this is true).

An equivalence class is a set of equivalent acyclic digraphs. Of course, all digraphs in a equivalence class must have the same node set. Given the equivalence between the Markov properties encoded by a graph and the factorization of a distribution, the graphs in an equivalence class can be seen as different parametrizations of the same probability distribution.

The size of equivalence classes

How many equivalent structures are there? The figure below shows all structures over 3 nodes with the corresponding equivalence classes.

Equivalence classes for 3-node structures

According to the figure, there are 25 structures segmented in 11 equivalence classes as follows: 4 singletons, 3 classes with 2 digraphs, 3 classes with 3 digraphs and 1 class with 6 structures. One can also notice that the sparser the structure the smaller the equivalence classes. What about equivalence classes for structures of order $n$? The number of equivalence classes is upper bounded by the number of structures (i.e., acyclic digraphs), which satisfies Robinson’s formula:¹ $$ \#\text{DAG}(n) = \sum_{i=1}^n (-1)^{i+1} {n \choose i} 2^{i(n-i)} \#\text{DAG}(n-i) \, . $$ This number grows extremely quickly (superexponential). For instance:

n	$\#\text{DAG}(n)$
1	1
2	3
3	25
$\vdots$	$\vdots$
8	783 702 329 343
9	1 213 442 454 842 881
10	4 175 098 976 430 598 143

While there is not a known formula for the number of equivalence classes, it is lower bounded by the number of equivalence classes of size 1 (i.e., which have a single structure), which is given by the following formula.² $$ \#\text{EQ1}(n) = \sum_{i=1}^n (-1)^{i+1} {n \choose i} (2^{n-i}-(n-i))^i \#\text{EQ1}(n-i) \, . $$ The term $(2^{n-i}-(n-i))^i$ approaches $2^{i(n-i)}$ as $n \rightarrow \infty$, thus placing this sequence in the same order of the count of acyclic digraphs. Gillispse and Erlman have computed the number of equivalence classes and their relative size for up to $n=10$ and found that:

$n$	$\#\text{EQUIV}(n)$	$\#\text{EQUIV}(n)/\#\text{DAG}(n)$
1	1	1.0
2	2	0.667
3	11	0.440
$\vdots$	$\vdots$	$\vdots$
8	212 133 402 500	0.271
9	326 266 056 291 213	0.269
10	1 118 902 054 495 975 141	0.268

The analysis above suggests that the number of equivalence classes converges to a fraction of the total number of structures, and therefore grows superexponentially with the number of nodes. This reasoning should convince you that any simplistic approach (such as listing DAGs or equivalence classes) to structure learning is infeasible even for small domains. We therefore need a cleverer way to identify, represent and manipulate equivalence classes.

Skeleton

Consider a graph $G$ with an arc $X \rightarrow Y$. Any equivalent graph $H$ must have either the same arc or a reversed arc $Y \rightarrow X$, since two variables connected by an arc are d-connected (by some set), and any two variables not connected by an arc can be d-separated by some set of variables (possibly the empty set). This implies that any two equivalent graphs must have the same underlying undirected structure. This is formalized by the concept of a skeleton:

The skeleton of a directed graph $G=(V,E)$ is the undirected graph $H=(V,F)$ such that $X \rightarrow Y \in E$ if and only if $X-Y \in F$.

To put more simply, the skeleton of a digraph is obtaining by ignoring the direction of the arcs, turning them into undirected edges. The graph in the figure below is the skeleton of all three directed graphs in Figure 1.

The highlighted edges indicate a convergent connection, will be important to characterize equivalence classes, as we discuss next.

Immorality

It follows from our previous discussion that candidate equivalent graphs must have the same skeleton. Now consider arbitrary 3-node graphs $G$ and $H$. If $G$ and $H$ are complete, then they encode no d-separations so they must be equivalent. If $G$ is a serial connection and $H$ is a divergent connection then they also encode the same set of d-separations. Last, if $G$ is either a serial or divergent connection and $H$ is a convergent connection, then the two graphs have the same skeleton but represent different d-separation relations (and are thus not equivalent). This is formalized by the following concept:

An immorality is a triple of nodes $X,Y,Z$ such that $X \rightarrow Y \leftarrow Z$ is a convergent connection, and $X$ and $Z$ are non-adjacent (i.e., there is no arc connecting $X$ and $Z$).

An immorality is sometimes also called a v-structure (although this name is also used to describe a general convergent connection). By the previous reasoning, two equivalent graphs must have the same set of immoralities lest we can find a triple (one that is a serial or divergent connection in one and a convergent connection in the other) that induces different d-separations. Thus we have that two graphs are equivalent only if they have the same skeleton and the same set of immoralities. Verma and Pearl showed that the converse is also true.

Two acyclic directed graphs are equivalent if and only if they have the same skeleton and the same set of immoralities.

The digraphs in Figure 1 all satisfy the theorem above. The arcs of the single common immorality $B \rightarrow D \leftarrow C$ appear highlighted in their skeleton. The set of acyclic digraphs can be partitioned into classes of equivalent graphs. According to the previous theorem, each part can be uniquely represented by the skeleton and the list of immoralities. Notice that, as for d-separation, the equivalence of graphs is a graphical property and can be tested efficiently (in linear time in the number of arcs).

Essential graphs

The previous theorem allows to easily and efficiently identify equivalence. We can use it to also obtain an efficient and concise representation of equivalence classes, by associating a digraph to every class of equivalent structures.

The essential graph of a class $\mathcal{G}$ of equivalent acyclic digraphs $G=(V_G,E_G)$ is the digraph $H=(V_G,E_H)$ where $E_H = \cup_{G \in \mathcal{G}} E_G$, that is, $E_H$ contains an arc $X \rightarrow Y$ if and only if there is a graph $G$ in $\mathcal{G}$ with that arc.

Since all the graphs in an equivalence class have the same skeleton and the same set of immoralities, the essential graph is unique.

Essential graphs of the equivalence classes of 3-node structures

Partially directed graphs

Given an essential graph, we can infer d-separations by considering any underlying acyclic digraph represented in the class. This is performed by considering at most one arc per pair of nodes in a way that does not creates new immoralities. This procedure is best captured by a similar graphical representation device that contains both undirected and directed edges. Call any symmetric relation $X \rightarrow Y$ iff $Y \rightarrow X$ an undirected edge, and any asymmetric relation $X \rightarrow Y$ iff $X \not \rightarrow Y$ a directed edge. A partially directed graph is a graph that contains directed and undirected edges.

The partially directed graph of an equivalence class $\mathcal{G}$ is the (partially) directed graph that has the same skeleton as a graph in $\mathcal{G}$ and has a directed arc $X \rightarrow Y$ if and only this arc appears in all graphs in the class.

The figure below shows the partially digraphs of 3-node equivalence classes.

Partially digraph representation of the equivalence classes of 3-node structures

The partially digraph represents possible orientations of the arcs in an equivalence class. Any orientation of the undirected graphs that is topologically consistent and does not introduce new immoralities leads to a graph that represents the same set of d-separations. A partially directed graph is a chain graph if there is no cycle that uses only directed edges. The partially directed graph of an equivalence class is a chain graph. It is possible to modify d-separation to work directly on chain graphs. We do this by extending the definitions of blocking connections to consider partially directed triples. The figure bellow shows all possible connections (up to isomorphisms).

Possible connections in partially digraphs

The three connections on the top include only directed graphs and have the same semantics as d-separation in directed graphs. The connections on the bottom have the same semantics as a serial connection: the connection is blocked by the middle node and active (unblocked) otherwise. To see why this is true, consider a valid orientation of any of these connections. For example, the leftmost partially directed connection allow two possible orientations of the undirected edge, transforming it either into a serial or a divergent connection. In either case, d-separation is the same. The partially directed connection in the middle allows only one orientation, else we create an unwanted immorality. Hence, the connection is equivalent to a serial connection and be read off as such. Finally, the undirected connection can be transformed in either a serial connection or a divergent connection; again, the semantics is the same.

R.W. Robinson, Counting labeled acyclic digraphs, New Directions in the Theory of Graphs, 1973. ↩︎
B. Steinsky, Enumeration of labeled chain graphs and labeled essential directed acyclic graphs, Discrete Mathematics 270, 2003. ↩︎

Last updated on Sep 24, 2020