# Equivalence in Bayesian Networks

$$\newcommand{\nb}{\text{ne}(#1)} \newcommand{\pa}{\text{pa}(#1)} \newcommand{\ch}{\text{ch}(#1)} \newcommand{\de}{\text{de}(#1)} \newcommand{\an}{\text{an}(#1)} \newcommand{\nd}{\text{nd}(#1)} \newcommand{\indeg}{\text{indeg}(#1)} \newcommand{\outdeg}{\text{outdeg}(#1)} \newcommand{\ind}{⫫} \newcommand{\dsep}{\!\perp_d\!} \newcommand{\msep}{\!\perp_m\!}$$

In this lecture, we discuss the classes of graphs that are equivalent under d-separation. We discuss how can two graphs be efficiently tested for equivalence, and how can we represent the class of all equivalent graphs.

## 1 Markov Equivalence

Under the semantics of d-separation, an acyclic directed graph can be seen as a compact representation of an independence relation. As we discussed previously, not every independence relation can be represented in such a way (because d-separation satisfy properties that not every distribution satisfies). Pearl advocated independence (and their corresponding graphical representation) as a more fundamental knowledge than probability. In this view, independences may exist without any reference to a probability function. Indeed, the d-separation semantics and its variants can be used to represent independences in uncertainty frameworks such as Dempster-Shafer theory Shenoy93 and credal networks (interval-valued probability models) Cozman2000. Notably, an independence relation might admit many graphical representations. This is clear from the definition of d-separation in serial and divergent connections. The following discussion aims at characterizing the classes of equivalent structures (those that represent the same independence relation), and studying their consequences on the representation of independence as an isolated concept.

Two acyclic directed graphs $$G$$ and $$H$$ are equivalent if their node set is the same, and $(\mathcal{X} \dsep \mathcal{Y} \mid \mathcal{Z})_G \text{ if and only if } (\mathcal{X} \dsep \mathcal{Y} \mid \mathcal{Z})_H$ for every subset of nodes $$\mathcal{X}$$, $$\mathcal{Y}$$ and $$\mathcal{Z}$$.

Equivalence is also called I-equivalence (from independence-equivalence), and Markov equivalence. The latter is due to the fact that the local and global Markov properties are equivalent, hence, two graphs are equivalent if and only if the set of Markov properties of one of the graphs is satisfied by the other one. The three digraphs in Figure 1 below are equivalent (try to see why this is true). Figure 1: Three equivalent graphs and their skeleton with the common v-structure highlighted.

### 1.1 The size of equivalence classes

The number of structure (i.e., acyclic digraphs) over $$n$$ nodes satisfies Robinson's formula Rob1973: $\#\text{DAG}(n) = \sum_{i=1}^n (-1)^{i+1} {n \choose i} 2^{i(n-i)} \#\text{DAG}(n-i) \, .$ This number grows extremely quickly (superexponential). For instance:

n $$\#\text{DAG}(n)$$
1 1
2 3
3 25
$$\vdots$$ $$\vdots$$
8 783 702 329 343
9 1 213 442 454 842 881
10 4 175 098 976 430 598 143

While there is not a known formula for the number of equivalence classes, it is lower bounded by the number of equivalence classes of size 1 (i.e., which have a single structure), which is given by the following formula Ste2003. $\#\text{EQ1}(n) = \sum_{i=1}^n (-1)^{i+1} {n \choose i} (2^{n-i}-(n-i))^i \#\text{EQ1}(n-i) \, .$ The term $$(2^{n-i}-(n-i))^i$$ approaches $$2^{i(n-i)}$$ as $$n \rightarrow \infty$$, thus placing this sequence in the same order of the count of acyclic digraphs. Gillispse and Erlman have computed the number of equivalence classes and their relative size for up to $$n=10$$ and found that GP2001:

$$n$$ $$\#\text{EQUIV}(n)$$ $$\#\text{EQUIV}(n)/\#\text{DAG}(n)$$
1 1 1.0
2 2 0.667
3 11 0.440
$$\vdots$$ $$\vdots$$ $$\vdots$$
8 212 133 402 500 0.271
9 326 266 056 291 213 0.269
10 1 118 902 054 495 975 141 0.268

This number should suffice to convince anyone that any simplistic approach (such as listing DAGs or equivalence classes) to structure learning is infeasible even for small domains.

## 2 Skeleton

Consider a graph $$G$$ with an arc $$X \rightarrow Y$$. Any equivalent graph $$H$$ must have either the same arc or a reversed arc $$Y \rightarrow X$$, since two variables connected by an arc are d-connected (by some set), and any two variables not connected by an arc can be d-separated by some set of variables (possibly the empty set). This implies that any two equivalent graphs must have the same underlying undirected structure. This is formalized by the concept of a skeleton:

The skeleton of a directed graph $$G=(V,E)$$ is the undirected graph $$H=(V,F)$$ such that $$X \rightarrow Y \in E$$ if and only if $$X-Y \in F$$.

## 3 Immorality

It follows from our previous discussion that candidate equivalent graphs must have the same skeleton. For example, the rightmost graph in Figure 1 is the skeleton of the directed graphs on its left side. Now consider 3-node graphs $$G$$ and $$H$$. If $$G$$ and $$H$$ are complete, then they encode no d-separations so they must be equivalent. If $$G$$ is a serial connection and $$H$$ is a divergent connection then they also encode the same set of d-separations. Last, if $$G$$ is either a serial or divergent connection and $$H$$ is a convergent connection, then the two graphs have the same skeleton but represent different d-separation relations (and are thus not equivalent). This is formalized by the following concept:

An immorality is a triple of nodes $$X,Y,Z$$ such that $$X \rightarrow Y \leftarrow Z Y$$ and $$X$$ and $$Z$$ are non-adjacent (i.e., there is no arc connecting $$X$$ and $$Z$$).

An immorality is sometimes also called a v-structure (although this name is also used to describe a general convergent connection). By the previous reasoning, two equivalent graphs must have the same set of immoralities lest we can find a triple (one that is a serial or divergent connection in one and a convergent connection in the other) that induces different d-separations. Thus we have that two graphs are equivalent only if they have the same skeleton and the same set of immoralities. Verma and Pearl showed that the converse is also true Verma1990.

Two acyclic directed graphs are equivalent if and only if they have the same skeleton and the same set of immoralities.

The digraphs in Figure 1 all satisfy the theorem above. The arcs of the single common immorality $$B \rightarrow D \leftarrow C$$ appear highlighted in their skeleton. The set of acyclic digraphs can be partitioned into classes of equivalent graphs. According to the previous theorem, each part can be uniquely represented by the skeleton and the list of immoralities. Notice that, as for d-separation, the equivalence of graphs is a graphical property and can be tested efficiently (in linear time in the number of arcs).

## 4 Essential graphs

It is convenient to associate a graph to every class of equivalent structures.

The essential graph of a class $$\mathcal{G}$$ of equivalent acyclic digraphs $$G=(V_G,E_G)$$ is the digraph $$H=(V_G,E_H)$$ where $$E_H = \cup_{G \in \mathcal{G}} E_G$$, that is, $$E_H$$ contains an arc $$X \rightarrow Y$$ if and only if there is a graph $$G$$ in $$\mathcal{G}$$ with that arc.

Since all the graphs in an equivalence class have the same skeleton and set of immoralities, the essential graph is unique.

## 5 Partially directed graphs

A closely related concept is obtained by interpreting the essential graph as a graph that contains both undirected and directed edges. Call any symmetric relation $$X \rightarrow Y$$ iff $$Y \rightarrow X$$ an undirected edge, and any asymmetric relation $$X \rightarrow Y$$ iff $$X \not \rightarrow Y$$ a directed edge. A partially directed graph is a graph that contains directed and undirected edges.

The partially directed graph of an equivalence class $$\mathcal{G}$$ is the (partially) directed graph that has the same skeleton as a graph in $$\mathcal{G}$$ and has a directed arc $$X \rightarrow Y$$ if and only this arc appears in all graphs in the class.

The partially directed graph represents possible orientations of the arcs in an equivalence class. Any orientation of the undirected graphs that is topologically consistent and does not introduce new immoralities leads to a graph that represents the same set of d-separations. A partially directed graph is a chain graph if there is no cycle that uses only directed edges. The partially directed graph of an equivalence class is a chain graph. It is possible to modify d-separation to work directly on chain graphs.

Given the equivalence between the Markov properties encoded by a graph and the factorization of a distribution, the graphs in an equivalence class can be seen as different parametrizations of the same probability distribution.

## 6 Exercises

1. Prove that the following statements are equivalent for two nodes $$X$$ and $$Y$$ in an acyclic directed graph $$G$$:
1. $$X$$ and $$Y$$ are adjacent.
2. there is no set $$\mathcal{Z}\subseteq\mathcal{V}-\{X,Y\}$$ that d-separates $$X$$ and $$Y$$.
3. $$X$$ and $$Y$$ are not d-separated by $$(\an{X} \cup \an{Y})-\{X,Y\}$$.
4. $$X$$ and $$Y$$ are not d-separated by $$(\pa{X} \cup \pa{Y})-\{X,Y\}$$.
2. Discuss why the structure below has not other Markov equivalent structure. Hint: For each edge, find a d-separation relation that would be violated if that edge was reversed. 3. Suppose you have an indepedence oracle $$I$$ which answers independence queries "is $$\mathcal{X} \ind \mathcal{Y} \mid \mathcal{Z}$$?". Discuss how you can use at most 2 calls to the oracle to decide which of the structures below represent the independence relation induced by the oracle. Assume that exactly one of the graphs below represents the independence relation of the oracle. # Bibliography

• [Shenoy93] Shenoy, Valuation networks and conditional independence, 191-199, in in: Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence, edited by (1993)
• [Cozman2000] Fabio Cozman, Credal networks, Artificial Intelligence, 120, 199-233 (2000).
• [Rob1973] Robinson, Counting labeled acyclic digraphs, New Directions in the Theory of Graphs, 239-273 (1973).
• [Ste2003] Steinsky, Enumeration of labeled chain graphs and labeled essential directed acyclic graphs, Discrete Mathematics, 270, 267-278 (2003).
• [GP2001] Gillispie & Perlman, Enumerating markov equivalence classes of acyclic digraph models, 171-177, in in: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, edited by (2001)
• [Verma1990] Verma & Pearl, Equivalence and synthesis of causal models, 220-227, in in: Proceedings of the Sixth Conference on Uncertainty in Artifical Intelligence, edited by (1990)

Created: 2018-10-10 Wed 19:49

Validate