# Equivalence in Bayesian Networks

## Table of Contents

\( \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\an}[1]{\text{an}(#1)} \newcommand{\nd}[1]{\text{nd}(#1)} \newcommand{\indeg}[1]{\text{indeg}(#1)} \newcommand{\outdeg}[1]{\text{outdeg}(#1)} \newcommand{\ind}{⫫} \newcommand{\dsep}{\!\perp_d\!} \newcommand{\msep}{\!\perp_m\!} \)

In this lecture, we discuss the classes of graphs that are equivalent under d-separation. We discuss how can two graphs be efficiently tested for equivalence, and how can we represent the class of all equivalent graphs.

## 1 Markov Equivalence

Under the semantics of d-separation, an acyclic directed graph can be seen as a compact representation of an independence relation. As we discussed previously, not every independence relation can be represented in such a way (because d-separation satisfy properties that not every distribution satisfies). Pearl advocated independence (and their corresponding graphical representation) as a more fundamental knowledge than probability. In this view, independences may exist without any reference to a probability function. Indeed, the d-separation semantics and its variants can be used to represent independences in uncertainty frameworks such as Dempster-Shafer theory Shenoy93 and credal networks (interval-valued probability models) Cozman2000. Notably, an independence relation might admit many graphical representations. This is clear from the definition of d-separation in serial and divergent connections. The following discussion aims at characterizing the classes of equivalent structures (those that represent the same independence relation), and studying their consequences on the representation of independence as an isolated concept.

Two acyclic directed graphs \(G\) and \(H\) are **equivalent** if
their node set is the same, and \[ (\mathcal{X} \dsep
\mathcal{Y} \mid \mathcal{Z})_G \text{ if and only if }
(\mathcal{X} \dsep \mathcal{Y} \mid \mathcal{Z})_H \] for
every subset of nodes \(\mathcal{X}\), \(\mathcal{Y}\) and
\(\mathcal{Z}\).

Equivalence is also called **I-equivalence** (from
independence-equivalence), and **Markov equivalence**. The
latter is due to the fact that the local and global Markov
properties are equivalent, hence, two graphs are equivalent if
and only if the set of Markov properties of one of the graphs
is satisfied by the other one. The three digraphs in Figure
1 below are equivalent (try to see why this is true).

Figure 1: Three equivalent graphs and their skeleton with the common v-structure highlighted.

### 1.1 The size of equivalence classes

The number of structure (i.e., acyclic digraphs) over \(n\) nodes satisfies Robinson's formula Rob1973: \[ \#\text{DAG}(n) = \sum_{i=1}^n (-1)^{i+1} {n \choose i} 2^{i(n-i)} \#\text{DAG}(n-i) \, . \] This number grows extremely quickly (superexponential). For instance:

n | \(\#\text{DAG}(n)\) |
---|---|

1 | 1 |

2 | 3 |

3 | 25 |

\(\vdots\) | \(\vdots\) |

8 | 783 702 329 343 |

9 | 1 213 442 454 842 881 |

10 | 4 175 098 976 430 598 143 |

While there is not a known formula for the number of equivalence classes, it is lower bounded by the number of equivalence classes of size 1 (i.e., which have a single structure), which is given by the following formula Ste2003. \[ \#\text{EQ1}(n) = \sum_{i=1}^n (-1)^{i+1} {n \choose i} (2^{n-i}-(n-i))^i \#\text{EQ1}(n-i) \, . \] The term \((2^{n-i}-(n-i))^i\) approaches \(2^{i(n-i)}\) as \(n \rightarrow \infty\), thus placing this sequence in the same order of the count of acyclic digraphs. Gillispse and Erlman have computed the number of equivalence classes and their relative size for up to \(n=10\) and found that GP2001:

\(n\) | \(\#\text{EQUIV}(n)\) | \(\#\text{EQUIV}(n)/\#\text{DAG}(n)\) |
---|---|---|

1 | 1 | 1.0 |

2 | 2 | 0.667 |

3 | 11 | 0.440 |

\(\vdots\) | \(\vdots\) | \(\vdots\) |

8 | 212 133 402 500 | 0.271 |

9 | 326 266 056 291 213 | 0.269 |

10 | 1 118 902 054 495 975 141 | 0.268 |

This number should suffice to convince anyone that any simplistic approach (such as listing DAGs or equivalence classes) to structure learning is infeasible even for small domains.

## 2 Skeleton

Consider a graph \(G\) with an arc \(X \rightarrow Y\). Any equivalent graph \(H\) must have either the same arc or a reversed arc \(Y \rightarrow X\), since two variables connected by an arc are d-connected (by some set), and any two variables not connected by an arc can be d-separated by some set of variables (possibly the empty set). This implies that any two equivalent graphs must have the same underlying undirected structure. This is formalized by the concept of a skeleton:

The **skeleton** of a directed
graph \(G=(V,E)\) is the undirected graph \(H=(V,F)\) such that
\(X \rightarrow Y \in E\) if and only if \(X-Y \in F\).

## 3 Immorality

It follows from our previous discussion that candidate equivalent graphs must have the same skeleton. For example, the rightmost graph in Figure 1 is the skeleton of the directed graphs on its left side. Now consider 3-node graphs \(G\) and \(H\). If \(G\) and \(H\) are complete, then they encode no d-separations so they must be equivalent. If \(G\) is a serial connection and \(H\) is a divergent connection then they also encode the same set of d-separations. Last, if \(G\) is either a serial or divergent connection and \(H\) is a convergent connection, then the two graphs have the same skeleton but represent different d-separation relations (and are thus not equivalent). This is formalized by the following concept:

An **immorality** is a triple of nodes \(X,Y,Z\) such that \(X
\rightarrow Y \leftarrow Z Y\) and \(X\) and \(Z\) are non-adjacent
(i.e., there is no arc connecting \(X\) and \(Z\)).

An immorality is sometimes also called a **v-structure**
(although this name is also used to describe a general
convergent connection). By the previous reasoning, two
equivalent graphs must have the same set of immoralities lest
we can find a triple (one that is a serial or divergent
connection in one and a convergent connection in the other)
that induces different d-separations. Thus we have that two
graphs are equivalent only if they have the same skeleton and
the same set of immoralities. Verma and Pearl showed that the
converse is also true Verma1990.

Two acyclic directed graphs are equivalent if and only if they have the same skeleton and the same set of immoralities.

The digraphs in Figure 1 all satisfy the theorem above. The arcs of the single common immorality \(B \rightarrow D \leftarrow C\) appear highlighted in their skeleton. The set of acyclic digraphs can be partitioned into classes of equivalent graphs. According to the previous theorem, each part can be uniquely represented by the skeleton and the list of immoralities. Notice that, as for d-separation, the equivalence of graphs is a graphical property and can be tested efficiently (in linear time in the number of arcs).

## 4 Essential graphs

It is convenient to associate a graph to every class of equivalent structures.

The **essential graph** of a class \(\mathcal{G}\) of equivalent
acyclic digraphs \(G=(V_G,E_G)\) is the digraph \(H=(V_G,E_H)\)
where \(E_H = \cup_{G \in \mathcal{G}} E_G\), that is, \(E_H\)
contains an arc \(X \rightarrow Y\) if and only if there is a
graph \(G\) in \(\mathcal{G}\) with that arc.

Since all the graphs in an equivalence class have the same skeleton and set of immoralities, the essential graph is unique.

## 5 Partially directed graphs

A closely related concept is obtained by interpreting the
essential graph as a graph that contains both undirected and
directed edges. Call any symmetric relation \(X \rightarrow Y\)
iff \(Y \rightarrow X\) an **undirected edge**, and any asymmetric
relation \(X \rightarrow Y\) iff \(X \not \rightarrow Y\) a
**directed edge**. A **partially directed graph** is a graph that
contains directed and undirected edges.

The **partially directed graph** of an equivalence class
\(\mathcal{G}\) is the (partially) directed graph that has the
same skeleton as a graph in \(\mathcal{G}\) and has a directed
arc \(X \rightarrow Y\) if and only this arc appears in all
graphs in the class.

The partially directed graph represents possible orientations
of the arcs in an equivalence class. Any orientation of the
undirected graphs that is topologically consistent and does
not introduce new immoralities leads to a graph that
represents the same set of d-separations. A partially directed
graph is a **chain graph** if there is no cycle that uses only
directed edges. The partially directed graph of an equivalence
class is a chain graph. It is possible to modify d-separation
to work directly on chain graphs.

Given the equivalence between the Markov properties encoded by
a graph and the factorization of a distribution, the graphs in
an equivalence class can be seen as different
**parametrizations** of the same probability distribution.

## 6 Exercises

- Prove that the following statements are equivalent for two
nodes \(X\) and \(Y\) in an acyclic directed graph \(G\):
- \(X\) and \(Y\) are adjacent.
- there is no set \(\mathcal{Z}\subseteq\mathcal{V}-\{X,Y\}\) that d-separates \(X\) and \(Y\).
- \(X\) and \(Y\) are
*not*d-separated by \((\an{X} \cup \an{Y})-\{X,Y\}\). - \(X\) and \(Y\) are
*not*d-separated by \((\pa{X} \cup \pa{Y})-\{X,Y\}\).

Discuss why the structure below has not other Markov equivalent structure.

*Hint:*For each edge, find a d-separation relation that would be violated if that edge was reversed.Suppose you have an

*indepedence oracle*\(I\) which answers independence queries "is \(\mathcal{X} \ind \mathcal{Y} \mid \mathcal{Z}\)?". Discuss how you can use at most 2 calls to the oracle to decide which of the structures below represent the independence relation induced by the oracle. Assume that exactly one of the graphs below represents the independence relation of the oracle.

# Bibliography

- [Shenoy93] Shenoy, Valuation networks and conditional independence, 191-199, in in: Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence, edited by (1993)
- [Cozman2000] Fabio Cozman, Credal networks,
*Artificial Intelligence*,**120**, 199-233 (2000). - [Rob1973] Robinson, Counting labeled acyclic digraphs,
*New Directions in the Theory of Graphs*, 239-273 (1973). - [Ste2003] Steinsky, Enumeration of labeled chain graphs and labeled essential directed acyclic graphs,
*Discrete Mathematics*,**270**, 267-278 (2003). - [GP2001] Gillispie & Perlman, Enumerating markov equivalence classes of acyclic digraph models, 171-177, in in: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, edited by (2001)
- [Verma1990] Verma & Pearl, Equivalence and synthesis of causal models, 220-227, in in: Proceedings of the Sixth Conference on Uncertainty in Artifical Intelligence, edited by (1990)