Variable Elimination

\( \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\an}[1]{\text{an}(#1)} \newcommand{\nd}[1]{\text{nd}(#1)} \newcommand{\indeg}[1]{\text{indeg}(#1)} \newcommand{\outdeg}[1]{\text{outdeg}(#1)} \newcommand{\ind}{⫫} \newcommand{\dsep}{\!\perp_d\!} \newcommand{\msep}{\!\perp_m\!} \newcommand{\sep}{\!\perp_u\!} \newcommand{\sco}[1]{\text{sc}(#1)} \newcommand{\del}[2]{{#2}^{-#1}} \)

We resume our study of algorithms for computing the probability of evidence (or the partition function) of a Bayesian network or Markov network. We discuss Variable Elimination, a generic framework for performing exact inference in probabilistic graphical models.

1 The Sum-Product Problem

Here we will consider a slightly more general version of the problem, stated as follows.

Sum-Product Problem: Given potentials \(\phi_1,\dotsc,\phi_m\) with scopes \(\mathcal{C}_1,\dotsc,\mathcal{C}_m\) over a set of variables \(\mathcal{V}\) and a set \(\mathcal{X} \subseteq \mathcal{V}\), compute

\begin{equation*} \phi(\mathcal{X}) = \sum_{\nu \sim \mathcal{V}-\mathcal{X}} \prod_j \phi_j(\nu[\mathcal{C}_j]) \, . \end{equation*}

For even moderately sized sets \(\mathcal{V}\), the above expression is impractical to compute by applying the operations as they appear. We have seen that this problem can be solved by sampling algorithms, but they do not provide any guarantees in finite time. In this lecture, we will analyze one of the most popular exact algorithms for this problem, called variable elimination.

For example, the problem of computing the probability of some evidence \(p(E_1=e_1,\dotsc,E_m=e_m)\) in a Bayesian network can be posed as a sum-product problem with potentials \(p(X|\pa{X})\) for each \(X \in \mathcal{V}\), and \(\delta_{e_i}(E_i)\) for each piece of evidence \(E_i=e_i\), where \(\delta_{e_i}(E_i)=1\) if \(E_i=e_i\) and \(\delta_{e_i}(E_i)=0\) if \(E_i \neq e_i\). Then, \[ p(e_1,\dotsc,e_m) = \sum_{\nu \sim \mathcal{V}} \prod_X p(\nu[X]|\nu[\pa{X}]) \prod_{E_i} \delta_{e_i}(\nu[E_i]) \, . \]
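
For instance, for a network with a single arc \(A \to E_1\) and evidence \(E_1=e_1\), the expression above reduces to \[ p(e_1) = \sum_{a} \sum_{e} p(a)\, p(e \mid a)\, \delta_{e_1}(e) = \sum_{a} p(a)\, p(e_1 \mid a) \,, \] since the delta potential zeroes out every term inconsistent with the evidence.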

2 Historical Account

Variable elimination is a simple and intuitive algorithm for manipulating information that seems to have been re-developed many times in different areas such as operational research, constraint satisfaction, relational databases, and probabilistic inference, under many different names (and with subtle variations). For instance, it was used for automating pedigree analysis in genetics CTS1976, and for computing conjunctive queries in relational databases BFMY1983; it was discussed as a sort of non-sequential dynamic programming paradigm in the operational research field BB1972; it underlies the basic version of the Davis-Putnam algorithm for solving satisfiability DP1960; and it was one of the first practical exact algorithms for computing marginals (i.e., belief updating) in Bayesian networks SDD1990,ZP1994. Dechter was among the first to note the similarity of all these methods, and to formalize them as a unifying algorithm (with a few minor modifications and under the name of bucket elimination) Dec1990. The algorithm's key insight is to exploit the distributivity of addition (marginalization) over multiplication (product) to decompose the problem into smaller problems in a dynamic programming fashion.

3 An algebra of potentials

Variable elimination is a generic framework which can be used to solve many computational problems that consist in aggregating pieces of information or knowledge and projecting them onto a new domain (this includes problems that are conceptually unrelated to probabilistic inference, such as consulting a relational database). Before presenting variable elimination over the concrete objects that we manipulate (real-valued potentials), we first analyze the algebraic properties of potentials at an abstract level. To this end, let \(\mathcal{V}\) be a finite set of variables (with their respective domains).

3.1 Abstract potentials

A labeled abstract potential is an object \(\phi\) with an associated scope \(\mathcal{S} \subseteq \mathcal{V}\). We denote by \(\sco{\phi}=\mathcal{S}\) the function that maps labeled abstract potentials to their scopes, and often write a labeled abstract potential as \((\phi,\mathcal{S})\).

A valuation is a mapping from random variables in \(\mathcal{V}\) to values in their respective domains. An example of a concrete (i.e., non-abstract) potential is the probability potential, which maps valuations of sets of random variables to nonnegative real values. The scope of a probability potential \(\phi\) is a set \(\mathcal{S}\) of random variables such that \(\phi(\nu)=\phi(\nu')\) for any two valuations \(\nu\) and \(\nu'\) that agree on \(\mathcal{S}\); that is, \(\phi\) is not changed if we alter the value of a variable outside its scope. Intuitively, the scope is the (not necessarily minimal) set of variables on which the potential depends.

The table below represents a probability potential. Note that the order in which the variables appear is not relevant, as potentials are defined over valuations.

A B \(\phi\)
0 0 10
1 0 0.1
0 1 0.1
1 1 10

For convenience, we will refer to labeled abstract potentials simply as potentials (bearing in mind that these need not refer to the concrete potentials we have just defined). Let \(\Phi\) be the set of all abstract potentials.

3.2 Combination

A combination is a binary operation \(\times\) on \(\Phi\) satisfying \(\sco{\phi \times \psi}=\sco{\phi} \cup \sco{\psi}\).

In our concrete potentials, combination corresponds to product of functions. That is, \((\phi \times \psi)(\nu) = \phi(\nu) \cdot \psi(\nu)\).

Consider the following probability potentials:

A B \(\phi_1\)   A C \(\phi_2\)
0 0 10   0 0 5
1 0 0.1   1 0 0.2
0 1 0.1   0 1 5
1 1 10   1 1 0.2

Their combination, taken as their product, is

A B C \(\phi_1 \times \phi_2\)
0 0 0 50
1 0 0 0.02
0 1 0 0.5
1 1 0 2
0 0 1 50
1 0 1 0.02
0 1 1 0.5
1 1 1 2
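
As a concrete illustration, here is a minimal sketch in Python, where a potential is encoded as a pair (scope, table): the scope is a tuple of variable names and the table maps value tuples to reals. This encoding, and the names combine and domains, are assumptions made for illustration, not notation from the text.

from itertools import product

def combine(phi, psi, domains):
    """Pointwise product; the scope of the result is the union of the scopes."""
    (s1, t1), (s2, t2) = phi, psi
    scope = tuple(s1) + tuple(v for v in s2 if v not in s1)
    table = {}
    for values in product(*(domains[v] for v in scope)):
        nu = dict(zip(scope, values))  # a valuation of the union of the scopes
        table[values] = t1[tuple(nu[v] for v in s1)] * t2[tuple(nu[v] for v in s2)]
    return scope, table

# The example above: phi1 over (A,B) and phi2 over (A,C)
phi1 = (('A', 'B'), {(0, 0): 10, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 10})
phi2 = (('A', 'C'), {(0, 0): 5, (1, 0): 0.2, (0, 1): 5, (1, 1): 0.2})
domains = {'A': (0, 1), 'B': (0, 1), 'C': (0, 1)}
scope, table = combine(phi1, phi2, domains)
print(scope, table[(0, 0, 0)])  # ('A', 'B', 'C') 50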

3.3 Elimination

An elimination is a mapping from \(\Phi \times \mathcal{V}\) to \(\Phi\), written \((\phi,X) \mapsto \del{X}{\phi}\), such that \(\sco{\del{X}{\phi}}=\sco{\phi}-\{X\}\).

For probability potentials, elimination is usually defined as the sum-marginalization of the variable from the potential: \(\del{X}{\phi}(\nu) = \sum_{\nu' \sim \sco{\phi}:\, \nu'(Y)=\nu(Y) \text{ for all } Y \in \sco{\phi}-\{X\}} \phi(\nu')\). That is, \(\del{X}{\phi}(\nu)\) sums \(\phi\) over all values of \(X\) while keeping the remaining variables fixed at their values in \(\nu\).

Consider the following probability potential

A B \(\phi\)
0 0 10
1 0 0.1
0 1 5
1 1 0.2

We have:

B \(\del{A}{\phi}\)   A \(\del{B}{\phi}\)
0 10.1   0 15
1 5.2   1 0.3
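
Under the same (scope, table) encoding, a matching sketch of elimination reproduces the tables above (printed values may differ in the last digits due to floating point):

def eliminate(phi, X):
    """Sum-marginalize variable X out of the potential phi = (scope, table)."""
    scope, table = phi
    i = scope.index(X)
    new_scope = scope[:i] + scope[i+1:]
    new_table = {}
    for values, p in table.items():
        key = values[:i] + values[i+1:]               # drop the entry for X
        new_table[key] = new_table.get(key, 0.0) + p  # and sum over its values
    return new_scope, new_table

phi = (('A', 'B'), {(0, 0): 10, (1, 0): 0.1, (0, 1): 5, (1, 1): 0.2})
print(eliminate(phi, 'A'))  # (('B',), {(0,): 10.1, (1,): 5.2})
print(eliminate(phi, 'B'))  # (('A',), {(0,): 15.0, (1,): 0.3})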

3.4 Algebra of potentials

An algebra of (labeled abstract) potentials is a system \((\mathcal{V},\Phi,\times,\del{\cdot}{\cdot})\) closed under combination and elimination, and satisfying the following axioms SS1988,Koh2003:

\begin{align*}
\tag{A1} & \phi \times \psi = \psi \times \phi & \text{[commutativity of combination]}\\
\tag{A2} & \phi_1 \times (\phi_2 \times \phi_3) = (\phi_1 \times \phi_2) \times \phi_3 & \text{[associativity of combination]}\\
\tag{A3} & \del{Y}{\del{X}{\phi}}=\del{X}{\del{Y}{\phi}} & \text{[transitivity of elimination]}\\
\tag{A4} & X \not\in \sco{\phi} \implies \del{X}{(\phi \times \psi)}=\phi \times (\del{X}{\psi}) & \text{[distributivity of elimination over combination]}
\end{align*}

Let \(\phi_1,\dotsc,\phi_n\) be potentials and \(X\) be a variable such that \(X \not\in \sco{\phi_1} \cup \dotsb \cup \sco{\phi_m}\), for some \(m < n\). These axioms allow us to derive:

\begin{equation*} \del{X}{(\phi_1 \times \dotsb \times \phi_n)} = \phi_1 \times \dotsb \times \phi_m \times \del{X}{(\phi_{m+1} \times \dotsb \times \phi_n)} \end{equation*}

Repeated application of the result above suffices to derive a sound algorithm for the elimination of a set of variables from a combination of potentials. Our interest in the algebra of potentials is to compute inference in discrete probabilistic graphical models (Bayesian networks and Markov networks). Nevertheless, many other representation devices can be shown to satisfy these axioms, and hence, variable elimination is well-defined on them. Examples include Gaussian distributions, systems of linear equations, convex sets, Boolean algebra, and belief functions Koh2003. Notably, product and maximization of discrete functions (i.e., potentials) satisfy the axioms; we can thus use variable elimination to find maximum probability configurations.

4 Variable Elimination

The Variable Elimination Algorithm is a procedure for computing the elimination of variables from the combination of a number of potentials. The algorithm takes a set

\begin{equation*} \Psi = \{\phi_1,\dotsc,\phi_m\} \end{equation*}

of potentials, and an ordered set of variables \((X_1,\dotsc,X_n)\) to be eliminated, and returns

\begin{equation*} \del{X_n}{\bigl(\dotsb \del{X_1}{(\phi_1 \times \dotsb \times \phi_m)} \dotsb\bigr)} \, . \end{equation*}

It proceeds as follows (a code sketch is given after the pseudocode):

  1. For \(i\gets 1\) to \(n\) do:
    1. Collect all potentials \(\phi \in \Psi\) such that \(X_i \in \sco{\phi}\) and store them in \(\Phi_i\)
    2. Combine all potentials \(\phi \in \Phi_i\) and then eliminate \(X_i\) from the result to obtain \(\psi_i\)
    3. Remove the potentials \(\phi \in \Phi_i\) from \(\Psi\) and add the potential \(\psi_i\)
  2. Return the combination of all \(\phi \in \Psi\)
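
A sketch of the procedure, reusing the combine and eliminate sketches from Section 3 (the (scope, table) encoding is an illustration, not part of the text):

def variable_elimination(potentials, order, domains):
    """Eliminate the variables in `order` (in that order) from the
    combination of `potentials`, given as (scope, table) pairs."""
    Psi = list(potentials)
    for X in order:
        Phi_i = [phi for phi in Psi if X in phi[0]]    # Step 1.1: collect
        if not Phi_i:
            continue                                   # X appears in no scope
        psi = Phi_i[0]
        for phi in Phi_i[1:]:                          # Step 1.2: combine...
            psi = combine(psi, phi, domains)
        psi = eliminate(psi, X)                        # ...then eliminate X_i
        Psi = [phi for phi in Psi if X not in phi[0]]  # Step 1.3: replace
        Psi.append(psi)
    result = Psi[0]                                    # Step 2: combine the rest
    for phi in Psi[1:]:
        result = combine(result, phi, domains)
    return result

# p(A) and p(B|A): eliminating A yields the marginal p(B)
pA = (('A',), {(0,): 0.3, (1,): 0.7})
pBA = (('A', 'B'), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
print(variable_elimination([pA, pBA], ['A'], {'A': (0, 1), 'B': (0, 1)}))
# (('B',), {(0,): 0.41, (1,): 0.59}), up to float rounding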

4.1 Performance analysis

Let us analyze the time complexity of the algorithm. A naive implementation of Step 1.1 requires considering each potential in \(\Psi\) and performing a linear search over its scope. Let \(\omega\) be the maximum cardinality of a scope in \(\Psi\) (at any time). Then, assuming that insertion in \(\Phi_i\) takes constant time, this step takes \(\mathcal{O}(m \cdot \omega)\) time (since the size of \(\Psi\) never increases). Consider now Step 1.2. Combining two potentials \(\phi\) and \(\psi\) can be done by considering all valuations of the variables in \(\sco{\phi} \cup \sco{\psi}\) and, for each valuation \(\nu\), performing the product \(\phi(\nu)\psi(\nu)\). Let \(c\) be the maximum cardinality of a variable's domain; assuming multiplication is performed in constant time, combination takes \(\mathcal{O}(c^k)\) time, where \(k=|\sco{\phi} \cup \sco{\psi}|\). Elimination can be performed similarly with the same asymptotic cost. Hence, Step 1.2 takes time \(\mathcal{O}(m_i \cdot c^{\omega_i+1})\), where \(m_i=|\Phi_i|\) and \(\omega_i=|\sco{\psi_i}|\) (the combination of the potentials in \(\Phi_i\) has scope \(\sco{\psi_i} \cup \{X_i\}\)). Assuming removal from \(\Psi\) takes constant time, Step 1.3 can be performed in time \(\mathcal{O}(m_i)\). We also need to account for the combination of potentials in the last step and for the size of the output. For simplicity, we will assume that this step takes constant time (as is often the case in practice), which implies that all but a few variables are eliminated. The overall running time is thus \[ \mathcal{O}\left(m\omega + \sum_{i=1}^n m_i c^{\omega_i+1} \right) = \mathcal{O}\bigl( (m+n)c^{\omega+1} \bigr) \, , \] where we used the fact that the sum of all \(m_i\) is at most \(m+n\) (each of the \(m\) input potentials and each of the at most \(n\) intermediate potentials \(\psi_i\) is removed from \(\Psi\) at most once), and \(\omega=\max_i \omega_i\).

Note that while \(m\), \(n\) and \(c\) are parameters of the model, the value \(\omega\) is not. Moreover, \(\omega\) depends on the elimination ordering. To see this, consider binary variables \(A,B,C\) and potentials \(\phi_1(A,B)\) and \(\phi_2(B,C)\). If we perform variable elimination using the ordering \(A,B\) we end up with \[ \psi_1(B) = \del{A}{\phi_1}\,, \quad \psi_2(C) = \del{B}{(\phi_2 \times \psi_1)} \, , \] and obtain \(\omega=1\). On the other hand, if we use the ordering \(B,A\), the algorithm generates \[ \psi_1'(A,C) = \del{B}{(\phi_1 \times \phi_2)}\,, \quad \psi_2'(C) = \del{A}{\psi_1'} \, , \] and thus \(\omega=2\). Can we obtain a bound on \(\omega\) as a function of the model (the potentials and variables) and the elimination ordering? Not surprisingly, we will represent the model graphically and determine \(\omega\) as a function of the graph.

4.2 Domain Graph

The domain graph of a set of potentials \(\Psi\) is an undirected graph whose nodes are the variables appearing in the scope of some potential in \(\Psi\), with an edge connecting two nodes if and only if the corresponding variables appear together in the scope of some potential.

The figure below depicts the domain graph for the potentials \(\phi(A,B)\), \(\phi(A,C)\), \(\phi(B,D)\), \(\phi(C,D)\), \(\phi(D,E)\).

domaingraph.png
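
A sketch that builds the domain graph from the scopes of the potentials, encoding the graph as a dict of adjacency sets (an illustrative encoding reused in the sketches below):

from itertools import combinations

def domain_graph(scopes):
    """Domain graph as a dict mapping each variable to its set of neighbors."""
    adj = {}
    for scope in scopes:
        for v in scope:
            adj.setdefault(v, set())
        for u, v in combinations(scope, 2):  # variables sharing a scope
            adj[u].add(v)
            adj[v].add(u)
    return adj

# The scopes of the potentials in the figure above
print(domain_graph([('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')]))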

The reader should convince herself that the domain graph of a Markov network is its own graph, and that the domain graph of a Bayesian network is its moral graph.

For the rest of this discussion, we will assume that the elimination ordering is complete, that is, that it contains all the variables appearing in some potential (the following results remain valid if we eliminate all but one variable, and can be adapted to the general case to account for the complexity of the output). Note that when the elimination ordering is complete, the output is a number (instead of a potential).

4.3 Elimination Graph

Let \(D_0,D_1,\dotsc,D_n\) be the domain graphs of the sets \(\Psi\) obtained during the execution of variable elimination. The elimination graph is the undirected graph over the nodes of \(D_0\) such that there is an edge if and only if that edge appears in at least one of \(D_0,D_1,\dotsc,D_n\).

The elimination graph for potentials \(\phi(A,B)\), \(\phi(A,C)\), \(\phi(B,D)\), \(\phi(C,D)\), \(\phi(D,E)\) and variable ordering \(A,B,C,D,E\) is shown below.

elimgraph.png

The elimination graph depends on the elimination ordering used.

Consider again the potentials \(\phi_1(A,B)\) and \(\phi_2(B,C)\). Their domain graph is the leftmost graph \(D\) in the figure below.

domaingraphs.png

After eliminating variable \(A\), we are left with potentials \(\psi_1(B)\) and \(\phi_2(B,C)\), whose domain graph is \(D^{-A}\) (ignoring blue nodes and dashed edges). Eliminating \(B\) then generates the graph \(D^{-A,B}\). The elimination graph of this run is the graph \(E^{-A,B}\), which equals the initial domain graph.

Now consider a new run of the algorithm where we first eliminate \(B\), obtaining \(\psi_1'(A,C)\). The respective domain graph is \(D^{-B}\). Note that an edge between \(A\) and \(C\) was inserted, representing the potential \(\psi_1'\). Eliminating variable \(A\) generates the graph \(D^{-B,A}\) in the figure. The rightmost graph \(E^{-B,A}\) is the elimination graph of this ordering; we can see that this graph is more connected than the initial domain graph.

It is not a coincidence that in our example the value of \(\omega\) is the size of the largest maximal clique in the elimination graph of each run minus one.

Consider a graph \(G\) and a node \(X\). The elimination of node \(X\) from \(G\) is the graph \(G^{-X}\) obtained by pairwise connecting the neighbors of \(X\) and removing \(X\) and its incident edges.
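
A minimal sketch of node elimination under the same dict-of-sets graph encoding:

def eliminate_node(adj, X):
    """Return a copy of the graph with node X eliminated: the neighbors of X
    are pairwise connected, then X and its incident edges are removed."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    nbs = adj.pop(X)
    for u in nbs:
        adj[u] |= nbs - {u}                      # turn ne(X) into a clique
        adj[u].discard(X)                        # drop the edge to X
    return adj

# Domain graph of phi1(A,B) and phi2(B,C): eliminating B connects A and C
D = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B'}}
print(eliminate_node(D, 'B'))                    # {'A': {'C'}, 'C': {'A'}}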

For example, the graph \(D^{-A}\) in the example above is obtained by eliminating variable \(A\) from the (initial) domain graph \(D\). As the following result shows, elimination of variables can be represented graphically by elimination of nodes in the corresponding domain graph.

The domain graph \(D_i\) of the potentials in the \(i\)th iteration of Variable Elimination equals the graph obtained by eliminating node \(X_i\) from the domain graph \(D_{i-1}\) of the \((i-1)\)th iteration.

Let \(\Psi_i\) denote the set \(\Psi\) after the \(i\)th iteration. Then \(\Psi_i = (\Psi_{i-1} - \Phi_i) \cup \{\psi_i\}\). By construction, \(X_i\) is not in \(D_i\). First note that the neighbors of \(X_i\) in \(D_{i-1}\) are the variables in \(\bigcup_{\phi \in \Phi_i} \sco{\phi} - \{X_i\}\), and this set is the scope of \(\psi_i\). Since \(\psi_i\) is in \(\Psi_i\), the neighbors of \(X_i\) in \(D_{i-1}\) are pairwise connected in \(D_i\). It remains to show that no other edge is inserted or removed. Let \(Y\) and \(Z\) be adjacent variables in \(D_{i-1}\) that are not both neighbors of \(X_i\). By definition of domain graph, there is a potential in \(\Psi_{i-1}\) whose scope contains both \(Y\) and \(Z\); that scope cannot contain \(X_i\) (otherwise \(Y\) and \(Z\) would both be neighbors of \(X_i\)), so the potential is not in \(\Phi_i\) and remains in \(\Psi_i\). Hence \(Y\) and \(Z\) are adjacent in \(D_i\). Conversely, consider an edge between \(Y\) and \(Z\) in \(D_i\): either \(Y\) and \(Z\) appear together in \(\sco{\psi_i}\), and are thus neighbors of \(X_i\) in \(D_{i-1}\), or they appear together in the scope of a potential in \(\Psi_{i-1} - \Phi_i\), and are thus adjacent in \(D_{i-1}\).

That result extends to the elimination graph (and justifies its name).

The elimination graph equals the graph obtained as follows: Start with \(D_0\) and for \(i=1,\dotsc,n\) connect any two neighbors \(X_j\) and \(X_k\) of \(X_i\) such that \(i < j < k\).

Thus, the elimination graph is obtained by simulating the elimination of each node in sequence, but marking eliminated nodes instead of removing them.

It follows from the previous theorem by noting that every node of \(D_{i-1}\) other than \(X_i\) is a not-yet-eliminated variable \(X_j\) with \(j > i\); hence the neighbors of \(X_i\) in \(D_{i-1}\) are all higher-ordered variables.

4.4 Induced width

We parametrize the class of elimination graphs by the size of the largest clique.

The induced width of an elimination ordering is the size of the largest clique of the corresponding elimination graph minus one.

We can now relate the size of generated potentials with the elimination graph.

The induced width of an elimination order equals \(\omega=\max_i |\sco{\psi_i}|\).

First note that the scopes of the initial potentials are cliques in the initial domain graph, and hence cliques in the elimination graph. Each set \(\sco{\psi_i} \cup \{X_i\}\) is also a clique in the elimination graph, since \(\sco{\psi_i}\) is a clique in \(D_i\) and \(X_i\) is connected to every variable in \(\sco{\psi_i}\) in \(D_{i-1}\). To see that no larger cliques exist, note that an edge occurs only between variables in the scope of a common potential, and that the scope of any potential generated is contained in some \(\sco{\psi_i} \cup \{X_i\}\).
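
The results above suggest a direct way to compute the induced width of an ordering: simulate node elimination with marking and record the largest number of unmarked neighbors. A sketch, under the dict-of-sets encoding used earlier:

from itertools import combinations

def induced_width(adj, order):
    """Induced width and fill-in edges of an elimination ordering, computed by
    simulating node elimination with marking instead of removal."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    eliminated, fill, width = set(), [], 0
    for X in order:
        nbs = adj[X] - eliminated                # not-yet-eliminated neighbors
        width = max(width, len(nbs))
        for u, v in combinations(sorted(nbs), 2):
            if v not in adj[u]:                  # missing edge = fill-in edge
                adj[u].add(v)
                adj[v].add(u)
                fill.append((u, v))
        eliminated.add(X)                        # mark instead of removing
    return width, fill

# Running example: width 2 and a single fill-in edge between B and C
D = {'A': {'B', 'C'}, 'B': {'A', 'D'}, 'C': {'A', 'D'},
     'D': {'B', 'C', 'E'}, 'E': {'D'}}
print(induced_width(D, ['A', 'B', 'C', 'D', 'E']))  # (2, [('B', 'C')])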

4.5 Complexity of Variable Elimination

We can now state the connection between the graphical representation of the model and the complexity of variable elimination.

The value \(\omega_i\) is the number of neighbors of \(X_i\) in \(D_{i-1}\), which is also the number of higher-ordered neighbors of \(X_i\) in the elimination graph.

To see that \(\omega_i\) is the number of neighbors of \(X_i\) in \(D_{i-1}\), note that the scope of \(\psi_i\) contains exactly the variables that co-appear with \(X_i\) in the scope of some potential in \(\Psi_{i-1}\), and these are precisely the neighbors of \(X_i\) in \(D_{i-1}\).

We are finally ready to state the complexity of variable elimination with respect to the graphical representation of its input. Let \(b\) be the smallest cardinality of a variable.

Variable elimination takes time \(\Omega(m b^\chi)\) and \(\mathcal{O}((m+n)c^{\omega+1})\), where \(\chi\) is the size of the largest clique in the domain graph, and \(\omega\) is the induced width of the elimination ordering.

5 Finding Elimination Orderings

According to the previous theorem, the complexity of the algorithm is lower bounded by the biggest clique size in the domain graph and upper bounded by the biggest clique size in the elimination graph. The former is part of the input, and cannot be improved. As we have seen by example, the latter depends on the elimination ordering. Thus, we want to find the elimination ordering that minimizes the induced width. This is summarized by the concept of a graph's treewidth.

The treewidth of a graph \(G\) is the minimum induced width of an elimination ordering.

Thus, the optimal strategy for variable elimination is to select an elimination ordering whose induced width equals the treewidth. Unfortunately, finding such an ordering is an NP-hard problem (deciding whether there is an ordering whose induced width is at most a given integer is NP-complete) Acp1987. Instead, we will devise efficient heuristics for finding good elimination orderings (i.e., orderings with relatively small induced width).

5.1 Chordalization

There is an interesting connection between elimination orderings and chordalizations of graphs. Recall that a graph is chordal if every cycle with four or more nodes has a chord, that is, an edge connecting two nodes that are not consecutive in the cycle.

A chordal graph \(H\) is a chordalization of a graph \(G\) (over the same set of nodes) if every edge in \(G\) is also in \(H\).

The following result shows that an elimination ordering induces a chordalization of the (initial) domain graph.1

The elimination graph is chordal.

We prove by contradiction. Suppose the elimination graph contains a chordless cycle \(X_{i_1},\dotsc,X_{i_k}\) with \(k \geq 4\), and assume without loss of generality that \(X_{i_1}\) is the first node of the cycle to be eliminated (we can always shift the cycle so that it starts at that node). Eliminating \(X_{i_1}\) inserts an edge between its cycle neighbors \(X_{i_2}\) and \(X_{i_k}\), which is a chord of the cycle.

Hence, variable elimination can be seen as a means of inserting edges into the domain graph so as to make it chordal. For instance, the chordal graph on the right below is the elimination graph of the non-chordal domain graph on the left, obtained by eliminating nodes in the ordering \(A,B,C,D,E\).

domaingraph2.png

5.2 Fill-in edge

The edges in the elimination graph which are not in the domain graph are called fill-in edges. Each fill-in edge denotes an increase of the scope of some intermediate potential with respect to the scopes of the input potentials, and hence an exponential increase in the running time of the algorithm. The edge between \(B\) and \(C\) in the graph on the right above is the single fill-in edge.

5.3 Min-Fill Heuristic

It can be shown that every chordalization corresponds to an elimination ordering. Hence, the treewidth of a graph can be equivalently defined over chordalizations: it is the minimum, over all chordalizations, of the size of the largest clique minus one. Minimizing the number of fill-in edges of a chordalization is a closely related (though not identical) objective, and suggests the min-fill heuristic, which starts with the domain graph and then repeatedly selects the node whose elimination introduces the minimum number of fill-in edges and eliminates that node. Let \(\text{fill}(\del{X}{G})\) denote the number of fill-in edges introduced by eliminating \(X\) from \(G\) (i.e., the number of edges needed to turn \(\nb{X}\) into a clique). The min-fill heuristic is (a code sketch follows the pseudocode):

  1. Start with the domain graph \(D\) and an empty list \(L\)
  2. For \(i\gets 1\) to \(n\) do:
    1. Select a node \(X\) in \(D\) that minimizes \(\text{fill}(\del{X}{D})\)
    2. Add \(X\) to the end of \(L\) and update \(D \gets \del{X}{D}\)
  3. Return \(L\)
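
A sketch of the heuristic under the dict-of-sets encoding, naively recomputing the fill counts at every iteration:

from itertools import combinations

def fill_in(adj, X):
    """Number of fill-in edges introduced by eliminating X."""
    return sum(1 for u, v in combinations(adj[X], 2) if v not in adj[u])

def min_fill_order(adj):
    """Greedy min-fill elimination ordering."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order = []
    while adj:
        X = min(adj, key=lambda v: fill_in(adj, v))
        order.append(X)
        nbs = adj.pop(X)
        for u, v in combinations(nbs, 2):        # eliminate X from the graph
            adj[u].add(v)
            adj[v].add(u)
        for u in nbs:
            adj[u].discard(X)
    return order

D = {'A': {'B', 'C'}, 'B': {'A', 'D'}, 'C': {'A', 'D'},
     'D': {'B', 'C', 'E'}, 'E': {'D'}}
print(min_fill_order(D))  # ['E', 'A', 'B', 'C', 'D'] (ties broken by iteration order)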

The algorithm takes time linear in the number of nodes and quadratic in the largest clique size if the graph is represented as an adjacency list (due to computing the number of fill-in edges, which requires checking whether neighbors are connected).

5.4 Simplicial nodes

While the min-fill heuristic may intuitively seem a good approach, it does not always lead to a minimum-width elimination ordering. It does so, however, when the domain graph is already chordal.

A node \(X\) is simplicial if \(\nb{X}\) is a clique.

Clearly, a node \(X\) is simplicial if and only if \(\text{fill}(\del{X}{D})=0\). Hence, min-fill will always prefer a simplicial node over a non-simplicial one.

5.5 Perfect Elimination Orderings

An elimination ordering is perfect if it introduces no fill-in edges.

The following statements are true.

  1. Any chordal graph with at least two nodes has at least two simplicial nodes.
  2. A graph is chordal if and only if it admits a perfect elimination ordering.
  3. If \(D\) is chordal and \(X\) is simplicial, then \(\del{X}{D}\) is chordal.
  4. An elimination ordering that repeatedly eliminates simplicial nodes is perfect.

According to the previous theorem, the min-fill heuristic produces a perfect elimination ordering in chordal domain graphs.
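
These statements also yield a simple chordality test: repeatedly remove simplicial nodes (no fill-in edges are needed) and check whether the graph empties. A sketch under the dict-of-sets encoding:

from itertools import combinations

def is_chordal(adj):
    """A graph is chordal iff repeatedly removing simplicial nodes empties it."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    while adj:
        X = next((v for v in adj
                  if all(w in adj[u] for u, w in combinations(adj[v], 2))),
                 None)
        if X is None:
            return False                         # no simplicial node left
        nbs = adj.pop(X)                         # simplicial: nothing to fill in
        for u in nbs:
            adj[u].discard(X)
    return True

print(is_chordal({'A': {'B', 'C'}, 'B': {'A', 'C'},
                  'C': {'A', 'B', 'D'}, 'D': {'C'}}))  # True
print(is_chordal({'A': {'B', 'D'}, 'B': {'A', 'C'},
                  'C': {'B', 'D'}, 'D': {'A', 'C'}}))  # False (4-cycle)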

5.6 Min-Degree heuristic

The following result is useful to compute the induced width of an elimination ordering (recall that the elimination graph is chordal).

Let \(D_0,D_1,\dotsc,D_n\) be a sequence of graphs obtained by repeatedly eliminating simplicial nodes \(X_1,\dotsc,X_n\). Then the induced width of the ordering is the maximum number of neighbors of a variable \(X_i\) in \(D_{i-1}\).

The size of the potential \(\psi_i\) created is related to the number of neighbors of the variable \(X_i\) being eliminated. Thus, it is reasonable to eliminate variables of small degree. The min-degree heuristic attempts to find a good elimination ordering by favoring variables with fewer neighbors. The algorithm is very similar to the min-fill heuristic (a code sketch follows the pseudocode):

  1. Start with the domain graph \(D\) and an empty list \(L\)
  2. For \(i\gets 1\) to \(n\) do:
    1. Select a node \(X\) in \(D\) that minimizes \(|\nb{X}|\)
    2. Add \(X\) to the end of \(L\) and update \(D \gets \del{X}{D}\)
  3. Return \(L\)
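
The same greedy loop with the selection criterion swapped; a sketch under the dict-of-sets encoding:

def min_degree_order(adj):
    """Greedy min-degree elimination ordering: same scheme as min-fill,
    selecting a node with the fewest neighbors instead."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order = []
    while adj:
        X = min(adj, key=lambda v: len(adj[v]))
        order.append(X)
        nbs = adj.pop(X)
        for u in nbs:                            # eliminate X from the graph
            adj[u] |= nbs - {u}
            adj[u].discard(X)
    return order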

5.7 Weighted heuristics

Both the min-fill and the min-degree heuristics aim at minimizing the maximum size of the scope of \(\psi_i\). However, the cost of handling a potential also depends on the cardinalities of its variables. For instance, it might be better to generate a potential with 10 binary variables than a potential with 2 variables of cardinality 100. This leads us to consider weighted versions of these heuristics. The weighted min-fill heuristic selects a node with the smallest sum of weights of fill-in edges, where the weight of an edge is the product of the cardinalities of its endpoint variables. The weighted min-degree heuristic selects a node that minimizes the weighted degree, defined as the product of the cardinalities of its neighbors.
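
Sketches of the two weighted selection criteria, assuming a dict card mapping each variable to its cardinality (card is an illustrative name, not notation from the text):

from itertools import combinations

def weighted_fill(adj, card, X):
    """Sum of the weights of the fill-in edges introduced by eliminating X,
    where the weight of an edge is the product of its endpoints' cardinalities."""
    return sum(card[u] * card[v]
               for u, v in combinations(adj[X], 2) if v not in adj[u])

def weighted_degree(adj, card, X):
    """Product of the cardinalities of the neighbors of X."""
    w = 1
    for u in adj[X]:
        w *= card[u]  # card is assumed given, e.g. {'A': 2, 'B': 100, ...}
    return w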

5.8 Offline computation

Note that the min-fill and min-degree heuristics can be computed dynamically at each iteration from the current graph \(D_i\). This is usually a good approach when the elimination ordering heuristic is fixed. It might be worthwhile, however, to obtain an elimination ordering before calling variable elimination, by running several heuristics (possibly each multiple times) and selecting the best ordering.
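
For instance, a sketch that runs the heuristic sketches defined earlier and keeps the ordering of smallest induced width (it assumes min_fill_order, min_degree_order and induced_width from above):

def best_order(adj, heuristics=(min_fill_order, min_degree_order)):
    """Run several ordering heuristics offline and return the ordering
    with the smallest induced width."""
    return min((h(adj) for h in heuristics),
               key=lambda order: induced_width(adj, order)[0])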

6 Conclusion

Variable elimination is a simple algorithm for computing sum-product inferences in Bayesian networks and Markov networks. Its complexity is exponential in the induced width of the elimination ordering used. Finding an optimal elimination ordering is computationally difficult, so greedy heuristics are used instead. These heuristics often perform well in practice, generating close-to-optimal orderings.

Variable elimination is particularly efficient in models of small treewidth. This is the case of tree-shaped Bayesian and Markov networks, and of polytree-shaped Bayesian networks of small in-degree. Tree-shaped models are routinely used to represent sequential data such as speech, text, and movies. For these graphs an optimal elimination ordering can be found efficiently.

7 Exercises

  1. Implement variable elimination together with the elimination-ordering heuristics discussed in this lecture, and test it on networks specified in the UAI file format. Use your implementation to evaluate the accuracy of sampling algorithms.

Bibliography

  • [CTS1976] Cannings, Thompson & Skolnick, The recursive derivation of likelihoods on complex pedigrees, Advances in Applied Probability, 8(4), 622-625 (1976).
  • [BFMY1983] Beeri, Fagin, Maier & Yannakakis, On the desirability of acyclic database schemes, Journal of the Association for Computing Machinery, 30(3), 479-513 (1983).
  • [BB1972] Bertelé & Brioschi, Nonserial Dynamic Programming, Academic Press (1972).
  • [DP1960] Davis & Putnam, A Computing Procedure for Quantification Theory, Journal of the ACM, 7(3), 201-215 (1960).
  • [SDD1990] Shachter, D'Ambrosio & Del Favero, Symbolic probabilistic inference in belief networks, in: Proceedings of the 8th National Conference on Artificial Intelligence, 126-131 (1990).
  • [ZP1994] Zhang & Poole, A simple approach to Bayesian network computations, in: Proceedings of the 10th Biennial Canadian Artificial Intelligence Conference, 171-178 (1994).
  • [Dec1990] Dechter, Bucket elimination: A unifying framework for reasoning, Artificial Intelligence, 113(1-2), 41-85 (1999).
  • [SS1988] Shenoy & Shafer, Axioms for probability and belief-function propagation, in: Proceedings of the 4th Conference on Uncertainty in Artificial Intelligence, 169-198 (1988).
  • [Koh2003] Kohlas, Information Algebras: Generic Structures for Inference, Springer-Verlag (2003).
  • [Acp1987] Arnborg, Corneil & Proskurowski, Complexity of finding embeddings in a $k$-tree, SIAM Journal on Algebraic and Discrete Methods, 8(2), 277-284 (1987).

Footnotes:

1
Chordalization is also called triangulation.

Author: Denis D. Mauá

Created: 2018-11-01 Thu 18:15
