MAC6916 Probabilistic Graphical Models
These are the lecture notes of the Graduate Course MAC6916 Probabilistic Graphical Models, taught by Denis D. Mauá within the Computer Science Program of the Institute of Mathematics & Statistics since 2016. These notes are preliminary and in constant change (use at your own risk 😃). Comments and error reports are very welcome.

Introduction
In this lecture, we meet an informal introduction to probabilistic reasoning using graphical models. We discuss compact representation of joint probability distributions, graphical representation of independences, and the advantages of a modelbased approach.

Probability Theory
In this lecture, we discuss the different meanings attributed to probability. We show how the Kolmogorov’s axioms of probability can be derived as a disposition to avoid sure loss. We then review the basic concepts of probability theory such as probability measures, random variables, conditional probability, joint probability distributions, independence, the total probability rule, the chain rule, and Bayes’ rule.

Bayesian Networks
$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\an}[1]{\text{an}(#1)} \newcommand{\nd}[1]{\text{nd}(#1)} \newcommand{\ind}{⫫} $ Any probabilistic phenomenon can be described by a joint probability distribution over a set of random variables. However, directly specifying and manipulating multivariate probability distributions is difficult and inefficient.

Representing Independences
$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\an}[1]{\text{an}(#1)} \newcommand{\nd}[1]{\text{nd}(#1)} \newcommand{\can}[1]{\overline{\text{an}}(#1)} \newcommand{\ind}{⫫} \newcommand{\dsep}{\perp_d} \newcommand{\msep}{\perp_m} $ Bayesian networks can represent a probability distribution compactly by exploiting a set of Markov conditions in a digraph. This set of Markov conditions imply (and are implied) by other independences.

Missing Data
$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\an}[1]{\text{an}(#1)} \newcommand{\nd}[1]{\text{nd}(#1)} \newcommand{\can}[1]{\overline{\text{an}}(#1)} \newcommand{\ind}{⫫} \newcommand{\dsep}{\perp_d} \newcommand{\msep}{\perp_m} $ When using or estimating models we are often faced with partial information such as when a sensor fails to record its measurement, when survey respondents are unable or unwilling to answer questions, or when certain quantities are simply not directly observable.

Markov Equivalences
$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\an}[1]{\text{an}(#1)} \newcommand{\nd}[1]{\text{nd}(#1)} \newcommand{\can}[1]{\overline{\text{an}}(#1)} \newcommand{\ind}{⫫} \newcommand{\dsep}{\perp_d} \newcommand{\msep}{\perp_m} $ In this lecture, we discuss the classes of graphs that are equivalent under dseparation. We discuss how can two graphs be efficiently tested for equivalence, and how can we represent the class of all equivalent graphs.

Markov Networks
$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\an}[1]{\text{an}(#1)} \newcommand{\nd}[1]{\text{nd}(#1)} \newcommand{\can}[1]{\overline{\text{an}}(#1)} \newcommand{\ind}{⫫} \newcommand{\dsep}{\perp_d} \newcommand{\sep}{\perp_u} \newcommand{\msep}{\perp_m} $ We examine the representation of independence by undirected graphs, resulting in the concept of Markov networks. We also compare Bayesian networks and Markov networks as representational devices for independences.

ConstraintBased Structure Learning
$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\an}[1]{\text{an}(#1)} \newcommand{\nd}[1]{\text{nd}(#1)} \newcommand{\can}[1]{\overline{\text{an}}(#1)} \newcommand{\ind}{⫫} \newcommand{\dsep}{\perp_d} \newcommand{\sep}{\perp_u} \newcommand{\msep}{\perp_m} $ We discuss the problem of inferring the structure of a Bayesian network from data. There are mainly two approaches to this problem.

Parameter Learning
$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\an}[1]{\text{an}(#1)} \newcommand{\nd}[1]{\text{nd}(#1)} \newcommand{\can}[1]{\overline{\text{an}}(#1)} \newcommand{\ind}{⫫} \newcommand{\dsep}{\perp_d} \newcommand{\sep}{\perp_u} \newcommand{\msep}{\perp_m} $ Probabilistic graphical models consist of a qualitative part, that encodes variable interactions and is represented by a graph, and a quantitative part, that encodes the strength of interactions and is represented by realvalued functions.

ScoreBased Structure Learning
$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\an}[1]{\text{an}(#1)} \newcommand{\nd}[1]{\text{nd}(#1)} \newcommand{\can}[1]{\overline{\text{an}}(#1)} \newcommand{\ind}{⫫} \newcommand{\dsep}{\perp_d} \newcommand{\sep}{\perp_u} \newcommand{\msep}{\perp_m} $ Recall that a Bayesian network can be seen from two perspectives: as an efficient encoding of an independence relation, and as an efficient parametrization of a highdimensional probability distribution by a product of conditional probabilities.

Inference By Sampling
$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} $ In this and in the next few lectures we study practical algorithms for the inference problem, which comes in many forms. To make this discussion concise we will focus on the computation of the probability of evidence $P(E_1=e_1,\dotsc,E_m=e_m)$ where $E_1,\dotsc,E_m$ is a subset of the varaibles called the evidence variables and the $e_1,\dotsc,e_m$ is the evidence.

Variable Elimination
$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\sco}[1]{\text{sc}(#1)} \newcommand{\del}[2]{{#2}^{#1}} $ We resume our study on algorithms for computing the probability of evidence (or the partition function) of a Bayesian network or Markov network. We discuss Variable Elimination, a generic framework for producing exact inference in probabilistic graphical models.

SumProduct Algorithms
$ \newcommand{\nb}[1]{\text{ne}(#1)} \newcommand{\pa}[1]{\text{pa}(#1)} \newcommand{\ch}[1]{\text{ch}(#1)} \newcommand{\de}[1]{\text{de}(#1)} \newcommand{\sco}[1]{\text{sc}(#1)} \newcommand{\del}[2]{{#2}^{#1}} $ As we have seen, variable elimination is sound but inefficient in networks with high treewidth. Kwisthout et al.1 showed that if the sumproduct problem can be solved in subexponential time in the treewidth, then the propositional logic satisfiability problem (SAT) can also solved in subexponential time in the number of propositions.

Bibliography
Sewall Wright. Systems of mating. I. The biometric relations between parent and offspring. Genetics 6: 111, 1921. Nir Friedman. Inferring Cellular Networks Using Probabilistic Graphical Models, Science 303: 5659, pp.

Figures Place figures here and refer to them with relative path mac6916, for example: Partially digraph representation of the equivalence classes of 3node structures to insert figure pdag.svg. To create an SVG file from a pdf, use either pdf2svg or convert.