Workshop in Bioinformatics and Algorithms

WBA 2010 - Workshop in Bioinformatics and Algorithms

July 27 and 28, 2010

Auditorium of NUMEC (Núcleo de Apoio à Pesquisa em Modelagem Estocástica e Complexidade) at the Institute of Mathematics and Statistics of the University of São Paulo - IME-USP

A meeting on the occasion of the visit to the University of São Paulo (USP) of the Bamboo Team under the leadership of Marie-France Sagot, INRIA Rhône-Alpes, France. On the occasion, other researchers from Europe, Chile and Brazil, who collaborate with this team will also be visiting USP.

We are pleased to invite all the interested researchers to this informal meeting. The participants are welcome to attend the lectures as well as the Problem Session (with submissions on the spot). The meeting will focus on algorithmic aspects of problems arising in bioinformatics and combinatorial optimization.

Program

Date	Time	Activity
July 26 (Monday)	-	Visit to IME-USP of the participants from abroad and from UFMS
July 27 (Tuesday)	9:30-10:00	Opening -- Marie-France Sagot
	10:00-10:35	Talk 1 - Pierluigi Crescenzi
	10:35-11:00	Talk 2 - Pierre Peterlongo
	11:00-11:30	= Coffee Break =
	11:30-11:55	Talk 3 - Maria Emília Walter
	12:00-14:00	== Lunch ==
	14:00-14:35	Talk 4 - Alberto Marchetti-Spaccamela
	14:35-15:00	Talk 5 - Flávio Keidi Miyazawa
	15:00-16:00	Problem Session
	16:00-16:20	= Coffee Break =
	16:20-17:40	Group Work
	20:00	* Dinner *
July 28 (Wednesday)	9:30-10:05	Talk 6 - Leen Stougie
	10:05-10:30	Talk 7 - Eduardo Jordão Neves
	10:30-10:50	= Coffee Break =
	10:50-11:15	Talk 8 - Christian Baudet
	11:15-11:40	Talk 9 - Vicente Acuña
	11:40-12:05	Talk 10 - Paulo Vieira Milreu
	12:05-14:00	== Lunch ==
	14:00-16:00	Group work
	16:00-16:20	= Coffee Break =
	16:20-xxxx	Group work

All activities will take place at the building of NUMEC, located next to the building (Bloco C) of Computer Science Department at IME-USP.

The talks will be given in the Auditorium of NUMEC (ground floor). For the group work, the participants may use the Auditorium as well as the first floor of NUMEC. Blackboards, tables, chairs and sofas are available in the first floor.

The Laboratory at the first floor may be used by the participants to connect (by cable) to the network of IME (with their own notebooks).

Talks

Marie-France Sagot
Inria Rhône-Alpes, Université Claude Bernard, Lyon I
Title: Introducing BAMBOO-BAOBAB, SIMBIOSI, and SISYPHE
Pierluigi Crescenzi
University of Florence, Florence, Italy
Title: Finding the Diameter in Real-World Graphs
Pierre Peterlongo
INRIA Rennes, France
Title: Identifying SNPs without a reference genome by comparing raw reads
Maria Emilia Walter
University of Brasília, Brasília, Brazil
Title: The problem of sorting by reversals using group theory
Alberto Marchetti-Spaccamela
University "La Sapienza", Rome, Italy
Title: Real time scheduling: a gentle introduction for theoreticians
Flávio Keidi Miyazawa
Institute of Computing, State University of Campinas - UNICAMP, Brazil
Title: Multidimensional Selfish Bin Packings
Leen Stougie
Free University of Amsterdam and CWI, Amsterdam, The Netherlands
Title: Universal sequencing on a single machine
Eduardo Jordão Neves
Institute of Mathematics and Statistics, University of São Paulo, member of NUMEC, Brazil
Title: Signaling networks and type-dependent interaction models
Christian Baudet
(PhD student) UNICAMP, INRIA & LBBE University Claude Bernard, Lyon, France
Title: Enumerating traces of solutions of the problem of sorting by signed reversals
Vicente Acuña
INRIA & LBBE University Claude Bernard, Lyon, France
Title: On the computational complexity of enumerating all the solutions of some biological problems in metabolic networks
Paulo Vieira Milreu
(PhD student) INRIA & LBBE University Claude Bernard, Lyon, France
Title: Enumerating Chemical Organisations in Consistent Networks

Abstracts of the Talks

Finding the diameter in real-world graphs
Pierluigi Crescenzi, Dipartimento di Sistemi e Informatica, University of Florence, Italy
The diameter of an unweighted graph is the maximum pairwise distance among its connected vertices. It is one of the main measures in real-world graphs and complex networks. The "double sweep" is a simple method to find a lower bound for the diameter. It chooses a random vertex and performs two breadth-first searches (BFSes), returning the maximum length among the shortest paths thus found. We propose an algorithm called "fringe", which uses few BFSes to find a matching upper bound for almost all the graphs in our dataset of 44 real-world graphs. In the few graphs it cannot, we perform an exhaustive search of the diameter using a cluster of machines for a total of 40 cores. In all cases, the diameter is surprisingly equal to the lower bound found after very few executions of the double sweep method. The lesson learned is that the latter can be used to find the diameter of real-world graphs in many more cases than expected, and our fringe algorithm can quickly validate this finding for most of them. Pierre Peterlongo INRIA
Identifying SNPs without a reference genome by comparing raw reads
Pierre Peterlongo, INRIA Rennes, France
Next generation sequencing (NGS) technologies are being applied to many fields of biology, notably to survey the polymorphism across individuals of a species. However, while single nucleotide polymorphisms (SNPs) are almost routinely identified in model organisms, the detection of SNPs in non model species remains very challenging due to the fact that almost all methods rely on the use of a reference genome. We address here the problem of identifying SNPs without a reference genome. For this, we propose an approach which compares two sets of raw reads. We show that a SNP corresponds to a recognisable pattern in the de Bruijn graph built from the reads, and we propose algorithms to identify these patterns, that we call mouths. We outline the potential of our method on real data. The method is tailored to short reads (typically Illumina), and works well even when the coverage is low where it reports few but highly condent SNPs. Our program, called kisSnp, can be downloaded here: http://alcovna.genouest.org/kissnp/.
The problem of sorting by reversals using group theory
Maria Emília Walter, UnB, Brasília, Brazil
Genome rearrangements is a field in which we study mutational events affecting large portions of a genome. One such event is the reversal, which inverts the position of contiguous blocks of genes inside a chromosome. This event generates the problem of reversal distance, which is to find the minimal number of reversals transforming one chromosome into another. When the location of the genes are known, this problem is polynomial, but when they are not known the problem is NP-hard. For this last case, many algorithms have been proposed in the literature, following different techniques. In general, the methods are based on exhaustive analysis of graphical properties of suitable cycle graphs. We are beginning to formalize the problem, using theory of groups in Algebra. We will discuss a very simple branch-and-bound algorithm, which was obtained with algebraic operations in permutations simulating the possible reversals that could be applied to a chromosome. The objective of these preliminary studies is to relate the reversal and the problem of reversal distance to algebraic operations so that we could use known results of group theory to propose algorithms with a more formal approach. We think that solutions to problems generated from other mutational events could be obtained by common algebraic definitions.
Real time scheduling: a gentle introduction for theoreticians
Alberto Marchetti-Spaccamela, Dipartimento di Informatica e Sistemistica, University of Rome "La Sapienza", Italy
A real-time task system consists of a finite number of tasks, each of which generates an infinite sequence of recurring jobs with the same processing requirement characterized by a deadline. We distinguish two main classes of task systems, periodic and sporadic task systems, which differ in the generation of job sequences recurrence scheme. There is given one or multiple processors, each of which can process only one job at the time. Now, each job must be executed by the system, possibly with preemptions and migration, before its deadline. A task system is said to be feasible if there exists a schedule such that each job completes its execution requirement before its deadline. We review complexity results for different variation of the problem. In the hope of overcoming hardness results, we also discuss different relaxations of the problem.
Multidimensional selfish bin packings
Flávio Keidi Miyazawa, UNICAMP,Campinas, Brazil
We consider a two- (multi-)dimensional bin packing problems, where each user selfishly controls a rectangular item that must be packed into a rectangular bin and is charged with a cost according to the fraction of the used bin space required by its item. When items are oriented, we show that it has unbounded price of anarchy (PoA). For squares, we show that PoA is bounded by a value close to 2.6, and is at least 2.2. We also consider versions with rotations, hypercubes, parametric.
Universal scheduling on a single machine
Leen Stougie, Dept. of Operations Research, Vrije Universiteit & CWI, Amsterdam, The Netherlands
In general scheduling problems we assume that machines are always available until the set of assigned jobs is completed. However, in reality machines might not be available for preventive maintenance, they may slow down due to simultaneous utilization by other users, they may break down completely for some time, etc. In the latter cases unavailability is unpredictable. Thus the quest for a schedule that is robust against machine calamities emerges: the only influence the scheduler has is on the order in which he offers his jobs to be processed. If the sequence cannot be adapted during execution, we speak of a universal scheduling problem. We study this scheduling problem on a single machine when the objective is to minimize the sum of weighted completion times of the jobs. We compute a universal scheduling sequence in which the jobs are processed no matter how, unexpectedly, the machine may become unavailable and changes its processing speed. In case of a machine breakdown the job is preempted and resumed after the breakdown period. We analyze the worst case performance by comparing the solution value provided by an algorithm with that of an optimal clairvoyant algorithm that knows the machine behavior in advance, and that is even allowed to preempt jobs at any time. Our main results are a deterministic and a randomized universal order of the jobs, computable in polynomial time, such that scheduling the jobs in this order will always yield a solution that remains within multiplicative factor 4 from clairvoyant optimal for the deterministic order and within multiplicative factor $e$ in expectation from optimal for the randomized order. We can adapt our algorithm to solve more general problem instances with certain types of precedence constraints without losing in the performance. We complement these results with matching lower bounds that show that these ratios are best possible.
Signaling networks and type-dependent interaction models
Eduardo Jordão Neves, IME-USP, NUMEC, Brazil
We will briefly describe our modeling approach for biological signaling networks in cancer research which is based on interaction particle systems with non reversible dynamics. In their simplest setting, these models generalise the usual Stochastic Ising Model with Glauber dynamics to allow type-dependent spin-flip rates that mimic biochemical interactions and which are no longer reversible with respect to the Gibbs measure. In the thermodynamic limit the corresponding macroscopic densities evolve according to deterministic odes that may exhibit highly non-trivial bifurcation diagrams. These "toy models" appear to be useful in understanding cellular control mechanisms.
Enumerating traces of solutions of the problem of sorting by signed reversals
Christian Baudet (PhD student), UNICAMP, Campinas, and INRIA & University Lyon 1, France
Reversals are rearragement events which invert the position of contiguous blocks of genomic markers inside a chromosome. A permutation of signed integers can be used to represent the order of these markers in the chromosome of a given species. The problem of sorting by signed reversals consists on finding an optimal sequence of reversals that can be applied in the origin permutation to obtain the target permutation. Traditional algorithms to solve this problem can execute in polynomial time. However, these algorithms output just one optimal solution for the problem, while the space of optimal solutions can be huge. The concept of traces can be used to group equivalent solutions in order to obtain a more compact representation of the set of all optimal solutions. An exponential algorithm for enumerating all traces of solutions of the problem of sorting by signed reversals is available.
Enumerating chemical organisations in consistent networks
Paulo Vieira Milreu (PhD student), INRIA & University Lyon 1, France
The structural analysis of metabolic networks aims both at understanding the function and the evolution of metabolism. While it is commonly admitted that metabolism is modular, the identification of metabolic modules remains an open topic. Several definitions of what is a module have been proposed. We focus here on the notion of chemical organisations, i.e. sets of molecules which are closed and self-maintaining. We show that finding a reactive organisation is NP-hard even if the network is flux-consistent and that the hardness comes from blocking cycles. We then propose new algorithms for enumerating chemical organisations that are theoretically more efficient than existing approaches.

Participants

Adroaldo Moreira Borges. MSc student, IME-USP
Alexandre Albano, MSc student, IME-USP
Alexandre Freire, PhD student, IME-USP
Alvaro Junio Pereira Franco, PhD student, IME-USP
Carlos H.A. Higa, PhD student, IME-USP
Cecília Klein, MSc student, Univ. Claude Bernard, Lyon, and LNCC, Brazil
Christian Tjandraatmadja, MSc student, IME-USP
Diego Rubert, MSc student, UFMS
Fabrício Martins Lopes, UTFPR - Paraná, PhD student, USP
Guilherme Puglia Assunção, MSc student, IME-USP
Gustavo Akio T. Sacomoto, MSc student, IME-USP
Gustavo Rodrigues Galvão, MSc student, IC-UNICAMP
Jorge Luis Guevara Díaz, PhD student, IME-USP, Brazil
Karla Lima, PhD student, IME-USP
Leissi Castañeda León, MSc student, IME-USP, Brazil
Leonardo Marchetti, MSc student, IME-USP
Marli A. Guarino, MSc student, IME-USP
Michel Mozinho, PhD student, USP
Milton Yutaka Nishiyama Jr, PhD student, USP
Murilo de Lima, MSc student, IME-USP
Phelipe Fabres, MSc student, UFMS
Rafael Barbosa, MSc student, IME-USP
Rodrigo Mitsuo Kishi, MSc student, UFMS
Ronaldo Florilo dos Santos, MSc student, UFMS
Thiago Serra, MSc student, IME-USP

Organization

Chair: Yoshiko Wakabayashi, USP || Combinatorics and Combinatorial Optimization Research Group || NUMEC
Co-chair: Marie-France Sagot, INRIA Rhône-Alpes, Université Claude Bernard, Lyon I

Support

IME-USP
and
Computer Science Graduate Program - IME-USP

Combinatorics and Combinatorial Optimization
Research Group - IME-USP

Last modified: Tue Jul 27 01:42:04 BRT 2010