Workshop in Bioinformatics and Algorithms

WBA 2012 - Workshop in Bioinformatics and Algorithms - April 2 and 3, 2012

Multi-purpose room of NUMEC (Núcleo de Apoio à Pesquisa em Modelagem Estocástica e Complexidade) at the Institute of Mathematics and Statistics of the University of São Paulo - IME-USP

A meeting on the occasion of the visit to the University of São Paulo (USP) of the Bamboo Team under the leadership of Marie-France Sagot, INRIA Rhône-Alpes, France.

We are pleased to invite all the interested researchers to this informal meeting. The participants are welcome to attend the lectures as well as the Problem Session (with submissions on the spot). The meeting will focus on algorithmic aspects of problems arising in bioinformatics and combinatorial optimization.

Date Time Activity

April 2
(Monday) 10:00-10:40 Opening
(Talk 1) On BAMBOOs, LIRIOs, AMICIs, summer schools on algorithmics, and ciência sem fronteiras
Marie-France Sagot, INRIA and CNRS Univ. Lyon, France

10:40-11:10 (Talk 2) The maximum agreement subtree problem
Bhalchandra Thatte, IME-USP

11:10-11:30 = Coffee Break =

11:30-12:10 (Talk 3) An algorithm to generate draft genome sequence scaffolds
João Carlos Setubal, IQ-USP

12:15-14:00 == Lunch ==

14:00-14:30 (Talk 4) Algorithms and complexity of enumerating minimal precursor sets in genome-wide metabolic networks
Vicente Acuña, University of Chile

14:30-15:00 (Talk 5) Telling stories: enumerating compound relationships from a metabolomics experience
Paulo V. Milreu, Univ. C. Bernard, Lyon, France

15:00-16:00 Problem Session

16:00-16:20 = Coffee Break =

16:20-17:40 Group Work

20:00 * Dinner *

April 3
(Tuesday) 9:30-10:00 (Talk 6) Detection of local network motifs
Etienne Birmelé, Laboratoire Statistique et Génome, Évry, France

10:00-10:30 (Talk 7) Minimum ratio cover of matrix columns by extreme rays of its induced cone
Alexandre da Silva Freire, IME-USP

10:30-10:50 = Coffee Break =

10:50-11:20 (Talk 8) Sorting by almost-symmetrical reversals
Ulisses Dias, IC-UNICAMP

11:20-11:50 (Talk 9) On computing the diameter of real-world directed (weighted) graphs
Andrea Marino, Univ. of Florence, Italy

11:50-12:20 (Talk 10) Next generation sequencing (NGS) from the theoretical perspective
Gustavo Akio Sacomoto, Univ. C. Bernard, Lyon, France

12:20-14:00 == Lunch ==

14:00-16:00 Group work

16:00-16:20 = Coffee Break =

16:20-xxxx Group work

All activities will take place at the building of NUMEC, located next to the building (Bloco C) of Computer Science Department at IME-USP. The talks will be given in the multi-purpose room of NUMEC (ground floor), some group work will also take place there. The first floor may be used by the visitors, who have previously registered. Blackboards, tables, chairs and sofas are available in the first floor. The laboratories in the first and second floor may be used by the registered visitors to connect to the network of IME (with their own notebooks).

Talks

(Talk 1) On BAMBOOs, LIRIOs, AMICIs, summer schools on algorithmics, and ciência sem fronteiras
Marie-France Sagot, BAMBOO Team INRIA and CNRS UMR 5558 LBBE University Lyon 1, France
Abstract: In this talk, I will give a general presentation of the team for those who may not know it yet, of our more recent projects for those who know already the team, and of the things we would like to continue doing with you in the future.
(Talk 2) The maximum agreement subtree problem
Bhalchandra D. Thatte, postdoc at IME-USP
Abstract: Given two binary phylogenetic trees on $n$ leaves, we show that they have a common subtree on at least $O((\log{n})^{1/2 - \epsilon})$ leaves, thus improving on the previously known bound of $O(\log\log n)$. To achieve this bound, we combine different special cases: when one of the trees is balanced or when one of the trees is a caterpillar, we show a lower bound of $O(\log n)$. Another ingredient is the proof that every binary tree contains a large balanced subtree or a large caterpillar, a result that is interesting on its own. Finally, we also show that, there is an $\alpha > 0$ such that when both the trees are balanced, they have a common subtree on at least $O(n^\alpha)$ leaves.
[Joint work with Daniel Martin, Centro de Matemática, Computação e Cognição, Universidade Federal do ABC.]
(Talk 3) An algorithm to generate draft genome sequence scaffolds
João Carlos Setubal, Instituto de Química, USP
Abstract: We present a linear-time algorithm that can generate a set of contig scaffolds for a draft genome sequence represented in contigs given a reference genome. The algorithm is aimed at prokaryotic genomes and relies on the presence of matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion signatures. Our algorithm is capable of correctly generating a scaffold if at least one member of every inversion signature pair is present in contigs and no inversion signatures have been overwritten in evolution. The algorithm is also capable of generating scaffolds in the presence of any kind of inversion, even though in this general case there is no guarantee that all scaffolds in the scaffold set will be correct. We compare the performance of SIS, the program that implements the algorithm, to seven other scaffold-generating programs. The results of our tests show that SIS has overall better performance.
[Joint work with Zanoni Dias and Ulisses Dias, UNICAMP.]
(Talk 4) Algorithms and complexity of enumerating minimal precursor sets in genome-wide metabolic networks
Vicente Acuña, postdoc at University of Chile
Abstract: In the context of studying whole metabolic networks and their interaction with the environment, the following question arises: given a set of target metabolites T and a set of possible external source metabolites S, which are the minimal subsets of S that are able to produce all the metabolites in T ? Such subsets are called the minimal precursor sets of T. The problem is then whether we can enumerate all of them efficiently.
(Talk 5) Telling stories: enumerating compound relationships from a metabolomics experience
Paulo V. Milreu, PhD Student at Univ. Claude Bernard, Lyon
Abstract: We present a constrained version of the problem of enumerating all maximal directed acyclic subgraphs (DAG) of a graph G. In this version, we enumerate maximal DAGs whose sources and targets belong to a predefined subset of the nodes. We call such DAGs stories. We first show how to compute one story in polynomial-time, and then describe two diferent algorithms to tell all possible stories.
(Talk 6) Detection of local network motifs
Etienne Birmelé, Laboratoire Statistique et Génome, Évry, France
Abstract: Network motifs are small subgraphs of networks which are statistically over-represented, suggesting that they play a role in the organisation of the network. In biological applications, the motifs found by the existing methods appear to be highly concentrated in some areas of the networks. I'll give a definition of a local network motif based on this observation and discuss the statistical and algorithmic means to detect them.
(Talk 7) Minimum ratio cover of matrix columns by extreme rays of its induced cone
Alexandre da Silva Freire, PhD student at IME-USP
Abstract: Given a matrix "S" and a subset of columns "R", we study the problem of finding a cover of "R" with extreme rays of the cone "F={v in R^n | Sv=0, v >= 0}", where an extreme ray "v" covers a column "k" if "v_k > 0". In order to measure how proportional a cover is, we introduce two different minimization problems, namely the "minimum global ratio cover" (MGRC) and the "minimum local ratio cover" (MLRC) problems. In both cases, we apply the notion of the "ratio" of a vector "v", which is given by "max v_i / min v_j". We show that these two problems are NP-hard, even in the case in which "|R| = 1". We introduce a mixed integer programming formulation for the MGRC problem, which is solvable in polynomial time if all columns should be covered, and introduce a branch-and-cut algorithm for the MLRC problem. Finally, we present computational experiments on data obtained from real metabolic networks.
(Talk 8) Sorting by almost-symmetrical reversal
Ulisses Dias, PhD, IC-UNICAMP
Abstract: Inversions are one of the most frequent large-scale rearrangements observed in actual genomes. While a large body of literature exists on mathematical problems related to the computation of the inversion distance between abstract genomes, these works generally do not take into account that most inversions in bacterial chromosomes are symmetric or roughly symmetric with respect to the origin of replication. We define a new problem: how to sort genomes (or permutations) using almost-symmetric inversions. We show an algorithm that can sort any permutation using only almost-symmetric inversions. Two variants of this algorithm are presented that have better performance in practice. We explore the question of determining the minimum number of almostsymmetric inversions needed to sort a genome by presenting lower and upper bounds and results for special permutation families. The results obtained are the first steps in exploring this interesting new problem.
[Joint work with Zanoni Dias, João C. Setubal and Lenwood S. Heath.]
(Talk 9) On computing the diameter of real-world directed (weighted) graphs
Andrea Marino, PhD student at University of Florence, Italy; currently visiting the BAMBOO Team at Lyon
Abstract: We show a new algorithm for computing the diameter of directed unweighted graphs. Even though, in the worst case, this algorithm has complexity O(nm), where n is the number of nodes and m is the number of edges of the graph, we experimentally show that in practice our method works in linear time. Moreover, we show how to extend our algorithm to the case of directed weighted graphs and, even in this case, we present some preliminary very positive experimental results.
(Talk 10) Next generation sequencing (NGS) from the theoretical perspective
Gustavo Akio Sacomoto,PhD student at Univ. Claude Bernard, Lyon, France
Abstract: We present an overview of some computational methods used to treat NGS data, both genomic and transcriptomic. In the first part, we present two data structures currently used in the sequence assembly problem: overlap graph and de Bruijn graph. Together with some complexity results. In the second part, we present a method to extract meaningful information from RNA-seq data (transcriptomic) without assembly.

Organization

Chair: Yoshiko Wakabayashi, USP || Combinatorics and Combinatorial Optimization Research Group || NUMEC
Co-chair: Marie-France Sagot, Bamboo Team INRIA Rhône-Alpes and CNRS UMR 5558 LBBE University Lyon 1, France

Local committee: Alexandre da Silva Freire (USP), Cristina Gomes Fernandes (USP), Lourdes Vaz da Silva Netto (NUMEC-USP)

Support

IME-USP
and
Computer Science Graduate Program - IME-USP

NUMEC
Núcleo de Apoio à Pesquisa em Modelagem Estocástica e Complexidade

Project MaClinC
Mathematics, Computation, Language and the Brain

Bamboo Team
Research Group - Lyon, France

Last modified: Thu Mar 29 00:16:49 BRT 2012

Date	Time	Activity
April 2 (Monday)	10:00-10:40	Opening (Talk 1) On BAMBOOs, LIRIOs, AMICIs, summer schools on algorithmics, and ciência sem fronteiras Marie-France Sagot, INRIA and CNRS Univ. Lyon, France
	10:40-11:10	(Talk 2) The maximum agreement subtree problem Bhalchandra Thatte, IME-USP
	11:10-11:30	= Coffee Break =
	11:30-12:10	(Talk 3) An algorithm to generate draft genome sequence scaffolds João Carlos Setubal, IQ-USP
	12:15-14:00	== Lunch ==
	14:00-14:30	(Talk 4) Algorithms and complexity of enumerating minimal precursor sets in genome-wide metabolic networks Vicente Acuña, University of Chile
	14:30-15:00	(Talk 5) Telling stories: enumerating compound relationships from a metabolomics experience Paulo V. Milreu, Univ. C. Bernard, Lyon, France
	15:00-16:00	Problem Session
	16:00-16:20	= Coffee Break =
	16:20-17:40	Group Work
	20:00	* Dinner *
April 3 (Tuesday)	9:30-10:00	(Talk 6) Detection of local network motifs Etienne Birmelé, Laboratoire Statistique et Génome, Évry, France
	10:00-10:30	(Talk 7) Minimum ratio cover of matrix columns by extreme rays of its induced cone Alexandre da Silva Freire, IME-USP
	10:30-10:50	= Coffee Break =
	10:50-11:20	(Talk 8) Sorting by almost-symmetrical reversals Ulisses Dias, IC-UNICAMP
	11:20-11:50	(Talk 9) On computing the diameter of real-world directed (weighted) graphs Andrea Marino, Univ. of Florence, Italy
	11:50-12:20	(Talk 10) Next generation sequencing (NGS) from the theoretical perspective Gustavo Akio Sacomoto, Univ. C. Bernard, Lyon, France
	12:20-14:00	== Lunch ==
	14:00-16:00	Group work
	16:00-16:20	= Coffee Break =
	16:20-xxxx	Group work