Workshop in Bioinformatics and Algorithms

WBA 2014 - Workshop in Bioinformatics and Algorithms

March 25 and 26, 2014

Auditorium of NUMEC (Núcleo de Apoio à Pesquisa em Modelagem Estocástica e Complexidade)
at the Institute of Mathematics and Statistics of the University of São Paulo - IME-USP

A meeting on the occasion of the visit to the University of São Paulo (USP) of the Bamboo Team under the leadership of Marie-France Sagot, INRIA Rhône-Alpes, France.

We are pleased to invite all the interested researchers to this informal meeting. The participants are welcome to attend the lectures as well as the Problem Session (with submissions on the spot). The meeting will focus on algorithmic aspects of problems arising in bioinformatics and combinatorial optimization.

Program

Date	Time	Activity
March 25 (Tuesday)	9:30-10:00	From Bamboo to Erable (Marie-France Sagot, INRIA & LBBE)
	10:00-10:30	Precursor sets in metabolic networks (Martin Wannagat, INRIA & LBBE)
	10:30-11:00	Mirinho: efficient precursor miRNA predictor (Susan Higashi, INRIA & LBBE)
	11:00-11:30	== Coffee Break ==
	11:30-12:00	Algorithms to tell stories (Paulo Vieira Milreu, Researcher at TecSinapse)
	12:00-14:00	== Lunch ==
	14:00-14:30	Sorting permutations by prefix and suffix versions of reversals and transpositions (Carla Negri Lintzmayer, IC-UNICAMP)
	14:30-15:00	Convex recoloring of phylogenetic trees (Phablo F.S. Moura, IME-USP)
	15:00-16:00	= Open Problem Session == (open to all participants who wish to present problems, to discuss ideas, and possibly start new collaborations)
	16:00-16:20	== Coffee Break ==
	16:20-17:40	== Group Work ==
	20:00	== Dinner ==
March 26 (Wednesday)	9:30-10:00	Statistical methods in graphs with applications in biological data (André Fujita, IME-USP)
	10:00-10:30	Algorithmic issues in cophylogenetic analysis (Blerina Sinaimeri, INRIA & LBBE)
	10:30-11:00	== Coffee Break ==
	11:00-11:30	Topological colored motifs in reaction graphs (Marco Aurélio Stefanes, UFMS)
	11:30-12:00	Efficient algorithms to list st-paths (Gustavo Sacomoto, INRIA & LBBE)
	12:00-14:00	== Lunch ==
	14:00-16:00	== Group Work ==
	16:00-16:20	== Coffee Break ==
	16:20-xxxx	== Group Work ==

All activities will take place at the building of NUMEC, located next to the building (Bloco C) of Computer Science Department at IME-USP.

The talks will be given in the Auditorium of NUMEC (ground floor). For the group work, the participants may use the Auditorium as well as the ground floor of NUMEC. Another blackboard is available next to the Auditorium.

The Laboratory at the first floor may be used by the authorized participants (who have sent their photos to the organization) to connect to the network of IME (with their own notebooks). Eduroam is also available.

Abstracts of the Talks

From Bamboo to Erable
Marie-France Sagot, INRIA & LBBE University Claude Bernard, Lyon, France
Abstract: A short opening section.
Precursor sets in metabolic networks
Martin Wannagat, PhD student at INRIA & LBBE University Claude Bernard, Lyon, France
Abstract: Metabolic networks give rise to several questions, e.g.: What are the possible pathways explaining differences of concentration of metabolites under two conditions? What are the possible fluxes that maintains the network in steady state? How can one identify sets of metabolites that are stable in time? What are the possible inputs a metabolic network needs to produce a given target metabolite. In this talk I present some algorithms that treat the last question.
Mirinho: efficient precursor miRNA predictor
Susan Higashi, PhD Student at INRIA & LBBE University Claude Bernard, Lyon, France
Abstract: The vast majority of the methods for pre-miRNA prediction rely on cubic algorithms. They perform well for a small number of sequences; however, at a genome-wide scale they may become impracticable. It is then crucial to develop faster and more reliable methods to tackle such large scale data. To solve this problem, we developed a new method, called Mirinho. The novelty of the method is its efficiency, made possible by the use of an algorithm of quadratic instead of cubic complexity, and the fact that it is a single step approach which does require the use of post-processing filters. Mirinho is thus much faster than the available methods, 70 times more than the quickest one tested. In the majority of the cases, Mirinho also showed better sensitivity while keeping a precision that has a similar order of magnitude as compared to that of other approaches. Mirinho is also flexible in the sense that it was designed to work for the prediction of both animal and plant pre-miRNAs, while other methods are often tuned to a specific clade. Finally, we also provide to the user the option to enrich the prediction by using small RNA sequencing and conservation data.
Algorithms to tell stories
Paulo Vieira Milreu, Researcher at TecSinapse
Abstract: We present a constrained version of the problem of enumerating all maximal directed acyclic subgraphs (DAG) of a graph G. In this version, we enumerate maximal DAGs whose sources and targets belong to a predefined subset of the nodes. We call such DAGs stories. We first show how to compute one story in polynomial-time, and then describe two di erent algorithms to tell all possible stories.
Sorting permutations by prefix and suffix versions of reversals and transpositions
Carla Negri Lintzmayer, PhD student IC-UNICAMP, Campinas, Brazil
Abstract: Reversals and transpositions are the most common kinds of genome rearrangements, which allow us to establish the divergence between individuals along evolution. When the rearrangements affect segments from the beginning or from the end of the genome, we say they are prefix or suffix rearrangements, respectively. This paper presents the first approximation algorithms for the problems of Sorting by Prefix Reversals and Suffix Reversals, Sorting by Prefix Transpositions and Suffix Transpositions and Sorting by Prefix Reversals, Prefix Transpositions, Suffix Reversals and Suffix Transpositions, all of them with factor 2.
Convex recoloring of phylogenetic trees
Phablo F.S. Moura, PhD student at IME-USP, São Paulo, Brazil
Abstract: Given a graph G with an arbitrary vertex coloring, the Convex Recoloring Problem (CR) consists in recoloring the minimum number of vertices so that each color induces a connected subgraph. This problem is known to be NP-hard even when G is a path. It was introduced by Moran and Snir in 2005, motivated by studies on phylogenetic trees. We show an integer programming formulation for arbitrary graphs, and then specialize it for trees. We study the facial structure of the associated polytope, present several classes of facet-defining inequalities and the corresponding separation algorithms. We also present a branch-and-cut algorithm that we have implemented for the special case of trees, and show the computational results obtained with a large number of instances. We consider instances which are real phylogenetic trees. The experiments show that this approach can be used to solve instances up to 1500 vertices in 2 hours, comparing favorably to other approaches that have been proposed in the literature. This is joint work with Manoel Campêlo (UFC), Alexandre S. Freire (IC-Unicamp), Karla R. Lima (EACH-USP) and Y. Wakabayashi (USP).
Statistical methods in graphs with applications in biological data
André Fujita, IME-USP, São Paulo, Brazil
Abstract: The understanding of the biological mechanisms underlying human diseases is one of the main topics of study in biological sciences. One of the challenges consists in understanding diseases by developing methods to statistically analyze and computationally manipulate biological data. This difficulty is generated by large data size, heterogeneity, multidimensionality, and presence of intrinsic noise. In this context, I will present formal statistical methods in graphs (hypothesis test, model selection, parameter estimation, statistical test for clusters, etc) with illustrative applications in functional magnetic resonance imaging and gene expression data.
Algorithmic issues in cophylogenetic analysis
Blerina Sinaimeri, Postdoc at INRIA & LBBE University Claude Bernard, Lyon, France
Abstract: Coevolution happens when two (or more) species reciprocally affect each other’s evolution and it is likely to happen when the species have close ecological interactions with one another. These interactions can be of different kind but in this talk we will focus in parasite/host relationships. In this context we will exploit problems in co-phylogeny, that aims to understand how the evolution of the host has driven the evolution of the parasite by reconstructing their ancient relationships using phylogenetic information. We will formulate the co-phyologeny problems as combinatorial problems and show that despite the increasingly research in this area there are still many hurdles remaining. In this direction we propose some challenging open problems.
Topological colored motifs in reaction graphs
Marco Aurélio Stefanes, UFMS, Campo Grande, Mato Grosso do Sul, Brazil
Abstract: In the context of structural analysis of metabolic networks, the problem of searching occurrences of motifs plays a central role. In metabolic networks, motifs are associated with both the basic modules of molecular information and the functional and structural features. In this talk we address the topological colored motif search problem in metabolic networks, here represented by a reaction graph. Recently, several variations of this problem have been studied. We present some hardness results for finding motifs and we describe a polynomial algorithm for the case in which the motif is a colorful tree.
Efficient algorithms to list st-paths
Gustavo Sacomoto, Postdoc at INRIA & LBBE University Claude Bernard, Lyon, France
Abstract: Listing all the simple st-paths (and cycles) in undirected graphs is a classical problem whose efficient solutions date back to the early 70s. For a graph with n vertices and m edges, containing K st-paths, the best known solution in the literature is given by Johnson's algorithm [SIAM J. Computing, 1975] and takes O((K + 1)(m + n)) time. When the graph is weighted, a natural variant is to list only the st-paths with length bounded by some constant. In this talk, I present some techniques that allows us to design an efficient algorithm to list bounded length st-paths [submitted, 2014], and the first optimal algorithm to list st-paths [SODA, 2013].

Confirmed Participants

Organization

Chair: Yoshiko Wakabayashi, USP || Combinatorics and Combinatorial Optimization Research Group || NUMEC
Co-chair: Marie-France Sagot, INRIA Rhône-Alpes, Université Claude Bernard, Lyon I
Alexandre da Silva Freire, IC-UNICAMP
Lourdes Vaz da Silva Netto, NUMEC-USP (lourdesv@ime.usp.br)

Support

IME-USP
and
Computer Science Graduate Program - IME-USP

Núcleo de Apoio à Pesquisa em Modelagem Estocástica e Complexidade (NUMEC)

Combinatorics and Combinatorial Optimization
Research Group - IME-USP

Last modified: Thu Mar 20 20:44:20 BRT 2014