A meeting on the occasion of the visit to the University of São Paulo (USP) of the Bamboo Team under the leadership of Marie-France Sagot, INRIA Rhône-Alpes, France.
We are pleased to invite all the interested researchers to this informal meeting. The participants are welcome to attend the lectures as well as the Problem Session (with submissions on the spot). The meeting will focus on algorithmic aspects of problems arising in bioinformatics and combinatorial optimization.
Date | Time | Activity |
April 2 (Monday) |
10:00-10:40 | Opening (Talk 1) On BAMBOOs, LIRIOs, AMICIs, summer schools on algorithmics, and ciência sem fronteiras Marie-France Sagot, INRIA and CNRS Univ. Lyon, France |
10:40-11:10 |
(Talk 2) The maximum
agreement subtree problem Bhalchandra Thatte, IME-USP | |
11:10-11:30 | = Coffee Break = | |
11:30-12:10 |
(Talk 3) An algorithm to
generate draft genome sequence
scaffolds João Carlos Setubal, IQ-USP |
|
12:15-14:00 | == Lunch == | |
14:00-14:30 |
(Talk 4) Algorithms
and complexity of enumerating minimal
precursor sets in genome-wide metabolic networks Vicente Acuña, University of Chile |
|
14:30-15:00 |
(Talk 5) Telling stories: enumerating compound relationships from a metabolomics experience Paulo V. Milreu, Univ. C. Bernard, Lyon, France |
|
15:00-16:00 | Problem Session | |
16:00-16:20 | = Coffee Break = | |
16:20-17:40 | Group Work | |
20:00 | * Dinner * | |
April 3 (Tuesday) |
9:30-10:00 |
(Talk 6) Detection of local network motifs Etienne Birmelé, Laboratoire Statistique et Génome, Évry, France |
10:00-10:30 |
(Talk 7) Minimum ratio cover of matrix columns by extreme rays of its induced cone Alexandre da Silva Freire, IME-USP |
|
10:30-10:50 | = Coffee Break = | |
10:50-11:20 | (Talk 8) Sorting by almost-symmetrical reversals
Ulisses Dias, IC-UNICAMP |
|
11:20-11:50 |
(Talk 9) On computing
the diameter of real-world directed
(weighted) graphs Andrea Marino, Univ. of Florence, Italy |
|
11:50-12:20 |
(Talk 10) Next
generation sequencing (NGS) from the
theoretical perspective Gustavo Akio Sacomoto, Univ. C. Bernard, Lyon, France |
|
12:20-14:00 | == Lunch == | |
14:00-16:00 | Group work | |
16:00-16:20 | = Coffee Break = | |
16:20-xxxx | Group work |
Abstract: In this talk, I will give a general presentation of the team for those who may not know it yet, of our more recent projects for those who know already the team, and of the things we would like to continue doing with you in the future.
Abstract: Given two binary phylogenetic trees on $n$ leaves, we show
that they have a common subtree on at least $O((\log{n})^{1/2 -
\epsilon})$ leaves, thus improving on the previously known bound of
$O(\log\log n)$. To achieve this bound, we combine different special
cases: when one of the trees is balanced or when one of the trees is a
caterpillar, we show a lower bound of $O(\log n)$. Another ingredient
is the proof that every binary tree contains a large balanced subtree
or a large caterpillar, a result that is interesting on its own.
Finally, we also show that, there is an $\alpha > 0$ such that when
both the trees are balanced, they have a common subtree on at least
$O(n^\alpha)$ leaves.
[Joint work with Daniel Martin, Centro de Matemática, Computação e Cognição, Universidade Federal do ABC.]
Abstract:
We present a linear-time algorithm that can generate a set of contig scaffolds
for a draft genome sequence represented in contigs given a reference
genome. The algorithm is aimed at prokaryotic genomes and relies on
the presence of matching sequence patterns between the query and
reference genomes that can be interpreted as the result of large-scale
inversions; we call these patterns inversion signatures.
Our algorithm is capable of correctly generating a scaffold if at
least one member of every inversion signature pair is present in
contigs and no inversion signatures have been overwritten in
evolution. The algorithm is also capable of generating scaffolds in
the presence of any kind of inversion, even though in this general case
there is no guarantee that all scaffolds in the scaffold set will be correct. We
compare the performance of SIS, the program that implements the
algorithm, to seven other scaffold-generating programs. The results of
our tests show that SIS has overall better performance.
[Joint work with Zanoni Dias and Ulisses Dias, UNICAMP.]
Abstract: In the context of studying whole metabolic networks and their interaction with the environment, the following question arises: given a set of target metabolites T and a set of possible external source metabolites S, which are the minimal subsets of S that are able to produce all the metabolites in T ? Such subsets are called the minimal precursor sets of T. The problem is then whether we can enumerate all of them efficiently.
Abstract: We present a constrained version of the problem of enumerating all maximal directed acyclic subgraphs (DAG) of a graph G. In this version, we enumerate maximal DAGs whose sources and targets belong to a predefined subset of the nodes. We call such DAGs stories. We first show how to compute one story in polynomial-time, and then describe two diferent algorithms to tell all possible stories.
Abstract: Network motifs are small subgraphs of networks which are statistically over-represented, suggesting that they play a role in the organisation of the network. In biological applications, the motifs found by the existing methods appear to be highly concentrated in some areas of the networks. I'll give a definition of a local network motif based on this observation and discuss the statistical and algorithmic means to detect them.
Abstract: Given a matrix "S" and a subset of columns "R", we study the problem of finding a cover of "R" with extreme rays of the cone "F={v in R^n | Sv=0, v >= 0}", where an extreme ray "v" covers a column "k" if "v_k > 0". In order to measure how proportional a cover is, we introduce two different minimization problems, namely the "minimum global ratio cover" (MGRC) and the "minimum local ratio cover" (MLRC) problems. In both cases, we apply the notion of the "ratio" of a vector "v", which is given by "max v_i / min v_j". We show that these two problems are NP-hard, even in the case in which "|R| = 1". We introduce a mixed integer programming formulation for the MGRC problem, which is solvable in polynomial time if all columns should be covered, and introduce a branch-and-cut algorithm for the MLRC problem. Finally, we present computational experiments on data obtained from real metabolic networks.
Abstract: Inversions are one of the most frequent large-scale rearrangements
observed in actual genomes. While a large body of literature exists on
mathematical problems related to the computation of the inversion
distance between abstract genomes, these works generally do not take
into account that most inversions in bacterial chromosomes are
symmetric or roughly symmetric with respect to the origin of
replication. We define a new problem: how to sort genomes (or
permutations) using almost-symmetric inversions. We show an algorithm
that can sort any permutation using only almost-symmetric inversions.
Two variants of this algorithm are presented that have better
performance in practice. We explore the question of determining the
minimum number of almostsymmetric inversions needed to sort a genome
by presenting lower and upper bounds and results for special
permutation families. The results obtained are the first steps in
exploring this interesting new problem.
[Joint work with Zanoni Dias, João C. Setubal and Lenwood S. Heath.]
Abstract: We show a new algorithm for computing the diameter of directed unweighted graphs. Even though, in the worst case, this algorithm has complexity O(nm), where n is the number of nodes and m is the number of edges of the graph, we experimentally show that in practice our method works in linear time. Moreover, we show how to extend our algorithm to the case of directed weighted graphs and, even in this case, we present some preliminary very positive experimental results.
Abstract: We present an overview of some computational methods used to treat NGS data, both genomic and transcriptomic. In the first part, we present two data structures currently used in the sequence assembly problem: overlap graph and de Bruijn graph. Together with some complexity results. In the second part, we present a method to extract meaningful information from RNA-seq data (transcriptomic) without assembly.
Chair: Yoshiko Wakabayashi, USP || Combinatorics and
Combinatorial Optimization Research Group || NUMEC
Co-chair: Marie-France Sagot,
Bamboo Team INRIA Rhône-Alpes and
CNRS UMR 5558 LBBE University Lyon 1, France
Local committee: Alexandre da Silva Freire (USP), Cristina Gomes Fernandes (USP), Lourdes Vaz da Silva Netto (NUMEC-USP)
IME-USP |
NUMEC |
Project MaClinC
|
Bamboo
Team
|