Data: 09/09/2021, às 16:00 horas (GMT-3, horário de Brasília).
Título: Graph theoretical approach for genome comparison with large-scale mutations
Palestrante: Marilia Braga (Universität Bielefeld)

Resumo: A classical problem in comparative genomics, proposed by Sankoff in 1992, is the one of computing the rearrangement distance, which is the minimum number of large-scale mutations required to transform a genome into another, in a setting in which the genes in both genomes are classified into families. Assuming that each family occurs exactly once in each genome, substantial results were developed with the help of graph theoretical models and include the polynomial-time computability of the inversion distance by Hannenhalli and Pevzner in 1995, the introduction of the double-cut and join (DCJ) operation by Yancopoulos, Attie and Friedberg in 2005 and the linear computability of the DCJ distance by Bergeron, Mixtacki and Stoye in 2006. These results rely on the fact that the underlying breakpoint/adjacency graphs are simple collections of paths and cycles.

In this talk I will present research results that I obtained with my colleagues over the last 11 years. First, I will describe an extension of the DCJ model for singular genomes, which are those in which each gene family occurs at most once. Singular genomes may contain distinct genes and, besides DCJ operations, are also subject to insertions and deletions of DNA segments (jointly called indels). We have shown that computing the singular DCJ-indel distance can be done in linear time with the help of the relational diagram, which is also a collection of paths and cycles (results from 2010, with Willing and Stoye). Then I will describe two recent NP-hard extensions of this model: the DCJ-indel distance of natural genomes (from 2020, with Bohnenkämper, Doerr and Stoye) and the weighted DCJ-indel distance of family-free genomes (from 2014/2020, with Martinez, Feijão and Rubert). Both can be solved efficiently with ILP formulations that search for optimal cycle decompositions of a (weighted) multirelational diagram.