Genome Rearrangements

In the mathematical theory of genome rearrangements, genomes are represented by lists of elements (genes) and the rearrangements are operations that act on those lists. Typical examples of such events are reversals and transpositions.

I have been interested in these problems since my first contact with them on a DIMACS Workshop in 1994. My students Maria Emília Machado Telles Walter, Zanoni Dias, Cleber Valgas Gomes Mira, Vinicius Fortuna, and André Atanasio Maranhão Almeida worked on the problem.

After studying the theory of genome rearrangements for some time, I felt the urge to apply some of the things I learned to actual genomes. The first step in this direction was to ask my Masters student Patrícia Pilisson Côgo to use the DCJ distance to construct a phylogenetic tree of all Vibrio genomes sequenced to date. This was 2005. There were 6 completely sequenced vibios in GenBank. Why vibrios? Because Over the years, I established an informal collaboration with Fabiano Thompson, now at the Federal University of Rio de Janeiro, and he studies vibrios.

Côgo did a wonderful job. She gave her contribution on how to transform genomes into permutations, including identifying homologous regions, treating regions without counterpart in the other genome, and treating duplications. A poster in the X-meeting, a poster in ISMB 2006, a tech report, and her dissertation were produced.

Then came another Masters student, Karina Zupo de Oliveira. Côgo implemented her tests in a series of perl scripts. Oliveira took them a step further, rewriting the entire pipeline in Java, using a modern IDE and version control. In the process, she noticed that the way we were constructing protein families to assess homology was very time consuming. She suggested instead that we should use Protein Clusters, a protein family database and tools available from NCBI. This reduced the time to add a new species from a month to less than a day. As a result, she could pairwise compare ten Vibrio species plus an outgroup (E. coli). Her dissertation describes the work done.

Côgo and Oliveira used DCJ as a distance measure after reducing the genomes to equal content without repeated genes. Another student of mine, Pedro Feijão, a doctoral student, started to experiment with something we called SCJ, a single-cut counterpart of DCJ. It is a measure for which several hard problems turn out to have tractable solutions, including small parsimony. He also wrote code in Java with IDE and version control. So I thought, it's time to put all this code together and build something that can evolve towards a Genome Rearrangement Workbench.

Priscila Biller run many experiments with SCJ, to try and find out its suitability to reconstructing phylogenies, with good results.

Recently, Pedro Feijão discovered a way of extending the algebraic theory this group invented in 2000 to linear genomes. We are very exceited with this new development, and think that it may help extend current reconstruction capabilities to thousands of large genomes.

Here is some of our work on the subject. References are in reverse chronological order:


Joao Meidanis
Last modified: Wed May 16 14:48:23 BRT 2012 by JM