Genome Rearrangements
In the mathematical theory of genome rearrangements, genomes
are represented by lists of elements (genes) and the
rearrangements are operations that modify these lists. Typical
examples of such events are reversals and transpositions.
I have been interested in these problems since my first
contact with them on a DIMACS Workshop in 1994. From this date
up to around 2007, several students of mine, including
Maria Emília Machado Telles Walter, Zanoni Dias,
Cleber Valgas Gomes Mira, Vinicius Fortuna, and André Atanasio
Maranhão Almeida worked theoretical aspects of the problem.
After studying the theory of genome rearrangements for some
time, I felt the urge to apply some of the things I learned to
actual genomes. The first step in this direction was to ask my
Masters student Patrícia Pilisson Côgo to use the DCJ distance
to construct a phylogenetic tree of all Vibrio genomes
sequenced to date. This was 2005. There were 6 completely
sequenced vibios in GenBank. Why vibrios? Because
over the years, I established an informal collaboration with
Fabiano Thompson, now at the Federal University of Rio de
Janeiro, and he studies vibrios.
Côgo's contribution included working out several practical
aspects of transforming real genomes into permutations, such as
identifying homologous regions, treating regions without
counterpart in the other genome, and treating duplications. A
poster in the
X-meeting, a poster in ISMB 2006, a
tech
report, and her
dissertation were produced.
Then came another Masters student, Karina Zupo de Oliveira.
Côgo implemented her tests in a series of perl scripts.
Oliveira took them a step further, rewriting the entire pipeline
in Java, using a modern IDE and version control. In the
process, she noticed that the way we were constructing protein
families to assess homology was very time consuming. She
suggested instead that we should use Protein Clusters,
a protein family database and tools available from NCBI. This
reduced the time to add a new species from a month to less than
a day. As a result, she could pairwise compare
ten Vibrio species plus an outgroup (E. coli).
Her
dissertation describes the work done.
Côgo and Oliveira used DCJ as a distance measure after
reducing the genomes to equal content without repeated genes.
Another student of mine, Pedro Feijão, a doctoral student,
started to experiment with something we called
SCJ, a single-cut
counterpart of DCJ. It can also be viewed as a
way of measuring the number of breakpoints between two genomes.
It is a measure for which several hard
problems turn out to have tractable solutions, including small
parsimony. He also wrote lots of Java code to implement SCJ.
Recently, Pedro Feijão discovered a way of
extending
the
algebraic theory to linear genomes, yielding a distance very
close to, but not exactly the same as, DCJ. Like DCJ, this
algebraic distance is easy to compute between two genomes, but
medians may be hard. Algebraic rearrangement theory may be a
tough subject for many, but we love it. Feijão tried hard to
make it palatable to a wider audience in his
PhD thesis, and I think he did a wonderful job.
We are also investigating this same
distance in the more general settings of non-genomic
permutations and of general matrices, with my new students João
Paulo Pereira Zanetti and Priscila Biller.
Biller also run many
experiments with SCJ, to try and
find out its suitability to reconstructing phylogenies, with
good results. She improved on Feijão's initial Java code, and
wrote a manual to help others use the code.
Here is some of our work on the subject. References are in
reverse chronological order:
- Biller, P., Feijão, P. and J. Meidanis.
Rearrangement-based phylogeny using the Single-Cut-or-Join operation.
IEEE/ACM Transactions on Computational Biology and
Bioinformatics, 2012.
Download pdf.
Code,
data, and
companion manual.
- Pedro Feijão.
On genome rearrangement models.
PhD Thesis, presented at
the Institute of Computing, University of Campinas, in October 2012.
Download pdf.
- Feijão, P. and J. Meidanis.
Extending the Algebraic Formalism for Genome Rearrangements to
Include Linear Chromosomes.
IEEE/ACM Transactions on Computational Biology and
Bioinformatics, 2012. An abridged version appeared in the
Proceedings of the 2012 BSB: Brazilian Symposium on
Bioinformatics, Campo Grande, Brazil, August 2012.
Download abridged version,
- Biller, Priscila.
Experimentos em rearranjo de genomas
com a operação Single-Cut-or-Join.
(In Portuguese, with Abstract in English)
Msc Thesis, University of Campinas, March 2010. 101 pp.
Download pdf,
- Feijão, P. and J. Meidanis.
SCJ: A Breakpoint-Like Distance that Simplifies Several
Rearrangement Problems.
IEEE/ACM Transactions on Computational Biology and
Bioinformatics, vol. 8, no. 5, pp. 1318-1329,
Sept.-Oct. 2011,
doi:10.1109/TCBB.2011.34. This is an extended version of
the WABI 2009 paper.
- Oliveira, Karina Zupo de.
Construção de Filogenias Baseadas em Genomas Completos.
(In Portuguese, with Abstract in English)
Msc Thesis, University of Campinas, March 2010. 101 pp.
Download pdf,
- Feijão, P. and J. Meidanis.
SCJ: a variant of breakpoint distance for which sorting, genome
median, and genome halving problems are easy.
Proceedings of the WABI 2009: 9th Workshop on Algorithms in
Bioinformatics, Philadelphia, USA, September 2009.
Download draft
- Côgo, Patrícia Pilisson.
Comparação de Genomas Completos de Espécies da
Família Vibrionacea Empregando Rearranjos de Genomas.
(In Portuguese, with Abstract in English)
Msc Thesis, University of Campinas, March 2008. 59 pp.
Download pdf,
- Côgo, Patrícia Pilisson and J. Meidanis.
A rearrangement-based approach to compare whole genomes of
Vibrionaceae strains.
Technical Report IC-08-05. Institute of Computing, University of
Campinas, 2008.
Download pdf
- Côgo, Patrícia Pilisson and J. Meidanis and F. Thompson.
Comparison Between Complete Genomes of Vibrio Species.
Poster presented at the ISMB: 14th Annual International
Conference on Intelligent Systems for Molecular Biology,
Fortaleza, Brazil, August, 2006.
Download pdf,
- Côgo, Patrícia Pilisson and J. Meidanis.
Comparison Between Complete Genome Sequences
of Vibrionaceae Species.
Poster presented at the X-Meeting: 1st International
Conference of the Brazilian Association for Bioinformatics and
Computational Biology, Caxambu, Brazil, October, 2005.
Download pdf,
- C. V. G. Mira.
Análise Algébrica de Problemas de Rearranjo em Genomas:
Algoritmos e Complexidade.
(In Portuguese)
PhD Thesis, presented at
the Institute of Computing, University of Campinas, in October 2007.
Download pdf
- A. A. M. Almeida.
Comparação Algébrica de Genomas:
o caso da distância de reversão.
(In Portuguese)
MSc Thesis, Institute of Computing, University of Campinas,
April 2007, 125p.
Download pdf
- C. V. G. Mira, J. Meidanis.
A New Data Structure for Genome Rearrangement Problems.
Poster in ISMB/ECCB7. Vienna, Austria, 2007.
Download ps
- C. V. G. Mira, J. Meidanis.
Sorting by Block-Interchanges and Signed Reversals.
Proceedings of the Fourth International Conference on
Information Technology: New Generations (
ITNG'07), p. 670-676. IEEE Computer Society, Los Alamitos,
USA, 2007.
Download pdf
- V. Fortuna.
Distâncias de Transposição entre Genomas.
(In Portuguese)
MSc Thesis, Institute of Computing, University of Campinas,
February 2005, 77p.
Download pdf
- C. V. G. Mira, J. Meidanis.
Analisys of Sorting by Transpositions based on Algebraic Formalism.
Poster in RECOMB'04. San Diego, USA, 2004.
Download presentation
Download poster
- Z. Dias.
Rearranjo de Genomas: Uma Coletânea de Artigos.
(In Portuguese)
PhD Thesis, presented at
the Institute of Computing, University of Campinas, in November
2002.
Download ps (gunzipped)
- J. Meidanis, M. M. T. Walter, and Z. Dias.
A Lower Bound on the Reversal and Transposition Diameter.
Journal
of Computational Biology, Vol.9, No.5, pp. 743-745, 2002.
This is a compact version of a piece of work described below
(*).
- Z. Dias and J. Meidanis.
Sorting by Prefix Transpositions.
Proceedings of SPIRE'2002 - String Processing and
Information Retrieval.
Lecture Notes in Computer Sciences, vol.2476. September,
11-13, 2002. Lisbon, Portugal.
Download ps
View Presentation
- J. Meidanis and Z. Dias.
The Genome Rearrangement Distance by Fusion, Fission, and
Transposition with Arbitrary Weights.
Technical Report IC-02-01, Institute of Computing,
University of Campinas.
Download ps
- J. Meidanis and Z. Dias.
Genome Rearrangements Distance by Fusion, Fission, and
Transposition is Easy.
Proceedings of SPIRE'2001 - String Processing and
Information Retrieval. November,
13-15, 2001. Laguna de San Rafael, Chile.
Summary.
A version appeared as Technical Report IC-01-07, Institute of Computing, University of Campinas.
- M. E. Walter, Z. Dias, and J. Meidanis.
A New Approach for Approximating The Transposition Distance.
Proceedings of SPIRE'2000 - String Processing and
Information Retrieval. September,
27-29, 2000. La Coruna, Spain.
Download ps
- J. Meidanis, Z. Dias.
An Alternative Algebraic Formalism for Genome Rearrangements.
Comparative Genomics, David Sankoff and Joseph Nadeau,
editors, Kluwer Academic Publishers, 2000, pp. 213-223.
Download ps
- M. E. Walter.
Algoritmos para Problemas em Rearranjo de Genomas.
(In Portuguese)
PhD Thesis, presented at
the Institute of Computing, University of Campinas, in December
1999.
Download ps
- (*) M. E. Walter, Z. Dias, and J. Meidanis.
A Lower Bound on the Reversal and Transposition Diameter.
This is a revised and more detailed version of part of the
paper presented in SPIRE'98. It computes the reversal and
transposition distance between a permutation p and the
one obtained from p by reversing each gene. A compact
version was published in 2002 in the Journal of Computational
Biology.
Download pdf
- M. E. Walter, Z. Dias, and J. Meidanis.
Reversal and Transposition Distance of Linear Chromosomes,
Proceedings of SPIRE'98 - String Processing and
Information Retrieval: A South American Symposium. September,
9-11, 1998, pp. 96-102. Santa Cruz de la Sierra, Bolivia.
Download ps
- J. Meidanis, M. E. Walter, Z. Dias.
Transposition Distance Between a Permutation and Its Reverse.
Proceedings of WSP'97 - Fourth South American Workshop
on String Processing. November,
12-13, 1997, pp. 70-79. Valparaiso, Chile.
Download ps (gunzipped)
- J. Meidanis, M. E. Walter, Z. Dias.
Distancia de Reversao de Cromossomos Circulares.
(In Portuguese.)
Proceedings of SEMISH - XXIV Seminario Integrado de Software
e Hardware. August,
2-8, 1997, pp. 119-131. Brasilia, Brazil.
Download ps
Joao Meidanis
Last modified: Tue Jan 8 06:42:04 BRST 2013
by JM