# Genome Rearrangements

In the mathematical theory of genome rearrangements, genomes are represented by lists of elements (genes) and the rearrangements are operations that modify these lists. Typical examples of such events are reversals and transpositions.

I have been interested in these problems since my first contact with them on a Combinatorial Pattern Matching (CPM) conference, in Asilomar, California, USA, in 1994. From this date up to around 2007, several students of mine, including Maria Emília Machado Telles Walter, Zanoni Dias, Cleber Valgas Gomes Mira, Vinicius Fortuna, and André Atanasio Maranhão Almeida worked on theoretical aspects of the problem. They helped me shape an interesting idea, which is to apply the mathematical theory of permutations to rearragement problems, leading to an embrionary form of what we called algebraic rearrangement theory.

After studying the theory of genome rearrangements for some time, I felt the urge to apply some of the things I learned to actual genomes. My Masters students Patrícia Pilisson Côgo and Karina Zupo de Oliveira applied the theory to Vibrio genomes, in an informal collaboration with Fabiano Thompson, then at the University of Campinas, now at the Federal University of Rio de Janeiro, who studies these bacteria.

Côgo's contribution included working out several practical aspects of transforming real genomes into permutations, such as identifying homologous regions, treating regions without counterpart in the other genome, and treating duplications. We used the DCJ distance to construct a phylogenetic tree of the 6 Vibrio genomes completely sequenced up to 2005. Côgo implemented her tests in a series of perl scripts. Oliveira took them a step further, rewriting the entire pipeline in Java, using a modern IDE and version control. In the process, she noticed that the way we were constructing protein families to assess homology was very time consuming. She suggested instead that we should use Protein Clusters, a protein family database and tools available from NCBI. This reduced the time to add a new species from a month to less than a day. As a result, she could pairwise compare ten Vibrio species plus an outgroup (E. coli).

Côgo and Oliveira used DCJ as a distance measure after reducing the genomes to equal content without repeated genes. Another student of mine, Pedro Feijão, a doctoral student, started to experiment with something we called SCJ, a single-cut counterpart of DCJ. It is a measure for which several hard problems turn out to have tractable solutions, including small parsimony. He also wrote lots of Java code to implement SCJ.

In 2011, Pedro Feijão discovered a way of extending the algebraic theory to linear genomes, yielding a distance very close to, but not exactly the same as, DCJ. Like DCJ, this algebraic distance is easy to compute between two genomes, but medians, which are hard for DCJ, are easy for the algebraic distance, if one is willing to dive into orthogonal matrices (see below). Algebraic rearrangement theory may be a tough subject for many, but we love it. Feijão tried hard to make it palatable to a wider audience in his PhD thesis, and I think he did a wonderful job. We also investigated this same distance in the more general settings of non-genomic permutations and of general matrices, with my students João Paulo Pereira Zanetti and Priscila Biller.

Biller also run many experiments with SCJ, to try and find out its suitability to reconstructing phylogenies, with good results. She improved on Feijão's initial Java code, and wrote a manual to help others use the code.

In 2013, Sophia Yancopoulos kindly invited to me co-author a contribution to the MAGE workshop in honor of David Sankoff. The text explores the DCJ and algebraic distances, trying to uncover the reasons for their similarities and differences.

The algebraic distance is an alternative to DCJ, since both are easy to compute and model many of the rearrangement events observed in genome evolution. Our interest turned to finding genomes that are close to three given genomes, an important step in phylogeny reconstruction. Besides the traditional medians, which minimize the sum of the algebraic distances, we are experimenting with minimax genomes, which minimize the maximum of the distances.

The algebraic distance has also an interesting relationship with matrix theory. Genomes can be seen as matchings of gene extremities. We can code such a matching as a matrix as follows. Consider the adjacency matrix of this matching, but add 1's in the diagonal for the unmatched vertices. Voilá! You have a genome matrix. It turns out that the algebraic distance between two genomes is exactly half of the rank distance between the corresponding matrices.

Looking at genomes as matrices has an added benefit: one can solve the median problem in polynomial time. However, the solution is not always "genomic".

Here is some of our work on the subject, in reverse chronological order:

## Talks

• J. Meidanis. Counting scenarios, intermediate genomes, and dealing with missing genes with the rank distance.
Presented at the Comparative Genomics in Campo Grande Workshop. Campo Grande, Brazil, in Decdember 2019.
• J. Meidanis. Rank Distance Sheds Light on Genome Evolution.
Presented at the 22nd Conference of the International Linear Algebra Society (ILAS 2019). Rio de Janeiro, Brazil, in July 2019.
• J. Meidanis. Genome Matrices and The Median Problem.
Presented at the Department of Biochemistry, University of Sao Paulo, Brazil, in June 2019.
• J. Meidanis. Counting Sorting Scenarios and Intermediate Genomes for the Rank Distance.
Presented at the 6th International Conference on Algorithms for Computational Biology, Berkeley, CA, USA, in May 2019.
• J. Meidanis. Genome Matrices and The Median Problem.
Presented at the Brazilian Symposium on Bioinformatics, Niteroi, Brazil, in October 2018.
• J. Meidanis. Genome Matrices and The Median Problem.
Presented at the Department of Mathematics, Universidade Federal Fluminense, Brazil, in December 2017.
• J. Meidanis. Genome Matrices and The Median Problem.
Presented at the II Symposium on Biological Sciences, Ouro Preto, Brazil, in November 2017.
• J. Meidanis. Genome Matrices and Applications.
15-minute recruiting presentation at the Institute of Computing, University of Campinas, Brazil, in August 2017.
• J. Meidanis. Genome Matrices and The Median Problem.
Presented at the Department of Computer Science and Operations Research, Université de Montréal, Canada, in December 2016.
Presented at the Cheriton School of Computer Science, University of Waterloo, Canada, and
at the School of Computer Science, University of Windsor, Canada, and
at the Discrete Math Seminar Series, Simon Fraser University, Canada, in May 2017.
Presented at the Institute of Computing, University of Campinas, Brazil, in July 2017.

## Publications

• D. Sankoff, C. Zheng, Y. Zhang, J. Meidanis, E. Lyons, H. Tang. Models for Similarity Distributions of Syntenic Homologs and Applications to Phylogenomics. IEEE/ACM Transactions on Computational Biology and Bioinformatics 16(3) 727-737 (2019). 10.1109/TCBB.2018.2849377
• J.P.P. Zanetti, L. Chindelevitch, J. Meidanis. Generalizations of the Genomic Rank Distance to Indels (new title), or Rank distance generalizations for genomes with indels (old title). AlCoB: 6th International Conference on Algorithms for Computational Biology, Berkeley CA, USA, May 2019. Download draft Appendix
• J.P.P. Zanetti, L. Chindelevitch, J. Meidanis. Counting Sorting Scenarios and Intermediate Genomes for the Rank Distance. AlCoB: 6th International Conference on Algorithms for Computational Biology, Berkeley CA, USA, May 2019. Download draft Appendix
• L. Chindelevitch, J. Meidanis. A cubic algorithm for the generalized rank median of three genomes. RECOMB-CG: 16th Annual Satellite Workshop on Comparative Genomics, Sherbrooke, Canada, October 2018. Journal version: Chindelevitch, L., La, S. & Meidanis, J. A cubic algorithm for the generalized rank median of three genomes. Algorithms Mol Biol 14, 16 (2019). This article presents the MI algorithm for genome medians. 10.1186/s13015-019-0150-y
• L. Chindelevitch, J. Meidanis. On the Rank-Distance Median of 3 Permutations. RECOMB-CG: 15th Annual Satellite Workshop on Comparative Genomics, Barcelona, Spain, October 2017. The journal version of this paper, which appeared in BMC Bioinformatics 2018 19 (Suppl6):142 with additional author J.P.P. Zanetti, presents the orthogonal algorithm for genome medians. Download draft
• J. Meidanis, P. Biller, J.P.P. Zanetti. A Matrix-Based Theory for Genome Rearrangements. Technical Report, Institute of Computing, University of Campinas, 2017 Download pdf
• J.P.P. Zanetti, P. Biller, and J. Meidanis. Median Approximations for Genomes Modeled as Matrices. Bull Math Biol (2016) 78: 786. doi:10.1007/s11538-016-0162-4. Simulated data: add/remove adjacencies up to 30%. Simulated data: DCJ operation up to 30%. Real data.
• J.P.P. Zanetti, P. Biller, and J. Meidanis. On the Matrix Median Problem. Proceedings of the WABI 2013: 13th Workshop on Algorithms in Bioinformatics, Sophia Antipolis, France, September 2013. Download draft
• P. Feijao, J. Meidanis. The Algebraic Theory for Genome Rearrangements. Poster presented at WABI 2013: 13th Workshop on Algorithms in Bioinformatics, Sophia Antipolis, France, September 2013. Download pdf
• J. Meidanis, and S. Yancopoulos. The emperor has no caps! A Comparison of DCJ and Algebraic Distances. Proceedings of MAGE: Models and Algorithms for Genome Evolution, Montreal, Canada, August 2013. Download preliminary text
• J. P. P. Zanetti, P. Biller, J. Meidanis. On the matrix median problem. Poster presented at MAGE: Models and Algorithms for Genome Evolution, Montreal, Canada, August 2013. Download pdf
• P. Biller, P. Feijão, and J. Meidanis. Rearrangement-based phylogeny using the Single-Cut-or-Join operation. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013. Download pdf. Code, data, and companion manual.
• J. Meidanis, P. Feijao, P. Biller, J. P. P. Zanetti. On the algebraic genome median. Poster presented at RECOMB-CG: 10th Annual Satellite Workshop on Comparative Genomics, Niteroi, Brazil, October, 2012. Download pdf
• P. Feijão. On genome rearrangement models. PhD Thesis, presented at the Institute of Computing, University of Campinas, in October 2012. Download pdf
• P. Feijão, J. Meidanis. Extending the Algebraic Formalism for Genome Rearrangements to Include Linear Chromosomes. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10(4), 819-831, 2012. An abridged version appeared in the Proceedings of the 2012 BSB: Brazilian Symposium on Bioinformatics, Campo Grande, Brazil, August 2012. Download abridged version
• P. Biller. Experimentos em rearranjo de genomas com a operação Single-Cut-or-Join. (In Portuguese, with Abstract in English) Msc Thesis, University of Campinas, March 2010. 101 pp. Download pdf
• P. Feijão, J. Meidanis. SCJ: A Breakpoint-Like Distance that Simplifies Several Rearrangement Problems. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1318-1329, Sept.-Oct. 2011, doi:10.1109/TCBB.2011.34. This is an extended version of the WABI 2009 paper.
• K. Z. de Oliveira. Construção de Filogenias Baseadas em Genomas Completos. (Phylogeny Construction based on Complete Genomes), in Portuguese, with Abstract in English) Msc Thesis, University of Campinas, March 2010. 101 pp. Download pdf Supplementary material
• P. Feijão, J. Meidanis. SCJ: a variant of breakpoint distance for which sorting, genome median, and genome halving problems are easy. Proceedings of the WABI 2009: 9th Workshop on Algorithms in Bioinformatics, Philadelphia, USA, September 2009. Download draft
• P. P. Côgo. Comparação de Genomas Completos de Espécies da Família Vibrionacea Empregando Rearranjos de Genomas. (In Portuguese, with Abstract in English) Msc Thesis, University of Campinas, March 2008. 59 pp. Download pdf
• P. P. Côgo, J. Meidanis. A rearrangement-based approach to compare whole genomes of Vibrionaceae strains. Technical Report IC-08-05. Institute of Computing, University of Campinas, 2008. Download pdf
• P. P. Côgo, J. Meidanis, F. Thompson. Comparison Between Complete Genomes of Vibrio Species. Poster presented at the ISMB: 14th Annual International Conference on Intelligent Systems for Molecular Biology, Fortaleza, Brazil, August, 2006. Download pdf
• P. P. Côgo, J. Meidanis. Comparison Between Complete Genome Sequences of Vibrionaceae Species. Poster presented at the X-Meeting: 1st International Conference of the Brazilian Association for Bioinformatics and Computational Biology, Caxambu, Brazil, October, 2005. Download pdf
• C. V. G. Mira. Análise Algébrica de Problemas de Rearranjo em Genomas: Algoritmos e Complexidade. (In Portuguese) PhD Thesis, presented at the Institute of Computing, University of Campinas, in October 2007. Download pdf
• A. A. M. Almeida. Comparação Algébrica de Genomas: o caso da distância de reversão. (In Portuguese) MSc Thesis, Institute of Computing, University of Campinas, April 2007, 125p. Download pdf
• C. V. G. Mira, J. Meidanis. A New Data Structure for Genome Rearrangement Problems. Poster in ISMB/ECCB7. Vienna, Austria, 2007. Download pdf
• C. V. G. Mira, J. Meidanis. Sorting by Block-Interchanges and Signed Reversals. Proceedings of the Fourth International Conference on Information Technology: New Generations ( ITNG'07), p. 670-676. IEEE Computer Society, Los Alamitos, USA, 2007. Download pdf
• V. Fortuna. Distâncias de Transposição entre Genomas. (In Portuguese) MSc Thesis, Institute of Computing, University of Campinas, February 2005, 77p. Download pdf
• C. V. G. Mira, J. Meidanis. Analisys of Sorting by Transpositions based on Algebraic Formalism. Poster in RECOMB'04. San Diego, USA, 2004. Download presentation Download poster
• Z. Dias. Rearranjo de Genomas: Uma Coletânea de Artigos. (In Portuguese) PhD Thesis, presented at the Institute of Computing, University of Campinas, in November 2002. Download pdf
• J. Meidanis, M. E. M. T. Walter, Z. Dias. A Lower Bound on the Reversal and Transposition Diameter. Journal of Computational Biology, Vol.9, No.5, pp. 743-745, 2002. This is a compact version of a piece of work described below (*).
• Z. Dias, J. Meidanis. Sorting by Prefix Transpositions. Proceedings of SPIRE'2002 - String Processing and Information Retrieval. Lecture Notes in Computer Sciences, vol.2476. September, 11-13, 2002. Lisbon, Portugal. Download pdf View Presentation
• J. Meidanis, Z. Dias. The Genome Rearrangement Distance by Fusion, Fission, and Transposition with Arbitrary Weights. Technical Report IC-02-01, Institute of Computing, University of Campinas. Download pdf
• Z. Dias, J. Meidanis. Genome Rearrangements Distance by Fusion, Fission, and Transposition is Easy. Proceedings of SPIRE'2001 - String Processing and Information Retrieval. November, 13-15, 2001. Laguna de San Rafael, Chile. Summary. A version appeared as Technical Report IC-01-07, Institute of Computing, University of Campinas.
• M. E. M. T. Walter, Z. Dias, J. Meidanis. A New Approach for Approximating The Transposition Distance. Proceedings of SPIRE'2000 - String Processing and Information Retrieval. September, 27-29, 2000. La Coruna, Spain. Download pdf
• J. Meidanis, Z. Dias. An Alternative Algebraic Formalism for Genome Rearrangements. Comparative Genomics, David Sankoff and Joseph Nadeau, editors, Kluwer Academic Publishers, 2000, pp. 213-223. Download pdf
• M. E. M. T. Walter. Algoritmos para Problemas em Rearranjo de Genomas. (In Portuguese) PhD Thesis, presented at the Institute of Computing, University of Campinas, in December 1999. Download pdf
• (*) M. E. M. T. Walter, Z. Dias, J. Meidanis. A Lower Bound on the Reversal and Transposition Diameter. This is a revised and more detailed version of part of the paper presented in SPIRE'98. It computes the reversal and transposition distance between a permutation p and the one obtained from p by reversing each gene. A compact version was published in 2002 in the Journal of Computational Biology. Download pdf
• M. E. M. T. Walter, Z. Dias, J. Meidanis. Reversal and Transposition Distance of Linear Chromosomes, Proceedings of SPIRE'98 - String Processing and Information Retrieval: A South American Symposium. September, 9-11, 1998, pp. 96-102. Santa Cruz de la Sierra, Bolivia. Download pdf
• J. Meidanis, M. E. M. T. Walter, Z. Dias. Transposition Distance Between a Permutation and Its Reverse. Proceedings of WSP'97 - Fourth South American Workshop on String Processing. November, 12-13, 1997, pp. 70-79. Valparaiso, Chile. Download pdf
• J. Meidanis, M. E. M. T. Walter, Z. Dias. Distancia de Reversao de Cromossomos Circulares. (In Portuguese.) Proceedings of SEMISH - XXIV Seminario Integrado de Software e Hardware. August, 2-8, 1997, pp. 119-131. Brasilia, Brazil. Download pdf

Joao Meidanis