Unsupervised learning of snake venom proteomes yielded by mass spectrometry-based assays
Snake venoms are complex mixtures composed mostly of proteins and peptides. One analytical strategy often used to study them is mass spectrometry (MS)-based proteomics. A recent analysis of proteomic data gathered through MS assays showed a correlation between the clustering of proteins from different snake species of the Bothrops genus and their phylogenetic classification (Andrade-Silva et al., 2016). However, the overrepresentation of some species in the protein sequence database used for the computational identification of peptides could introduce a bias into this type of analysis. To mitigate that problem, phyloproteomic trees were more recently generated from de novo sequencing, that is, peptide identification without the use of a database. Nonetheless, that approach produced many false positives, which are also a source of noise in those analyses. Therefore, we are trying to circumvent these issues by using raw MS data directly to estimate phyloproteomic trees. To this end, we are using a multiresolution-based approach to construct matrices as a function of fragment retention time, mass/charge ratio, and detected intensity. These matrices are being used to generate phyloproteomic trees through unsupervised learning approaches, for instance agglomerative clustering. Finally, we are applying the CADM (congruence among distance matrices) statistical test to compare the cladograms yielded by our methodology with the one obtained using mitochondrial DNA. Once this new methodology is fully developed, we expect more robust results regarding the extent to which venom proteomic profiles match the phylogeny of different species of Bothrops snakes, an assessment that is critical in evolutionary studies of those organisms.
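
As an illustration of the matrix-construction and clustering steps described above, the following is a minimal Python sketch; it is not the implementation used in this work. It assumes that the raw data of each sample have already been reduced to arrays of retention time, m/z, and intensity, and that the multiresolution construction can be approximated by binning those points on grids of increasing resolution. The grid sizes, value ranges, Euclidean distance, and average linkage are illustrative assumptions, the function names are hypothetical, and the data are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Illustrative value ranges (assumptions): retention time in minutes, m/z in Th.
RT_RANGE = (0.0, 90.0)
MZ_RANGE = (300.0, 1500.0)


def intensity_matrix(rt, mz, intensity, bins):
    """Sum intensities on a bins x bins grid over (retention time, m/z)."""
    grid, _, _ = np.histogram2d(
        rt, mz, bins=bins, range=[RT_RANGE, MZ_RANGE], weights=intensity
    )
    total = grid.sum()
    # Normalize so that samples with different total signal are comparable.
    return grid / total if total > 0 else grid


def multiresolution_features(rt, mz, intensity, levels=(4, 8, 16)):
    """Concatenate flattened intensity matrices computed at several resolutions."""
    return np.concatenate(
        [intensity_matrix(rt, mz, intensity, b).ravel() for b in levels]
    )


def agglomerative_tree(features):
    """Average-linkage agglomerative clustering on Euclidean distances."""
    return linkage(pdist(features, metric="euclidean"), method="average")


def synthetic_sample(rng, rt_center, mz_center, n_points=500):
    """Synthetic (rt, m/z, intensity) points around one centre; toy data only."""
    rt = rng.normal(rt_center, 3.0, n_points)
    mz = rng.normal(mz_center, 30.0, n_points)
    intensity = rng.uniform(1.0, 10.0, n_points)
    return rt, mz, intensity


rng = np.random.default_rng(0)
samples = [
    synthetic_sample(rng, 20.0, 500.0),   # hypothetical sample A
    synthetic_sample(rng, 20.0, 500.0),   # hypothetical sample B, similar to A
    synthetic_sample(rng, 60.0, 1200.0),  # hypothetical sample C, distinct
]
features = np.vstack([multiresolution_features(*s) for s in samples])
tree = agglomerative_tree(features)  # SciPy linkage matrix, one row per merge
print(tree)
```

The resulting linkage matrix can be drawn as a tree with `scipy.cluster.hierarchy.dendrogram`. In this toy setting the two samples generated around the same centre are merged before the third; with real venom data the choice of distance, linkage, and resolution levels would need to be evaluated rather than assumed.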