INSTITUTO DE INVESTIGACIONES EN MICROBIOLOGIA Y PARASITOLOGIA MEDICA
Unidad Ejecutora - UE
MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data
LAURA KAMENETZKY; NATALIA MACCHIAROLI; LUCAS MALDONADO; DIEGO MILONE; GEORGINA STEGMAYER; CRISTIAN YONES
ACADEMIC PRESS INC ELSEVIER SCIENCE
Place: Amsterdam; Year: 2016 vol. 107 p. 274 - 274
The cestode parasite Echinococcus multilocularis is the aetiological agent of alveolar echinococcosis, which is responsible for considerable human morbidity and mortality. This disease is a worldwide zoonosis of major public health concern and is considered a neglected disease by the World Health Organization. The complete genome of E. multilocularis was recently sequenced and assembled in a collaborative effort between the Wellcome Trust Sanger Institute and our group, with the main aim of analyzing protein-coding genes. These analyses suggested that approximately 10% of the E. multilocularis genome is composed of protein-coding regions, so a vast proportion of the genome remains to be explored, including non-coding RNAs such as small RNAs (sRNAs). This class of small regulatory RNAs includes microRNAs (miRNAs), which have been identified in many different organisms ranging from viruses to higher eukaryotes. MiRNAs are a key mechanism of gene expression regulation at the post-transcriptional level and play important roles in biological processes such as development, proliferation, cell differentiation and metabolism in animals and plants. Despite this, identifying miRNAs directly from genome-wide data alone is still a very challenging task. Many miRNAs remain unidentified due to the lack of either sequence information for particular phyla or appropriate algorithms to identify novel miRNAs. The motivation for this work is the discovery of new miRNAs in E. multilocularis based on non-target genomic data only, in order to obtain useful information from the currently available unexplored data. In this work, we present the discovery of new pre-miRNAs in the E. multilocularis genome through a novel approach based on machine learning. We extracted the most commonly used structural features from the folded sequences of the parasite genome: triplets, minimum free energy and sequence length.
These features were used to train a novel deep architecture of self-organizing maps (SOMs). This model can be trained with a high class imbalance and without the artificial definition of a negative class. We discovered 886 pre-miRNA candidates within the E. multilocularis genome-wide data. Subsequent experimental validation by small RNA-seq analysis clearly showed 23 pre-miRNA candidates with a pattern compatible with miRNA biogenesis, marking them as high-confidence miRNAs. We discovered new pre-miRNA candidates in E. multilocularis using non-target genomic data only. Predictions were meaningful using only sequence data, with no need for RNA-seq data or target analysis for prediction. Furthermore, the methodology can be easily adapted and applied to any draft genome; draft genomes are actually the most interesting ones, since most non-model organisms have this status and carry real biological and public health relevance.
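As an illustration of the structural features named in the abstract (triplets, minimum free energy and sequence length), the following is a minimal sketch of how such a feature vector could be assembled for one folded candidate sequence. It assumes the widely used triplet encoding, in which each triplet combines a nucleotide with the paired/unpaired state of itself and its two neighbours, giving 4 x 8 = 32 features. The function names `triplet_features` and `feature_vector` are hypothetical. The dot-bracket structure and the minimum free energy are assumed to come from an external folding tool. The exact feature definitions used in the original work are not given here and may differ.

```python
from itertools import product


def triplet_features(sequence, structure):
    """Return normalized frequencies of the 32 sequence-structure triplets.

    Assumption: each triplet is the middle nucleotide plus the paired ('(')
    or unpaired ('.') state of three adjacent positions; ')' is treated as '('.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    seq = sequence.upper().replace("T", "U")
    # Collapse opening and closing brackets into one "paired" symbol.
    struct = structure.replace(")", "(")

    keys = [nt + "".join(s) for nt in "ACGU" for s in product("(.", repeat=3)]
    counts = dict.fromkeys(keys, 0)

    # Slide over every interior position and count its triplet.
    for i in range(1, len(seq) - 1):
        key = seq[i] + struct[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases such as 'N'
            counts[key] += 1

    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def feature_vector(sequence, structure, mfe):
    """Concatenate triplet frequencies, MFE and sequence length (34 values).

    `mfe` is assumed to be supplied by the caller (e.g. from a folding tool).
    """
    freqs = triplet_features(sequence, structure)
    return [freqs[k] for k in sorted(freqs)] + [float(mfe), float(len(sequence))]


if __name__ == "__main__":
    # Toy hairpin; the MFE value is a made-up placeholder for illustration.
    vec = feature_vector("GGGAAACCC", "(((...)))", mfe=-1.5)
    print(len(vec))
```

Vectors of this kind, one per folded candidate, would then serve as input to an unsupervised model such as the SOM architecture described in the abstract.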