RESEARCHERS
MACCHIAROLI Natalia
articles
Title:
MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data
Author(s):
LAURA KAMENETZKY; GEORGINA STEGMAYER; LUCAS MALDONADO; NATALIA MACCHIAROLI; CRISTIAN YONES; DIEGO MILONE
Journal:
GENOMICS
Publisher:
ACADEMIC PRESS INC ELSEVIER SCIENCE
References:
Place: Amsterdam; Year: 2016
ISSN:
0888-7543
Abstract:
The cestode parasite Echinococcus multilocularis is the aetiological agent of alveolar echinococcosis, responsible for considerable human morbidity and mortality. This disease is a worldwide zoonosis of major public health concern and is considered a neglected disease by the World Health Organization. The complete genome of E. multilocularis has been recently sequenced and assembled in a collaborative effort between the Wellcome Trust Sanger Institute and our group, with the main aim of analyzing protein-coding genes. These analyses suggested that approximately 10% of the E. multilocularis genome is composed of protein-coding regions. This shows there is still a vast proportion of the genome that needs to be explored, including non-coding RNAs such as small RNAs (sRNAs). Within this class of small regulatory RNAs, microRNAs (miRNAs) can be found, which have been identified in many different organisms ranging from viruses to higher eukaryotes. MiRNAs are a key regulation mechanism of gene expression at the post-transcriptional level and play important roles in biological processes such as development, proliferation, cell differentiation and metabolism in animals and plants. In spite of this, identification of miRNAs directly from genome-wide data only is still a very challenging task. There are many miRNAs that remain unidentified due to the lack of either sequence information for particular phyla or appropriate algorithms to identify novel miRNAs. The motivation for this work is the discovery of new miRNAs in E. multilocularis based on non-target genomic data only, in order to obtain useful information from the currently available unexplored data. In this work, we present the discovery of new pre-miRNAs in the E. multilocularis genome through a novel approach based on machine learning. We have extracted the most commonly used structural features from the folded sequences of the parasite genome: triplets, minimum free energy and sequence length.
These features have been used to train a novel deep architecture of self-organizing maps (SOMs). This model can be trained with a high class imbalance and without the artificial definition of a negative class. We discovered 886 pre-miRNA candidates within the E. multilocularis genome-wide data. After that, experimental validation by small RNA-seq analysis clearly showed 23 pre-miRNA candidates with a pattern compatible with miRNA biogenesis, indicating them as high confidence miRNAs. We discovered new pre-miRNA candidates in E. multilocularis using non-target genomic data only. Predictions were meaningful using only sequence data, with no need for RNA-seq data or target analysis for prediction. Furthermore, the methodology employed can be easily adapted and applied to draft genomes, which are actually the most interesting ones since most non-model organisms have this kind of status and carry real biological and sanitary relevance.
Availability:
Web demo: http://fich.unl.edu.ar/sinc/web-demo/mirna-som/
Source code: http://sourceforge.net/projects/sourcesinc/files/mirnasom/
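Illustrative note: the abstract names three structural features (triplets, minimum free energy and sequence length) without defining them. The sketch below shows one plausible way to assemble such a feature vector, assuming that "triplets" refers to the 32-element encoding commonly used in pre-miRNA classifiers (the middle nucleotide combined with the paired/unpaired state of three adjacent positions). The function names `triplet_features` and `feature_vector`, the toy hairpin and its energy value are hypothetical and are not taken from the authors' code; the folded structure and minimum free energy are assumed to come from an external RNA folding tool.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# All 32 (middle nucleotide + 3-position structure) combinations, in a fixed order.
TRIPLET_KEYS = [n + "".join(s) for n in NUCLEOTIDES for s in product("(.", repeat=3)]


def triplet_features(sequence, structure):
    """Return normalized frequencies of the 32 triplet elements.

    `structure` is a dot-bracket string of the same length as `sequence`.
    Assumption: this is the usual triplet encoding; the abstract does not
    spell out the exact definition used by the authors.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = sequence.upper().replace("T", "U")
    # Both bracket directions simply mean "paired" for this encoding.
    paired = structure.replace(")", "(")
    counts = dict.fromkeys(TRIPLET_KEYS, 0)
    total = 0
    for i in range(1, len(seq) - 1):
        key = seq[i] + paired[i - 1:i + 2]
        if key in counts:  # positions with ambiguous bases (e.g. N) are skipped
            counts[key] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in TRIPLET_KEYS]


def feature_vector(sequence, structure, mfe):
    """Concatenate triplet frequencies, minimum free energy and sequence length."""
    return triplet_features(sequence, structure) + [float(mfe), float(len(sequence))]


# Toy hairpin with a made-up energy value, for illustration only.
example = feature_vector("GGGAAACCC", "(((...)))", -1.2)
print(len(example))
```

Each candidate hairpin is reduced to a fixed-length numeric vector, which is the kind of input a self-organizing map expects; how the authors scale or combine the features is not stated in the abstract.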