RESEARCHERS
BUGNON Leandro Ariel
Articles
Title:
Predicting novel microRNA: a comprehensive comparison of machine learning approaches
Author(s):
GEORGINA STEGMAYER; LEANDRO DI PERSIA; MARIANO RUBIOLO; MATÍAS GERARD; MILTON PIVIDORI; CRISTIAN YONES; LEANDRO BUGNON; TADEO RODRIGUEZ; JONATAN RAAD; DIEGO H. MILONE
Journal:
BRIEFINGS IN BIOINFORMATICS
Publisher:
OXFORD UNIV PRESS
References:
Year: 2018
ISSN:
1467-5463
Abstract:
Motivation: The importance of microRNAs (miRNAs) is widely recognized in the community nowadays because these short segments of RNA can play several roles in almost all biological processes. The computational prediction of novel miRNAs involves training a classifier for identifying sequences having the highest chance of being miRNA precursors (pre-miRNAs). The big issue with this task is that well-known pre-miRNAs are usually very few in comparison to the hundreds of thousands of candidates sequences in a genome, which results in high class imbalance. This imbalance has a strong influence on most standard classifiers, and if not properly addressed in the model and the experiments, not only performance reported can be completely unrealistic, but also the classifier will not be able to work properly for pre-miRNA prediction. Besides, another important issue is that for most of the machine learning approaches already employed (supervised methods) it is necessary to have both positive and negative examples. The selection of positive examples is straightforward (well-known pre-miRNAs). However, it is very difficult to build a representative set of negative examples because they should be sequences with hairpin structure that do not actually contain a pre-miRNA.
Results: This review provides a comprehensive study and comparative assessment of methods from these two machine learning (ML) approaches for dealing with the prediction of novel pre-miRNAs: supervised and unsupervised training. We present and analyze the machine learning proposals that have appeared during the last 10 years in literature. They have been compared in several prediction tasks involving two model genomes and increasing imbalance levels. This work provides a review of existing ML approaches for pre-miRNA prediction and fair comparisons of the classifiers with same features and data sets, instead of just a revision of published software tools. The results and the discussion can help the community to select the most adequate bioinformatics approach according to the prediction task at hand. The comparative results obtained suggest that from low to mid imbalance levels between classes, supervised methods can be the best. However, at very high imbalance levels, closer to real case scenarios, models including unsupervised and deep learning can provide better performance.
Availability: http://sourceforge.net/projects/sourcesinc/files/ml4mirna/
Contact: gstegmayer@sinc.unl.edu.ar
Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.