SINC(I)   25518
INSTITUTO DE INVESTIGACION EN SEÑALES, SISTEMAS E INTELIGENCIA COMPUTACIONAL
Executing Unit - UE
Articles
Title:
Genome-wide pre-miRNA discovery from few labeled examples
Author(s):
MILONE, D. H.; STEGMAYER, G.; YONES, C.
Journal:
BIOINFORMATICS (OXFORD, ENGLAND)
Publisher:
OXFORD UNIV PRESS
References:
Place: Oxford; Year: 2018; vol. 34, pp. 541-549
ISSN:
1367-4803
Abstract:
Motivation: Although many machine learning techniques have been proposed to distinguish miRNA hairpins from other stem-loop sequences in genome-wide data, most current methods use supervised learning, which requires a very good set of positive and negative examples. These methods face many practical limitations when they have to be applied in a real prediction task. First, there is the challenge of dealing with a scarce number of well-known positive pre-miRNA examples. Second, it is very difficult to build a good set of negative examples that represents the full spectrum of non-miRNA sequences. Third, any genome has a huge class imbalance (1:10,000), which is well known to particularly affect supervised classifiers.

Results: To enable efficient and speedy genome-wide predictions of novel microRNAs, we present miRNAss, a novel method based on semi-supervised learning. It takes advantage of the information provided by the unlabeled stem-loops, improving prediction rates even when the number of labeled examples is low and not representative of the classes. An automatic method for searching negative examples to initialize the algorithm is also proposed, freeing the user from this difficult task. MiRNAss achieves better prediction rates and shorter execution times than state-of-the-art supervised methods. It has been validated with 1,700,000 hairpin sequences from the whole genome of a model species, achieving better prediction rates than other methods and demonstrating its applicability in a real prediction task.
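The abstract does not spell out miRNAss's algorithm. The general idea of letting unlabeled stem-loops inform predictions can nonetheless be illustrated with a generic graph-based label propagation (in the style of Zhu and Ghahramani). This is a swapped-in textbook technique and not necessarily the paper's method. The following minimal sketch assumes each hairpin has already been mapped to a numeric feature vector. The function names, the k-nearest-neighbour graph, the Gaussian edge weight and the toy data are all hypothetical.

```python
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph with Gaussian edge weights.

    Hypothetical helper: weights[i] maps neighbour index -> edge weight.
    """
    n = len(points)
    weights = [dict() for _ in range(n)]
    for i in range(n):
        # Sort all other points by Euclidean distance to point i.
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            w = math.exp(-d * d)
            weights[i][j] = w
            weights[j][i] = w  # symmetrize so the graph is undirected
    return weights


def propagate_labels(points, labels, k=3, iterations=100):
    """Harmonic label propagation over the graph.

    labels: +1 (known pre-miRNA), -1 (known negative), 0 (unlabeled).
    Returns a real-valued score per point; its sign is the predicted class.
    """
    weights = knn_graph(points, k)
    scores = [float(y) for y in labels]
    for _ in range(iterations):
        new_scores = list(scores)
        for i, y in enumerate(labels):
            if y != 0:
                continue  # labeled examples stay clamped to their label
            total = sum(weights[i].values())
            if total > 0:
                # Each unlabeled point takes the weighted average of its neighbours.
                new_scores[i] = sum(w * scores[j] for j, w in weights[i].items()) / total
        scores = new_scores
    return scores


# Hypothetical toy data: two well-separated clusters with one labeled example each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.0), (5.2, 5.1), (5.1, 5.3), (4.9, 5.2)]
labels = [1, 0, 0, 0, -1, 0, 0, 0]
scores = propagate_labels(points, labels, k=3)
print([round(s, 2) for s in scores])
```

Only two of the eight points are labeled, yet the unlabeled ones inherit the label of the cluster they sit in, because the graph is built from all points. This is the kind of benefit from unlabeled data that the abstract describes. A real genome-wide setting with about 1.7 million hairpins would need an efficient approximate neighbour search and a sparse solver rather than this quadratic loop.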