SINC(I)   25518
INSTITUTO DE INVESTIGACION EN SEÑALES, SISTEMAS E INTELIGENCIA COMPUTACIONAL
Executing Unit - UE
Conferences and scientific meetings
Title:
Discovery of novel pre-miRNAs: unsupervised versus supervised machine learning
Author(s):
MILONE, D.H.; STEGMAYER, G.
Meeting:
Conference; 4th ISCB-LA Bioinformatics Conference; 2016
Abstract:
The computational prediction of novel microRNAs involves identifying the nucleotide sequences most likely to be miRNA precursors (pre-miRNAs). This is challenging for a machine learning algorithm because well-known pre-miRNAs are few in comparison to the hundreds of thousands of candidate sequences, which makes it a problem with high class imbalance. The classical approach has been to train a binary supervised classifier, using well-known pre-miRNAs from miRBase as the positive class and artificially defining a negative class. This has two important drawbacks: i) it is extremely difficult to build a representative set of negative examples; and ii) it is well known in machine learning that high class imbalance strongly affects standard classifiers. In most genomes, the imbalance between well-known pre-miRNAs and unlabeled sequences is so high that supervised classification models cannot handle it properly. We present comparative results showing that unsupervised machine learning is better suited to pre-miRNA prediction: unsupervised approaches maintain good performance rates as class imbalance increases, whereas a supervised model quickly deteriorates. Additionally, the unsupervised approach is more naturally suited to an end user who has good knowledge of the pre-miRNAs of the genome under study but no knowledge of how to define a negative class for training a predictor.
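As a rough illustration of why high class imbalance strongly affects a standard binary classifier, the following Python sketch computes the expected precision of a classifier whose sensitivity and specificity are held fixed while the number of negative sequences grows. This is not the authors' experiment: the function name, the counts, and the sensitivity and specificity values are assumptions chosen only for illustration.

```python
def precision_at_imbalance(n_pos, ratio, sensitivity, specificity):
    """Expected precision when negatives outnumber positives by `ratio`.

    All arguments are illustrative assumptions, not values from the abstract.
    """
    n_neg = n_pos * ratio                      # size of the negative class
    true_pos = sensitivity * n_pos             # positives correctly detected
    false_pos = (1.0 - specificity) * n_neg    # negatives wrongly flagged
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical classifier: 90% sensitivity, 99% specificity.
    for ratio in (1, 10, 100, 1000):
        p = precision_at_imbalance(100, ratio, 0.90, 0.99)
        print(f"imbalance 1:{ratio:<5d} precision = {p:.3f}")
```

Under these assumed rates, the fraction of predicted pre-miRNAs that are true positives falls steadily as the imbalance ratio grows, even though the classifier itself has not changed.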