RESEARCHERS
MILONE Diego Humberto
Conferences and scientific meetings
Title:
Novel microRNA discovery from genome-wide data: a computational pipeline with unsupervised machine learning
Author(s):
STEGMAYER, G.; YONES, C.A.; KAMENETZKY, L.; MACCHIAROLI, N.; PEREZ, M.; ROSENZVIT, M.C.; MILONE, D.H.
Meeting:
Conference; 4th ISCB-LA Bioinformatics Conference; 2016
Abstract:
The computational prediction of novel microRNAs (miRNAs) poses several challenges, especially when working with genome-wide data and non-model organisms. First, the raw data must go through many pre-processing steps to cut it into sequences. This involves selecting and using a variety of software packages written in different programming languages, with many possible configurations and parameters that are often unclear and very difficult for the end user to set. Next, each sequence must be analyzed one by one to classify it as a possible pre-miRNA candidate. The classical approach has been to train a binary supervised classifier with well-known pre-miRNAs (for example, extracted from miRBase) and an artificially defined non-pre-miRNA class, which is very difficult to define. Thus, a single, complete, and simple procedure for unsupervised pre-miRNA prediction from genome-wide data is of high interest today. We describe a pipeline that, given genome-wide data and a set of well-known pre-miRNAs of a given organism as input, automatically cuts the genome into sequences, extracts features, and trains an unsupervised machine learning model for novel pre-miRNA prediction. It is based on clustering unlabelled sequences together with well-known miRNA precursors of the organism under study. Novel pre-miRNAs have been effectively discovered with this methodology, which can help in the design of wet-lab experiments that would otherwise be impossible to address.
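The three stages named in the abstract (cutting the genome into sequences, extracting features, and clustering unlabelled sequences together with known pre-miRNAs) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names `cut_windows`, `features`, `kmeans`, and `rank_candidates` are hypothetical, the features are simple nucleotide frequencies (a real pipeline would likely rely on hairpin and secondary-structure features), and plain k-means stands in for whatever clustering model the pipeline actually uses, which the abstract does not specify.

```python
import random


def cut_windows(genome, size, step):
    """Cut a genome string into fixed-length, possibly overlapping windows."""
    return [genome[i:i + size] for i in range(0, len(genome) - size + 1, step)]


def features(seq):
    """Toy feature vector (assumption): fraction of A, C, G and T in the sequence."""
    n = len(seq)
    return [seq.count(base) / n for base in "ACGT"]


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means, used here only as a stand-in clustering method."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rank_candidates(windows, known_pre_mirnas, k):
    """Return the unlabelled windows that share a cluster with a known pre-miRNA."""
    points = [features(s) for s in windows + known_pre_mirnas]
    labels = kmeans(points, k)
    window_labels = labels[:len(windows)]
    known_labels = set(labels[len(windows):])
    return [w for w, lab in zip(windows, window_labels) if lab in known_labels]


if __name__ == "__main__":
    # Made-up toy sequences, not real genomic data.
    genome = "AAAAAAATGCGCGCGAAAAAAAATGGCCGGCA"
    windows = cut_windows(genome, size=8, step=8)
    known = ["GGCCGGCC"]
    print(rank_candidates(windows, known, k=2))
```

The point of the design is that no negative (non-pre-miRNA) class has to be defined: known precursors are only used to mark which clusters are interesting, and every unlabelled window falling in such a cluster becomes a candidate for further inspection.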