RESEARCHERS
STEGMAYER Georgina Silvia
conferences and scientific meetings
Title:
Novel microRNA discovery from genome-wide data: a computational pipeline with unsupervised machine learning
Author(s):
G. STEGMAYER, C. YONES, L. KAMENETZKY, N. MACCHIAROLI, M. PEREZ, M.C. ROSENZVIT, D.H. MILONE
Meeting:
Conference; 4th International Society for Computational Biology Latin America Bioinformatics Conference (ISCB-LA); 2016
Organizing institution:
International Society for Computational Biology (ISCB)
Abstract:
There are several challenges in the computational prediction of novel microRNAs (miRNAs), especially from genome-wide data and in non-model organisms. First, many pre-processing steps must be applied to the raw data to cut it into sequences. These steps involve selecting and running a variety of software packages written in different programming languages, with many possible configurations and parameters that are, most of the time, unclear and very difficult for the end user to set. After that, each sequence must be analyzed one by one to classify it as a possible pre-miRNA candidate. The classical approach has been to train a binary supervised classifier with well-known pre-miRNAs (for example, extracted from miRBase) and an artificially defined non-pre-miRNA class, which is very difficult to construct. Thus, a single, complete, and simple procedure for unsupervised pre-miRNA prediction from genome-wide data is of high interest today.

We describe a pipeline that, given genome-wide data and a set of well-known pre-miRNAs of a given organism as input, automatically cuts the genome into sequences, extracts features, and trains an unsupervised machine learning model for novel pre-miRNA prediction. It is based on clustering unlabelled sequences together with the well-known miRNA precursors of the organism under study. Novel pre-miRNAs have been effectively discovered with this methodology, which can help in the design of "wet" experiments that would otherwise be impossible to address.
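The overall idea (cut the genome into windows, compute a feature vector per window, cluster the windows together with known pre-miRNAs, and keep the unlabelled windows that share a cluster with known precursors) can be illustrated with the minimal sketch below. It is not the authors' implementation. The abstract does not specify the window length, step, feature set, or clustering model, so every name and parameter here is hypothetical. The toy features (GC content plus dinucleotide frequencies) stand in for the richer features, such as secondary-structure descriptors, that a real pre-miRNA predictor would use. Plain k-means with deterministic farthest-point initialization is swapped in as the unsupervised model.

```python
import math
from itertools import product

# Hypothetical toy feature set: the 16 RNA dinucleotides.
KMERS = ["".join(p) for p in product("ACGU", repeat=2)]


def cut_genome(genome, window=100, step=50):
    """Cut a genome string into overlapping windows (hypothetical parameters)."""
    genome = genome.upper().replace("T", "U")  # work in the RNA alphabet
    last = max(len(genome) - window, 0)
    return [genome[i:i + window] for i in range(0, last + 1, step)]


def features(seq):
    """Toy feature vector: GC content followed by dinucleotide frequencies."""
    n = max(len(seq) - 1, 1)
    gc = sum(seq.count(b) for b in "GC") / max(len(seq), 1)
    counts = {k: 0 for k in KMERS}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other symbols
            counts[pair] += 1
    return [gc] + [counts[k] / n for k in KMERS]


def kmeans(points, k, iters=50):
    """Plain k-means; farthest-point initialization keeps it deterministic."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def predict_candidates(genome, known_premirnas, k=2, window=100, step=50):
    """Return unlabelled windows that cluster together with known pre-miRNAs."""
    unlabelled = cut_genome(genome, window, step)
    known = [s.upper().replace("T", "U") for s in known_premirnas]
    seqs = unlabelled + known
    labels = kmeans([features(s) for s in seqs], k)
    # Clusters that contain at least one known precursor are "positive".
    positive = {labels[len(unlabelled) + j] for j in range(len(known))}
    return [s for s, lab in zip(unlabelled, labels[:len(unlabelled)]) if lab in positive]
```

For example, `predict_candidates("AU" * 15 + "GC" * 10, ["GCGCGCGCGC"], k=2, window=10, step=10)` cuts five windows and returns the two GC-rich ones, because they fall in the same cluster as the known sequence. Labelling whole clusters by the presence of known precursors avoids having to define a negative (non-pre-miRNA) class, which is the difficulty the abstract points out for supervised classifiers.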