SINC(I)   25518
INSTITUTO DE INVESTIGACIÓN EN SEÑALES, SISTEMAS E INTELIGENCIA COMPUTACIONAL
Executing Unit - UE
Conferences and scientific meetings
Title:
Improving pre-miRNA prediction with complexity measures of the mature and deep learning
Author(s):
J. RAAD, G. STEGMAYER, D.H. MILONE
Location:
Mendoza
Meeting:
Conference; X Congreso Argentino de Bioinformática y Biología Computacional; 2019
Organizing institution:
A2B2C
Abstract:
miRNAs are small RNA molecules that regulate gene expression in animal and plant cells through post-transcriptional control. They are stored inside precursors of approximately 100 bases, called pre-miRNAs, which have a stem-loop structure. Several experimental methods can be used for detecting pre-miRNAs, such as qPCR, microarrays and deep sequencing. However, these techniques present practical difficulties when evaluating a very large number of candidate sequences in a genome. Due to these technical and practical difficulties, computational methods play an increasingly important role in their prediction. In order to find new pre-miRNA candidates, many different feature sets have been proposed, most of which describe information about the structure of the pre-miRNA, inspired by the action of Drosha. However, the specificity of the subsequent processes imposes restrictions on which hairpins will become mature miRNAs. Given that this important information is encoded in the mature region, the secondary structure of the precursor by itself might not be sufficient to differentiate a true pre-miRNA from other hairpins. In this work, we have developed a new feature for the representation of mature sequences based on the Levenshtein distance, which is a string metric for measuring the edit difference between two sequences. Furthermore, this new feature, combined with deep learning, has proven able to significantly improve the prediction of pre-miRNAs. We developed a deep neural network (DNN) and trained it with and without the new feature using cross-validation, and the results obtained indicate that this model was able to improve the separation of classes even in the presence of very high imbalance in the data. Figure 2 shows the classification results (sensitivity, precision and F1) for the new proposed feature and the standard features, with the DNN as classifier. The figure clearly shows how the DNN classifier is capable of maintaining high performance at increasing imbalances, and even increasing both sensitivity and precision when the new feature is used. Moreover, F1 is significantly higher for all the imbalances when the new feature is used, increasing from 30% to 80% compared to the standard features at the highest imbalance.
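
As an illustration of the kind of representation described above, the following Python sketch computes a Levenshtein-based feature for a candidate mature sequence. The abstract does not specify how the edit distance is turned into a feature, so the summary as the minimum distance to a set of known mature miRNAs, as well as the function names and example sequences, are assumptions made here for illustration only, not the authors' implementation.

def levenshtein(a, b):
    # Classic dynamic-programming edit distance between two sequences.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def mature_distance_feature(candidate, known_matures):
    # Hypothetical feature: minimum edit distance from a candidate mature
    # region to a set of known mature miRNA sequences.
    return min(levenshtein(candidate, m) for m in known_matures)

if __name__ == "__main__":
    known = ["UGAGGUAGUAGGUUGUAUAGUU",    # example known mature sequences
             "UAUUGCACUUGUCCCGGCCUGU"]
    print(mature_distance_feature("UGAGGUAGUAGGUUGUGUAGUU", known))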
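
The classification metrics reported in the abstract (sensitivity, precision and F1) can be computed from confusion counts as in the short sketch below; the counts shown are toy numbers only, not results from the paper, and merely illustrate why F1 is a useful summary under high class imbalance.

def sensitivity(tp, fn):
    # Fraction of true pre-miRNAs that are recovered (recall).
    return tp / (tp + fn) if tp + fn else 0.0

def precision(tp, fp):
    # Fraction of predicted pre-miRNAs that are truly positive.
    return tp / (tp + fp) if tp + fp else 0.0

def f1_score(tp, fp, fn):
    # Harmonic mean of sensitivity and precision.
    s, p = sensitivity(tp, fn), precision(tp, fp)
    return 2 * s * p / (s + p) if s + p else 0.0

# Toy confusion counts with a dominant negative class.
print(f1_score(tp=80, fp=40, fn=20))   # ~0.73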