RESEARCHERS
MILONE Diego Humberto
Conferences and scientific meetings
Title:
miRNAss: a semi-supervised approach for microRNA prediction
Author(s):
YONES, C.A.; STEGMAYER, G.; MILONE, D.H.
Place:
Bahía Blanca
Meeting:
Conference; VI Argentinian Conference on Bioinformatics and Computational Biology; 2015
Abstract:
MicroRNAs (miRNAs) play essential roles in post-transcriptional gene regulation in animals and plants. Precursors of miRNAs (pre-miRNAs) are characterized by their hairpin structure. However, a large number of similar sequences can be folded into this kind of structure. Several computational approaches have been developed to predict which hairpins can be pre-miRNAs, but they require a sufficient number of known pre-miRNAs and non-pre-miRNAs as learning samples. Most sequenced genomes, however, have a very small number of reported miRNAs, and most of their sequences are unlabeled. The semi-supervised approach proposed in this work takes advantage of these unlabeled sequences to achieve better prediction rates than state-of-the-art methods. The first step is to build a similarity matrix among the sequences using the Euclidean distance between their feature vectors. Then, a vector of labels y is defined, with a positive value for known miRNAs, a negative value for non-miRNAs, and zero for unlabeled sequences. The scores z used to assign a class to the unlabeled sequences are then obtained by solving an optimization problem. To test the predictive power of miRNAss, we compared it with a similar approach that uses few training examples; in the same experiments with human data, miRNAss outperformed it in most cases. Furthermore, to test the predictive power of miRNAss in other species, the plant datasets provided by Gudys et al. were used. In summary, we have presented a new miRNA prediction method called miRNAss. It uses a semi-supervised approach to address the problem of very few training samples within complete genomes. The experiments showed that miRNAss can achieve better results than state-of-the-art methods with very few training samples and that it is versatile enough to be used in genomes of several species.
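The steps described in the abstract (similarity matrix from Euclidean distances, label vector y, scores z from an optimization problem) can be illustrated with a minimal sketch. The abstract does not state the exact objective, so this example assumes a common graph-regularized formulation, minimizing ||z - y||^2 + mu * z^T L z with L the graph Laplacian, which has the closed-form solution z = (I + mu * L)^{-1} y. The Gaussian kernel, the parameters `sigma` and `mu`, the function names, and the toy data are all hypothetical choices made for illustration and are not taken from miRNAss itself.

```python
import numpy as np

def similarity_matrix(X, sigma=1.0):
    """Gaussian similarity built from pairwise Euclidean distances.

    Assumption: the kernel and its width `sigma` are illustrative only.
    """
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distance between every pair of feature vectors.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-similarity edges
    return W

def propagate_scores(W, y, mu=1.0):
    """Solve min_z ||z - y||^2 + mu * z^T L z (assumed objective).

    The minimizer satisfies (I + mu * L) z = y.
    """
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian
    return np.linalg.solve(np.eye(len(y)) + mu * L, y)

# Toy feature vectors: two well-separated groups of three sequences each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1],
              [3.0, 3.0], [3.1, 2.9], [2.9, 3.1]])
# +1 = known miRNA, -1 = known non-miRNA, 0 = unlabeled.
y = np.array([1.0, 0.0, 0.0, -1.0, 0.0, 0.0])

W = similarity_matrix(X)
z = propagate_scores(W, y)
predicted = np.where(z > 0, 1, -1)  # class assigned from the sign of z
```

In this toy setting each unlabeled point takes the sign of the single labeled example in its own group, which is the behavior a semi-supervised method relies on when labeled samples are scarce.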