SINC(I)   25518
INSTITUTO DE INVESTIGACION EN SEÑALES, SISTEMAS E INTELIGENCIA COMPUTACIONAL
Executing Unit (Unidad Ejecutora, UE)
Articles
Title:
Genome-wide hairpins datasets of animals and plants for novel miRNA prediction
Author(s):
BUGNON, L.A.; MILONE, D.H.; YONES, C.; STEGMAYER, G.; RAAD, J.
Journal:
Data in Brief
Publisher:
Elsevier
References:
Place: Amsterdam; Year: 2019; vol. 25
ISSN:
2352-3409
Abstract:
This article makes available several genome-wide datasets that can be used for training microRNA (miRNA) classifiers. The hairpin sequences available are from the genomes of Homo sapiens, Arabidopsis thaliana, Anopheles gambiae, Caenorhabditis elegans and Drosophila melanogaster. Each dataset provides the genome data divided into sequences, together with a set of computed features for prediction. Each sequence has one label: i) "positive", meaning that it is a well-known pre-miRNA according to miRBase v21 [1]; or ii) "unlabeled", indicating that the sequence has no known function yet and could be a candidate novel pre-miRNA. Because selecting an informative feature set is very important for a good pre-miRNA classifier, a representative feature set with large discriminative power has been calculated and is also provided for each genome. This feature set contains typical information about sequence, topology and structure. The datasets are publicly shared at https://sourceforge.net/projects/sourcesinc/files/mirdata/.