SINC(I)   25518
INSTITUTO DE INVESTIGACION EN SEÑALES, SISTEMAS E INTELIGENCIA COMPUTACIONAL
Executing Unit - UE
Conferences and scientific meetings
Title:
Annotation pipeline for inferring gene functions integrating GO annotations and expression data
Author(s):
L. DI PERSIA, D.H. MILONE AND G. STEGMAYER
Meeting:
Conference; X Congreso Argentino de Bioinformática y Biología Computacional; 2019
Organizing institution:
A2B2C
Abstract:
Background: Computational methods for gene function prediction automatically find associations between a gene and a set of Gene Ontology (GO) terms. Since manual curation of novel annotations is very time-consuming, computational tools that can reliably predict likely annotations and boost the discovery of new gene annotations are urgently needed. Results: This work proposes a novel pipeline (see Figure) for inferring gene annotations based on the automatic reconstruction of the semantic similarity between genes. Semantic similarity is a metric defined over a set of terms, where the distance between terms is based on the likeness of their meaning or semantic content. We benchmarked the proposal against state-of-the-art methods on three published data sets (Arabidopsis thaliana, Saccharomyces cerevisiae and Dictyostelium discoideum). Independent experiments showed that the proportion of annotated to unannotated genes does not influence model accuracy. We used leave-one-out cross-validation. Whereas related state-of-the-art methods reach an average F1 of 15%, our pipeline achieved an average F1 of 30% across all three species. Our proposal showed the most balanced results: it neither misses true GO labels nor assigns a large number of false GO terms to unannotated genes.
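To make the reported metric concrete, the following is a minimal sketch of how a per-gene F1 score between true and predicted GO term sets could be computed and averaged over genes. It is an illustration only, not the authors' evaluation code: the abstract does not state how F1 was averaged, so the per-gene (example-based) averaging, the function names `f1_for_gene` and `mean_f1`, and the toy gene names and annotations are all assumptions.

```python
def f1_for_gene(true_terms, predicted_terms):
    """Return (precision, recall, F1) between two sets of GO term IDs."""
    true_terms, predicted_terms = set(true_terms), set(predicted_terms)
    hits = len(true_terms & predicted_terms)  # correctly predicted terms
    # Precision drops when false GO terms are assigned.
    precision = hits / len(predicted_terms) if predicted_terms else 0.0
    # Recall drops when true GO labels are missed.
    recall = hits / len(true_terms) if true_terms else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def mean_f1(true_by_gene, predicted_by_gene):
    """Average the per-gene F1 over all genes with known annotations.

    Assumption: in a leave-one-out setting, predicted_by_gene[g] would hold
    the terms predicted for gene g when g's own annotations were held out.
    """
    scores = [
        f1_for_gene(true_by_gene[g], predicted_by_gene.get(g, ()))[2]
        for g in true_by_gene
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: gene names and annotations are made up.
truth = {
    "geneA": {"GO:0008150", "GO:0003674"},
    "geneB": {"GO:0005575"},
}
predicted = {
    "geneA": {"GO:0008150"},                # misses one true term
    "geneB": {"GO:0005575", "GO:0003674"},  # adds one false term
}

print(round(mean_f1(truth, predicted), 3))
```

In this toy case, geneA loses recall by missing a true label and geneB loses precision by receiving a false term; F1 penalizes both, which is why it reflects the "balanced" behaviour described in the abstract.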