SINC(I)   25518
INSTITUTO DE INVESTIGACION EN SEÑALES, SISTEMAS E INTELIGENCIA COMPUTACIONAL
Executing Unit - UE
Conferences and scientific meetings
Title:
Non-negative matrix factorization for prediction of gene annotations
Author(s):
LEALE, G.; DI PERSIA, L. E.; MILONE, D.H.; STEGMAYER, G.
Meeting:
Conference; 4th ISCB-LA Bioinformatics Conference; 2016
Abstract:
The accurate prediction of gene annotations is an important issue in modern computational biology. A list of putative terms/labels can be provided by the Gene Ontology (GO) and used to design targeted biological experiments in order to generate novel and validated knowledge. However, the manual curation of novel annotations is very time-consuming and costly. Thus, novel computational tools are needed to reliably predict likely annotations and speed up the discovery of new gene functions. The proximity between GO terms (semantic similarity) can be measured with any of the available semantic measures, in order to build a distance matrix of GO annotations (dGO) for a group of genes of interest. However, for novel or non-annotated genes, this matrix will have many empty positions. Their similarity to annotated genes therefore cannot be calculated, and semantically close annotations cannot be inferred from it. We show how dGO can be fully reconstructed by using another available source of information about the genes (such as expression levels), so that their GO labels can be inferred afterwards. We present a novel approach to the prediction of gene annotations based on a non-negative matrix factorization (NMF) fusion of the semantic and expression distances among genes, which uses the fusion to complete the unknown part of the semantic distance matrix. The reconstructed semantic matrix can then be used to infer candidate terms for the unknown genes. This approach can yield a sensitivity and precision extremely close to those obtained by using the real semantic distance information, which shows that the NMF fusion approach was successful in capturing the information structure of the dGO matrix.
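The abstract does not give the details of the fusion scheme, so the following is only a minimal sketch of one possible way to complete dGO with NMF, not the authors' implementation. It assumes that the semantic and expression distance matrices are placed side by side in a single non-negative matrix, that the unknown semantic entries are masked out, and that the joint matrix is factorized with standard weighted multiplicative updates. The function names (`masked_nmf`, `complete_semantic_distances`), the rank, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np


def masked_nmf(X, mask, rank, n_iter=500, eps=1e-9, seed=0):
    """Factorize X ~ W @ H using only the entries where mask == 1.

    Uses the standard weighted multiplicative updates for the
    Frobenius loss; W and H stay non-negative by construction.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    Xm = mask * X  # unobserved entries contribute nothing
    for _ in range(n_iter):
        W *= (Xm @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        H *= (W.T @ Xm) / (W.T @ (mask * (W @ H)) + eps)
    return W, H


def complete_semantic_distances(d_sem, d_expr, known, rank=5, n_iter=500):
    """Fill the unknown part of a semantic distance matrix (assumed scheme).

    d_sem  : (n, n) semantic distances, trusted only between annotated genes
    d_expr : (n, n) expression distances, available for every gene
    known  : (n,) boolean array, True for annotated genes
    """
    n = d_sem.shape[0]
    # A semantic distance is observed only when both genes are annotated.
    observed = known[:, None] & known[None, :]
    # Assumed fusion: stack both distance matrices side by side so that
    # every gene (row) shares one latent representation W[i].
    X = np.hstack([np.where(observed, d_sem, 0.0), d_expr])
    mask = np.hstack([observed, np.ones((n, n), dtype=bool)]).astype(float)
    W, H = masked_nmf(X, mask, rank, n_iter)
    d_hat = (W @ H)[:, :n]
    # Row i of an unannotated gene is estimated from its expression profile;
    # mirror it so the result is symmetric. Pairs where both genes are
    # unannotated have no semantic evidence at all and remain 0 here.
    filled = np.maximum(d_hat, d_hat.T)
    np.fill_diagonal(filled, 0.0)
    # Keep the real semantic distances wherever they are available.
    return np.where(observed, d_sem, filled)
```

Under these assumptions, the rows of the completed matrix that belong to unannotated genes could be used to find their nearest annotated neighbours, whose GO terms then become candidate labels. How candidates are actually ranked and selected is not specified in the abstract.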