RESEARCHERS
MILONE Diego Humberto
Conferences and scientific meetings
Title:
Integrating multiple information sources for protein function prediction with end-to-end deep learning
Author(s):
Gabriela Merino, Diego Milone, Maria Martin, Georgina Stegmayer and Rabie Saidi
Meeting:
Conference; Intelligent Systems for Molecular Biology and European Conference on Computational Biology 2021; 2021
Abstract:
Manual curation based on experimental evidence is a precise strategy for function annotation, but it is extremely expensive and time-consuming; hence, it cannot keep pace with the exponential growth of data. Although computational methods for function prediction are constantly being developed, their performance still leaves room for improvement, especially for no-knowledge (NK) proteins. We propose a novel end-to-end deep learning model that predicts Gene Ontology (GO) terms by integrating multiple features derived from the sequence and taxon of proteins. The model was trained and evaluated following the CAFA3 challenge setup. For training, the set of NK proteins was augmented with CAFA3 training proteins whose annotations had no changes or additions up to 02/2017 (the challenge deadline). For evaluation, the CAFA3 benchmark proteins were used, yielding F-max scores of 0.34, 0.55 and 0.55 for biological process (BP), cellular component (CC) and molecular function (MF), respectively. These results place our model among the top five CAFA3 methods, with scores very competitive with those of the best competitors for BP and CC; for MF, it is the second-best method. Our results suggest that deep learning that integrates multi-source data and uses data augmentation during training is a promising tool for function prediction.
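The F-max values reported above are the protein-centric metric used in CAFA: for each score threshold t, precision is averaged over the benchmark proteins that have at least one prediction at or above t, recall is averaged over all benchmark proteins, and F-max is the largest harmonic mean of the two across thresholds. The following is a minimal Python sketch of that computation, not the authors' evaluation code nor the official CAFA assessment software. It assumes GO terms (both predicted and true) have already been propagated to their ancestors within one ontology (BP, CC or MF) and uses a 0.01-step threshold grid; the function name `f_max` and its input layout are hypothetical.

```python
from typing import Dict, Optional, Sequence, Set


def f_max(predictions: Dict[str, Dict[str, float]],
          truth: Dict[str, Set[str]],
          thresholds: Optional[Sequence[float]] = None) -> float:
    """Protein-centric F-max, CAFA style (illustrative sketch).

    predictions: protein id -> {GO term: score in [0, 1]}
    truth: benchmark protein id -> set of true GO terms
    Assumption: all terms are already propagated to their GO ancestors.
    """
    if thresholds is None:
        # Assumed threshold grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    best = 0.0
    n_benchmark = len(truth)
    for t in thresholds:
        precisions = []      # only proteins with >= 1 prediction at threshold t
        recall_sum = 0.0     # summed over every benchmark protein
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:
                precisions.append(tp / len(predicted))
            if true_terms:
                recall_sum += tp / len(true_terms)
        if not precisions:
            continue  # precision is undefined when nothing is predicted
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / n_benchmark
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


if __name__ == "__main__":
    # Toy data (hypothetical proteins and terms)
    preds = {"P1": {"GO:A": 0.9, "GO:B": 0.4}, "P2": {"GO:C": 0.8}}
    truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:D"}}
    print(f_max(preds, truth))  # prints 0.5
```

In the toy example, every threshold up to 0.40 keeps both of P1's terms (precision and recall 1) and P2's wrong term (precision and recall 0), giving averaged precision and recall of 0.5 and hence F = 0.5, which no higher threshold beats. Averaging precision only over proteins with predictions, while averaging recall over the whole benchmark, is what makes the metric penalize methods that stay silent on hard proteins.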