RESEARCHERS
MILONE Diego Humberto
Conferences and scientific meetings
Title:
Predicting protein functions with deep learning and multi-source data
Author(s):
GABRIELA MERINO, RABIE SAIDI, DIEGO MILONE, GEORGINA STEGMAYER AND MARÍA MARTIN
Meeting:
Conference; Intelligent Systems for Molecular Biology and European Conference on Computational Biology 2020; 2020
Abstract:
Identifying protein functions is crucial in molecular biology. Experimental characterization and manual curation are extremely time-consuming and expensive, and hence cannot cope with the exponential increase in data. Thus, computational methods for automatic function prediction are needed. Although such methods are constantly being developed, their performance still leaves room for improvement. We propose novel deep learning models for predicting Gene Ontology (GO) annotations by integrating multi-source data, represented as protein association matrices. Our models were trained and evaluated on yeast. Input association matrices were based on sequence distance, transcriptomics experiments, GO terms and Reactome annotations. The F-max, commonly used in CAFA challenges, was used to evaluate molecular function (MF), cellular component (CC), and biological process (BP) predictions. Using only sequence distances, F-max values of 0.38, 0.63 and 0.35 were reached for MF, CC and BP, respectively. Combining sequence distances with transcriptomics and GO data improved the F-max to 0.51 for MF. Adding Reactome information to that combination yielded an F-max of 0.65 for CC. Finally, using sequence, GO and Reactome data, the F-max was 0.48 for BP. Our results suggest that deep learning integrating multi-source data is a promising tool for protein function prediction.
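The abstract reports F-max without defining it. As a rough illustration only, the sketch below computes a protein-centric F-max in the way it is commonly described for CAFA: precision is averaged over proteins with at least one prediction at a given score threshold, recall is averaged over proteins with at least one true term, and the best F-measure over all thresholds is kept. The function name `f_max`, the input layout, the threshold grid and the term labels are assumptions made for this example; they are not taken from the authors' implementation.

```python
def f_max(y_true, y_score, thresholds=None):
    """Protein-centric F-max over a grid of score thresholds (illustrative sketch).

    y_true:  list of sets of true GO terms, one set per protein
    y_score: list of dicts {GO term: score in [0, 1]}, one dict per protein
    """
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions = []
        recalls = []
        for truth, scores in zip(y_true, y_score):
            # Terms predicted for this protein at threshold t
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            # Precision only counts proteins with at least one prediction
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall counts every protein that has at least one true term
            if truth:
                recalls.append(hits / len(truth))
        if not precisions or not recalls:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Hypothetical toy data: two proteins, placeholder term labels
toy_truth = [{"GO:A", "GO:B"}, {"GO:C"}]
toy_scores = [{"GO:A": 0.9, "GO:C": 0.8}, {"GO:C": 0.8, "GO:A": 0.3}]
toy_result = f_max(toy_truth, toy_scores)
```

In this setting the function would be called once per GO aspect (MF, CC, BP), each with its own true annotations and predicted scores.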