SINC(I)   25518
INSTITUTO DE INVESTIGACION EN SEÑALES, SISTEMAS E INTELIGENCIA COMPUTACIONAL
Executing Unit - UE
Conferences and scientific meetings
Title:
Predicting protein functions with deep learning and multi-source data
Author(s):
RABIE SADIE, MARIA MARTIN, GABRIELA MERINO, DIEGO MILONE, GEORGINA STEGMAYER
Location:
Held virtually
Meeting:
Conference; Intelligent Systems for Molecular Biology (ISMB) 2020; 2020
Organizing institution:
International Society for Computational Biology (ISCB)
Abstract:
Identifying protein functions is crucial in molecular biology. Experimental and manual curation are extremely time-consuming and expensive, and hence cannot cope with the exponential increase of data. Thus, computational methods for automatic function prediction are needed. Although such methods are being constantly developed, their performance still has room for improvement. We propose novel deep learning models for predicting Gene Ontology (GO) annotations by integrating multi-source data, represented as protein association matrices. Our models were trained and evaluated on yeast. Input association matrices were based on sequence distance, transcriptomics experiments, GO terms and Reactome annotations. The F-max, commonly used in CAFA challenges, was used for evaluating molecular function (MF), cellular component (CC), and biological process (BP) predictions. Using only sequence distances, F-max values of 0.38, 0.63 and 0.35 were reached for MF, CC and BP, respectively. Combining sequence distances with transcriptomics and GO data improved the F-max to 0.51 for MF. Adding Reactome information to the previous combination raised the F-max to 0.65 for CC. Finally, using sequence, GO and Reactome data, the F-max was 0.48 for BP. Our results suggest that deep learning integrating multi-source data is a promising tool for protein function prediction.
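
Since the abstract reports all results as F-max, a minimal sketch of that metric may help readers interpret the numbers. It assumes the usual protein-centric definition from the CAFA challenges: at each score threshold, precision is averaged over proteins that have at least one predicted term, recall is averaged over all benchmark proteins, and F-max is the best resulting F1. The function name `f_max`, the dictionary layout, the threshold grid and the toy GO identifiers are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Dict, Sequence, Set


def f_max(
    predictions: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    thresholds: Sequence[float] = tuple(i / 100 for i in range(1, 101)),
) -> float:
    """Protein-centric F-max: the best F1 over a grid of score thresholds.

    predictions maps protein -> {GO term: score in [0, 1]} (assumed layout).
    truth maps protein -> set of experimentally annotated GO terms.
    """
    # Only proteins with at least one true term take part in the benchmark.
    benchmark = {p: terms for p, terms in truth.items() if terms}
    if not benchmark:
        return 0.0

    best = 0.0
    for t in thresholds:
        precision_sum = 0.0
        recall_sum = 0.0
        n_predicted = 0  # proteins with at least one term scored >= t
        for protein, true_terms in benchmark.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precision_sum += hits / len(predicted)
                n_predicted += 1
            recall_sum += hits / len(true_terms)
        if n_predicted == 0:
            continue  # precision is undefined when nothing is predicted
        precision = precision_sum / n_predicted
        recall = recall_sum / len(benchmark)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data (hypothetical term identifiers, not real annotations).
toy_predictions = {
    "P1": {"GO:A": 0.9, "GO:B": 0.4},
    "P2": {"GO:A": 0.6, "GO:C": 0.8},
}
toy_truth = {"P1": {"GO:A"}, "P2": {"GO:A", "GO:C"}}
toy_score = f_max(toy_predictions, toy_truth)
```

In practice, one such score would be computed separately for each ontology (MF, CC and BP), which is how the values in the abstract are reported.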