RESEARCHERS
STEGMAYER Georgina Silvia
Conferences and scientific meetings
Title:
Evaluating transfer learning for classification of proteins in bioinformatics
Author(s):
R. VITALE, G. STEGMAYER
Meeting:
Symposium; ASAI - Simposio Argentino de Inteligencia Artificial - 52º Jornadas Argentinas de Informática; 2023
Organizing institution:
SADIO
Abstract:
This study presents a solution that significantly improves the classification of proteins into families or domains using transfer learning. Of the more than 229 million proteins in UniProtKB, only 0.25% have been annotated and classified into more than 17,000 possible families. Deep learning (DL) models have recently been applied to this task. However, DL models require large amounts of training data, and most protein families have just a few examples. To tackle this issue, we propose applying transfer learning (TL) to the classification problem. The TL approach involves self-supervised learning on large, unlabeled datasets to generate a numerical embedding for each data point. This learned representation can then be used for supervised learning on a small, labeled dataset for a specific classification task. The results of this study indicate that using TL for protein family classification can reduce the prediction error by 55% compared to standard methods and by 32% compared to DL models with simple input representations such as one-hot encoding. This study demonstrates that transfer learning is an effective and promising technique for improving protein classification and annotation in large, as yet unannotated databases.
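The two-stage scheme described in the abstract (a representation learned once, then reused by a supervised classifier trained on few labeled examples) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The embedder is meant to be a frozen model pretrained with self-supervision on unlabeled sequences; here a plain amino-acid composition vector (an averaged one-hot encoding) is plugged in only as a placeholder and is not a learned embedding. A nearest-centroid classifier is swapped in as the supervised head. All function names, sequences, and family labels are hypothetical. The helper `relative_error_reduction` assumes that the reported 55% and 32% figures are relative reductions of the error rate.

```python
from collections import defaultdict
from math import dist
from typing import Callable, Dict, List, Sequence, Tuple

# An embedder maps a protein sequence to a fixed-length vector. In a transfer
# learning setup this would be a model pretrained with self-supervision on
# large unlabeled sequence sets and kept frozen (assumed, not shown here).
Embedder = Callable[[str], List[float]]

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_embedding(seq: str) -> List[float]:
    """Placeholder embedder (hypothetical): mean one-hot vector, i.e. residue composition."""
    vec = [0.0] * len(AMINO_ACIDS)
    for aa in seq:
        idx = AMINO_ACIDS.find(aa)
        if idx >= 0:  # skip non-standard residue codes
            vec[idx] += 1.0
    n = max(len(seq), 1)
    return [v / n for v in vec]


def fit_centroids(embed: Embedder,
                  labeled: Sequence[Tuple[str, str]]) -> Dict[str, List[float]]:
    """Supervised step: average the frozen embeddings of each family's few examples."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for seq, family in labeled:
        e = embed(seq)
        if family not in sums:
            sums[family] = [0.0] * len(e)
        sums[family] = [s + x for s, x in zip(sums[family], e)]
        counts[family] += 1
    return {f: [s / counts[f] for s in v] for f, v in sums.items()}


def predict(embed: Embedder, centroids: Dict[str, List[float]], seq: str) -> str:
    """Assign the family whose centroid is closest (Euclidean) to the sequence embedding."""
    e = embed(seq)
    return min(centroids, key=lambda f: dist(e, centroids[f]))


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Toy labeled set: two hypothetical families with two examples each.
    train = [("GGGGAG", "fam_gly"), ("GGAGGG", "fam_gly"),
             ("KKRKKR", "fam_basic"), ("RKKRKK", "fam_basic")]
    centroids = fit_centroids(composition_embedding, train)
    print(predict(composition_embedding, centroids, "GGGAGG"))  # fam_gly
    # Illustrative error rates only, not results from the study.
    print(round(relative_error_reduction(0.20, 0.09), 2))  # 0.55
```

Under the relative-reduction assumption, a 55% reduction means the new error is 0.45 times the baseline error, and a 32% reduction means it is 0.68 times the baseline. Replacing `composition_embedding` with a pretrained protein embedder leaves `fit_centroids` and `predict` unchanged, which mirrors the idea that only the small supervised step needs labeled data.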