RESEARCHERS
DI PERSIA Leandro Ezequiel
Conferences and scientific meetings
Title:
Comparison of neural network-based methods for similarity prediction in compounds with unknown structure
Author(s):
EUGENIO BORZONE; LEANDRO EZEQUIEL DI PERSIA; MATIAS GERARD
Place:
Corrientes
Meeting:
Conference; XII Congreso argentino de Bioinformatica y Biologia Computacional; 2022
Organizing institution:
A2B2C
Abstract:
Background
Similarity between compounds is widely used in chemoinformatics. It is usually calculated from structural information, so it is only available for compounds with known structure. To address this constraint, we use the topology of metabolic pathways to infer similarity between compounds with unknown structure. In this work we compare, on the same dataset, three neural network-based models that we have proposed for compound similarity prediction.
Results
The first model is a Multilayer Perceptron (MLP) whose inputs are one-hot encodings of the compounds. It was used to explore whether an MLP can predict similarity at all. Although it provides a good prediction of similarity, the encoding makes it impossible to apply the model to compounds with unknown structure. The second model is an MLP whose inputs are embeddings obtained from a model of random walks through the graph of compounds of a metabolic pathway; these embeddings preserve the proximity of the compounds in relation to the reactions in which they participate. This model yielded test errors below 10%, and it can predict similarity for compounds with unknown structure. Finally, the third model uses a message-passing Graph Neural Network to generate embeddings for the compounds and an MLP for the similarity prediction. Its errors on the test set were close to 2%, making it a high-performance model. The model also preserves topological properties in the generated embeddings, i.e., similar compounds are mapped to nearby points in the embedding space.
Conclusions
Three incremental models for similarity prediction of compounds with unknown structure are compared. From these promising results, we conclude that it is possible to predict similarity between compounds with unknown structure with good performance.
In the future, features such as physicochemical information of the compounds will be incorporated to improve generalization, and the models will be evaluated on larger metabolic pathways.
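The abstract does not say how the random walks behind the second model are generated. As a minimal sketch only, assuming uniform random walks over an undirected compound graph (in the style of DeepWalk, which is an assumption and not necessarily the authors' method), walk generation could look like the following. The graph, the compound IDs, and the function names are hypothetical.

```python
import random

# Hypothetical toy compound graph: two compounds are linked when they
# take part in the same reaction. The compound IDs are made up.
COMPOUND_GRAPH = {
    "C1": ["C2", "C3"],
    "C2": ["C1", "C3"],
    "C3": ["C1", "C2", "C4"],
    "C4": ["C3", "C5"],
    "C5": ["C4"],
}


def random_walk(graph, start, length, rng):
    """Return a uniform random walk with up to `length` nodes from `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every compound in the graph."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    walks = []
    for _ in range(walks_per_node):
        for node in graph:
            walks.append(random_walk(graph, node, length, rng))
    return walks


# Assumed settings: 10 walks per compound, 6 compounds per walk.
walks = generate_walks(COMPOUND_GRAPH, walks_per_node=10, length=6)
print(len(walks), "walks; first walk:", walks[0])
```

In a pipeline of this kind, the walks would typically be treated as sentences and passed to a skip-gram style model, so that compounds that co-occur in walks receive nearby embeddings; those embeddings would then be the MLP inputs.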