FELLOWSHIPS
PRADA GORI Denis Nihuel
Scientific congresses and meetings
Title:
iRaPCA: A NOVEL METHOD FOR CLUSTERING OF SMALL MOLECULES.
Author(s):
ALBERCA LUCAS NICOLÁS; PRADA GORI DENIS NIHUEL; BELLERA CAROLINA; TALEVI ALAN
Location:
Capital Federal
Meeting:
Congress; XI Argentinian Congress of Bioinformatics and Computational Biology; 2021
Organizing institution:
Asociacion Argentina de Bioinformática y Biología Computacional
Abstract:
Background: Clustering of molecules organizes a set of molecules into smaller subgroups (clusters) whose members share similar features. Typical applications include the representative splitting of datasets for QSAR and the selection of representative in silico hits from in silico screening experiments for acquisition and experimental confirmation. In this work, we present an in-house hierarchical representative sampling procedure for clustering small molecules. The approach, which we call iRaPCA, is based on an iterative combination of the random subspace approach (feature bagging), Principal Component Analysis (PCA) and the k-means algorithm. The method has been implemented as a web app so that any user can upload their SMILES and run the clustering with their own parameters.

Results: A new online tool for the clustering of molecules has been developed. We tested the tool on 29 datasets containing between 100 and 5000 small molecules and, as a benchmarking exercise, compared the results with the clusters from three other well-known clustering methods (Ward, Complete and Butina). Internal validation was performed in all cases. The mean silhouette score obtained by our method across the 29 datasets was 0.9045, whereas the Ward, Complete and Butina methods scored 0.43, 0.42 and 0.27, respectively. For the optimal clustering as judged by the silhouette coefficient, iRaPCA yields on average 14.2 clusters and less than 1% of outliers (atypical molecules), while the other methods yield on average 221.9 (Ward), 175.8 (Complete) and 127.6 (Butina) clusters and more than 10% of outliers (23% in the case of Butina).

Conclusions: iRaPCA has shown great potential for generating dense and well-separated clusters of molecules, as demonstrated in the benchmarking exercise. Its implementation as a web app allows users who are unfamiliar with programming to cluster molecules quickly and easily using their own parameters.
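
The combination described in the abstract (random feature subsets, PCA, k-means, and selection by silhouette score) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the defaults (20 subsets of 10 descriptors, 2 principal components, k from 2 to 5) and the synthetic descriptor matrix are all assumptions made for illustration. Only a single clustering round is shown, so the hierarchical repetition and outlier handling mentioned in the abstract are omitted.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one integer label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)   # mean intra-cluster distance
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())

def subspace_pca_kmeans(X, k_values=(2, 3, 4, 5), n_subsets=20,
                        subset_size=10, seed=0):
    """Try several random descriptor subsets; keep the best-scoring clustering."""
    rng = np.random.default_rng(seed)
    best_score, best_labels = -1.0, None
    for _ in range(n_subsets):
        size = min(subset_size, X.shape[1])
        cols = rng.choice(X.shape[1], size=size, replace=False)  # feature bagging
        Z = pca_project(X[:, cols])
        for k in k_values:
            labels = kmeans(Z, k, rng)
            score = silhouette(Z, labels)
            if score > best_score:
                best_score, best_labels = score, labels
    return best_score, best_labels

# Synthetic stand-in for a molecular descriptor matrix (assumption):
# 60 "molecules", 30 descriptors, three well-separated groups.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(loc=c, scale=1.0, size=(20, 30))
               for c in (0.0, 10.0, 20.0)])
score, labels = subspace_pca_kmeans(X)
print(f"best silhouette: {score:.3f}, clusters: {len(set(labels.tolist()))}")
```

In practice the descriptor matrix would be computed from the uploaded SMILES with a cheminformatics toolkit, and a library implementation of PCA, k-means and the silhouette score would normally replace the hand-written helpers above.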