CENTRO DE INVESTIGACION Y DESARROLLO EN CIENCIAS APLICADAS "DR. JORGE J. RONCO"
Executing Unit (Unidad Ejecutora) - UE
Conferences and scientific meetings
Development of a highly specific ensemble of 2D computational models for the early recognition of Breast Cancer Resistance Protein (BCRP) substrates
MELISA GANTNER; MAURICIO E. DI IANNI; MARÍA E. RUIZ; ALAN TALEVI; LUIS BRUNO BLANCH
Conference; 4to. Congreso Argentino de Bioinformática y Biología Computacional (4CAB2C) and 4ta. Conferencia Internacional de la Sociedad Iberoamericana de Bioinformática (SoIBio); 2013
Sociedad Iberoamericana de Bioinformática
Background
Breast Cancer Resistance Protein (BCRP) is a member of the ATP-Binding Cassette (ABC) efflux transporter superfamily. Acting as drug and metabolite carriers, these transporters provide a biochemical barrier against drug penetration and contribute to detoxification. They are characterized by broad substrate specificity and multiple binding sites; consequently, their overexpression is linked to multidrug resistance in a variety of diseases. The objective of this work is to develop an ensemble of computational models, based on conformation-independent molecular descriptors, capable of differentiating BCRP substrates from non-substrates.

Materials and methods
From a structurally diverse data set composed of 156 substrates and 106 non-substrates of human wild-type BCRP compiled from the literature, representative training and test sets were obtained through a two-step clustering process. Classifier models were developed by applying Linear Discriminant Analysis to random subsamples of Dragon molecular descriptors. Simple data fusion and statistical comparison of partial areas under the Receiver Operating Characteristic (ROC) curves were applied to obtain the best 2-model combination. The models were validated through standard methodologies in order to assess their robustness and predictive ability. Finally, a simulated virtual screening campaign was performed on a 577-compound database containing less than 5% of BCRP non-substrates (our hits) dispersed among 479 putative substrates (decoys), in order to estimate more realistically the utility of our model in a real virtual screening setting.

Results and conclusions
The best 2-model combination achieved 82% overall accuracy in the training set and 74.5% overall accuracy in the test set.
Statistical comparison of ROC curves (see Figure 1) indicates that the best 2-model ensemble outperforms the best individual models in both the training and test sets and in the simulated 577-compound database. Moreover, on the basis of ROC curve analysis, the score threshold can be optimized to prioritize accuracy in the prediction of either substrates or non-substrates, according to context-dependent criteria. Because it relies on conformation-independent Dragon descriptors, the ensemble is particularly suitable for virtual screening of large chemical libraries without prior conformational analysis of the database structures, and it is a potentially valuable tool for computer-aided drug design campaigns aimed at overcoming BCRP-mediated multidrug resistance.
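
The score fusion and partial-area comparison described in the methods can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state which fusion operator was used, so averaging the two model scores is an assumption here, as are the toy labels and scores, the function names and the 0.5 false-positive-rate cut-off. The sketch compares partial areas numerically only and does not perform the statistical test mentioned in the abstract.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Build ROC points by sweeping the score threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Tied scores are consumed together so that they yield one ROC point.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def partial_auc(points: Sequence[Point], max_fpr: float) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr], unnormalized."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the TPR at the cut-off and clip the segment there.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def fuse_mean(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Assumed fusion rule: average the scores of two models per compound."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


# Hypothetical toy data: 1 = positive class, 0 = negative class.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.3, 0.6, 0.7, 0.2, 0.1, 0.4]
model_b = [0.6, 0.4, 0.9, 0.7, 0.2, 0.5, 0.3, 0.1]
fused = fuse_mean(model_a, model_b)

MAX_FPR = 0.5  # assumed cut-off for the partial area
pauc_a = partial_auc(roc_points(labels, model_a), MAX_FPR)
pauc_b = partial_auc(roc_points(labels, model_b), MAX_FPR)
pauc_fused = partial_auc(roc_points(labels, fused), MAX_FPR)

print(f"pAUC model A: {pauc_a:.4f}")
print(f"pAUC model B: {pauc_b:.4f}")
print(f"pAUC fused:   {pauc_fused:.4f}")
```

In this contrived example the two models err on different compounds, so the averaged score ranks every positive above every negative and the fused partial area exceeds that of either model alone. Real descriptor-based models will not generally combine this cleanly, which is why a statistical comparison of the partial areas is needed before preferring an ensemble.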