RESEARCHERS
SOTO Axel Juan
Conferences and scientific meetings
Title:
Interactive Visual Analysis Methodology for Improving Descriptor Selection in QSPR: First Steps
Author(s):
MARÍA J. MARTÍNEZ; FIORELLA CRAVERO; GUSTAVO ESTEBAN VAZQUEZ; MÓNICA FÁTIMA DIAZ; AXEL JUAN SOTO; IGNACIO PONZONI
Location:
San Carlos de Bariloche
Meeting:
Conference; V Argentinean Conference on Computational Biology and Bioinformatics; 2014
Organizing institution:
Asociación Argentina de Bioinformática y Biología Computacional
Abstract:
Background. The design of QSAR/QSPR models involves several challenges. One of them is selecting the set of molecular descriptors most relevant to the property or activity being modeled. A central question in this task is how to involve the domain expert (e.g., a chemist) so that they can incorporate their knowledge and expertise into the feature selection process [1]. In this context, strategies based on dynamic visual analysis can be useful. The main idea behind visual analytics approaches is to combine the computational power of statistical and machine learning methods with the natural human ability to identify patterns in visualizations. By allowing interaction with the visualizations, users can explore the data, provide feedback to the method, and/or use the tool to make better-informed decisions. In this work we report our first experiences in designing a methodology that combines statistical methods with interactive visualizations to address the problem of molecular descriptor selection.
Methodology. The proposed interactive visual analytics tool is used to explore alternative QSAR models and is organized into four charts (Figure 1): two undirected graphs that represent pairwise associations between descriptors; a bipartite graph that represents the relationships between models and descriptors; and a customized plot area that depicts different relationships between the descriptors and the target property. Relevant characteristics that the visualizations can highlight include redundant descriptors, descriptors that provide discriminative information, descriptors deemed relevant by consensus among alternative models, and descriptors whose values help reduce the uncertainty about the target property. In this way, the modeler can analyze the different aspects of QSAR/QSPR model design simultaneously.
Results and conclusions. The capabilities of our tool were assessed through two case studies. The first concerns a prediction task for VOCs (volatile organic compounds) [2], in which the tool was used to select one descriptor subset from a group of four alternative subsets. The second concerns the prediction of elongation at break for high-molecular-weight polymers [3]. In this scenario, the tool was used to illustrate the case where the analyst wants to modify the automatic descriptor selection in order to incorporate an experimental parameter into the model. In both cases, the results showed the suitability and convenience of this methodology for selecting, in an exploratory and versatile way, descriptor sets with desirable characteristics (low cardinality, high interpretability, low redundancy, and high statistical performance).
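The abstract does not state which statistics feed the descriptor views, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that (a) pairwise association between descriptors is measured by the absolute Pearson correlation, with a hypothetical threshold of 0.9 marking a redundant pair (an edge in the undirected graph); (b) consensus is the degree of each descriptor in the bipartite model-descriptor graph, i.e. how many alternative models include it; and (c) "reducing uncertainty about the target" is quantified as mutual information I(X;Y) = H(Y) - H(Y|X) on discretized values. All function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def association_graph(descriptors, threshold=0.9):
    """Edges of an undirected graph linking descriptor pairs with |r| >= threshold.

    Assumption: each edge flags a potentially redundant pair of descriptors.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(descriptors), 2)
        if abs(pearson(descriptors[a], descriptors[b])) >= threshold
    ]


def consensus(models):
    """Degree of each descriptor in the bipartite model-descriptor graph."""
    return Counter(d for subset in models.values() for d in subset)


def entropy(labels):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def mutual_information(x_bins, y_bins):
    """I(X;Y) = H(Y) - H(Y|X) for a discretized descriptor X and target Y, in bits."""
    n = len(y_bins)
    h_y_given_x = 0.0
    for value, count in Counter(x_bins).items():
        ys = [y for x, y in zip(x_bins, y_bins) if x == value]
        h_y_given_x += count / n * entropy(ys)
    return entropy(y_bins) - h_y_given_x


if __name__ == "__main__":
    # Hypothetical descriptor values for four molecules.
    descriptors = {"D1": [1, 2, 3, 4], "D2": [2, 4, 6, 8.5], "D3": [4, 1, 3, 2]}
    print(association_graph(descriptors))  # [('D1', 'D2')]

    # Hypothetical alternative models, each defined by its descriptor subset.
    models = {"M1": ["D1", "D3"], "M2": ["D1", "D4"], "M3": ["D1", "D3", "D5"]}
    print(consensus(models).most_common(2))  # [('D1', 3), ('D3', 2)]

    # A descriptor that fully determines the binned target removes 1 bit of uncertainty.
    print(mutual_information(["low", "low", "high", "high"], [0, 0, 1, 1]))  # 1.0
```

Pearson correlation only captures linear association, and mutual information on binned values depends on the binning chosen; a tool of this kind could equally use rank correlation or other estimators, which the abstract leaves open.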