RESEARCHERS
DIAZ Monica Fatima
conferences and scientific meetings
Title:
Feature Selection in Molecular Informatics: Improving QSAR/QSPR Modeling by Computational Intelligence Approaches and Interactive Visual Analysis
Author(s):
MARÍA JIMENA MARTÍNEZ; FIORELLA CRAVERO; DAMIAN PALOMBA; AXEL J. SOTO; MONICA F. DIAZ; GUSTAVO E. VAZQUEZ; IGNACIO PONZONI
Location:
CABA, Buenos Aires
Meeting:
Symposium; Argentine Symposium on Artificial Intelligence 2014 - 43JAIIO; 2014
Organizing institution:
SADIO (Sociedad Argentina de Informática) and Universidad de Palermo
Abstract:
Quantitative structure-activity relationship (QSAR) models are regression or classification models widely used in cheminformatics. Similarly, the term quantitative structure-property relationship (QSPR) model is used when the response variable being modeled is a chemical property. These models relate a set of molecular descriptors to a target variable and play a central role in several industrial applications, such as drug discovery and the design of new materials. Designing QSAR/QSPR models requires dealing with several problems. One of them is selecting the most relevant set of molecular descriptors for the property or activity to be modeled. Chemical structures are usually encoded by a variety of descriptor families, such as functional-group, topological, constitutional, thermodynamic, and quantum-mechanical descriptors. Several of these descriptors may contribute similar information or may be irrelevant to the biological activity under study, which hinders the discovery of the descriptor-activity relationship. For this reason, selecting the most important descriptors is regarded as one of the most difficult and crucial tasks in QSAR/QSPR modeling. Many feature selection methods used to address this problem focus on statistical relationships between the descriptors and the target properties, leaving chemical knowledge out of the picture. As a result, the interpretability and generality of the models obtained by these methods are drastically affected. A strategy for incorporating expert knowledge into the selection process is therefore required in order to improve user confidence in QSAR/QSPR models.
Another key problem is identifying the applicability domain (AD) of QSAR/QSPR models. The AD is the physicochemical, structural or biological space, knowledge or information on which the training set of the model has been developed, and for which the model is applicable to make predictions for new compounds. In this context, the need emerges to address the problem of how to obtain highly reliable predictive QSAR/QSPR models through a comprehensive cheminformatic approach that considers the several computational subproblems related to the accuracy of these models. In particular, this project addresses four main aspects: the design of new property-related molecular descriptors, the optimal selection of molecular descriptors, the identification of the applicability domain of the models, and the improvement of the semantic interpretability of the models. To deal with these issues, we are working on the design of computational methodologies based on machine learning, evolutionary computation and visual analytics, with the global long-term goal of developing intelligent computer systems that assist experts in the design of new QSAR/QSPR models.