RESEARCHERS
MARTINEZ Maria Jimena
Conferences and scientific meetings
Title:
DELPHOS: A Prototype Software Tool for Selection of Relevant Descriptors in QSAR Models
Author(s):
SOTO, AXEL JUAN; MARTÍNEZ, MARÍA JIMENA; CECCHINI, ROCÍO LUJÁN; VAZQUEZ, GUSTAVO ESTEBAN; PONZONI, IGNACIO
Location:
Córdoba
Meeting:
Conference; Segundo Congreso Argentino de Bioinformática y Biología Computacional; 2010
Organizing institution:
Universidad Católica de Córdoba
Abstract:
The design of QSAR (Quantitative Structure-Activity Relationship) methods is a promising research topic in drug discovery. Nevertheless, although many papers on this subject have been published during the last decade, the predictive capacity of QSAR models still needs to be improved. To overcome this limitation, several subproblems must be addressed. First, relevant descriptors that link molecular and chemical information with the activity or property under study must be selected. In general, experts cannot perform the descriptor selection task manually, given the inherent complexity and non-linearity of structure-activity relationships. Moreover, the number of molecular descriptors that can be calculated for a single compound is huge. Therefore, a computational method is required for selecting the subset of molecular descriptors to be used in a QSAR model. Second, the machine learning method used for inferring the QSAR model must be robust to noise in the data. Third and last, when non-homogeneous data are used, it is important to determine the real applicability domain of the inferred QSAR model; otherwise, prediction accuracy cannot be guaranteed for an arbitrary compound. These design premises have been partially addressed by our research team in recent publications and constitute the core of the DELPHOS project. Our long-term goal is to develop an intelligent software system that supports the development of QSAR models. In this paper, we present our first prototype tool for the selection of relevant molecular descriptors for QSAR models. It is based on a two-phase wrapper method. The main features of the software can be summarized as follows:
GUI: A graphical user interface lets a user run the software without knowing specific details of the code or the applied methods.
Data handling: Input data can be supplied in CSV format or as standard MATLAB matrix files. The computation performed after any phase can be saved and later restored.
First phase (feature search and evaluation): The first phase performs a coarse search and a fast evaluation among all feasible subsets of descriptors. Several parameters can be set for this phase.
Second phase (learning method): Using the data computed in the first phase, a thorough evaluation is applied to determine which subsets from the coarse selection are the most relevant.
Post-processing: After the second phase has been executed, tables showing the final results and several statistical metrics are presented.
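The two-phase idea described above can be illustrated with a minimal Python sketch. It is not the DELPHOS implementation: the abstract does not specify the search strategy, the learners, or the error measures. The sketch therefore assumes an exhaustive enumeration of fixed-size subsets scored on a single hold-out split for the first phase, and k-fold cross-validation for the second phase. Both phases use ordinary least squares as the learner, again as an assumption. All function names and parameters (`two_phase_selection`, `subset_size`, `keep`, `make_toy_data`) are hypothetical.

```python
from itertools import combinations

import numpy as np


def fit_and_score(X_tr, y_tr, X_te, y_te):
    """Fit ordinary least squares (with intercept) and return the test MSE."""
    A_tr = np.column_stack([X_tr, np.ones(len(y_tr))])
    A_te = np.column_stack([X_te, np.ones(len(y_te))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.mean((A_te @ coef - y_te) ** 2))


def holdout_error(X, y, cols, train, test):
    """Fast evaluation (assumed for phase 1): one fixed train/test split."""
    cols = list(cols)
    return fit_and_score(X[train][:, cols], y[train], X[test][:, cols], y[test])


def cv_error(X, y, cols, folds=5, seed=0):
    """Thorough evaluation (assumed for phase 2): k-fold cross-validated MSE."""
    cols = list(cols)
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        errors.append(
            fit_and_score(X[train][:, cols], y[train], X[test][:, cols], y[test])
        )
    return float(np.mean(errors))


def two_phase_selection(X, y, subset_size=3, keep=5, seed=0):
    """Return the `keep` best descriptor subsets as (subset, cv_mse), best first."""
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(0.7 * len(y))
    train, test = idx[:cut], idx[cut:]

    # Phase 1: coarse search with a cheap score. Exhaustive enumeration is
    # only feasible for toy sizes; a real tool would need a heuristic search.
    candidates = combinations(range(X.shape[1]), subset_size)
    coarse = sorted(
        candidates, key=lambda c: holdout_error(X, y, c, train, test)
    )[:keep]

    # Phase 2: re-rank the surviving subsets with the more expensive score.
    scored = sorted((cv_error(X, y, c), c) for c in coarse)
    return [(c, err) for err, c in scored]


def make_toy_data(n=120, n_desc=8, seed=1):
    """Synthetic data: only descriptors 0, 1 and 2 influence the target."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, n_desc))
    y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + X[:, 2] + 0.01 * rng.normal(size=n)
    return X, y
```

Calling `two_phase_selection(*make_toy_data())` returns the retained subsets ordered by cross-validated error. Splitting the work this way keeps the expensive evaluation limited to a handful of candidates, which is the motivation for a coarse first phase when the number of possible descriptor subsets is large.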