ICIC   25583
INSTITUTO DE CIENCIAS E INGENIERIA DE LA COMPUTACION
Executing Unit (Unidad Ejecutora, UE)
Conferences and scientific meetings
Title:
FS4RVDD: A Feature Selection Algorithm for Random Variables with Discrete Distribution
Author(s):
MARTÍNEZ, MARÍA JIMENA; CRAVERO, FIORELLA; DÍAZ, MÓNICA F.; SCHUSTIK, SANTIAGO A.; PONZONI, IGNACIO
Location:
Cadiz
Meeting:
Conference; IPMU 2018: Information Processing and Management of Uncertainty in Knowledge-Based Systems; 2018
Organizing institution:
Universidad de Cadiz
Abstract:
Feature selection is a crucial step for inferring regression and classification models in QSPR (Quantitative Structure/Property Relationship) studies applied to cheminformatics. A particularly complex case of QSPR modelling arises in polymer informatics, because the features under analysis require uncertainty to be managed. In this paper, a novel feature selection method for this special QSPR scenario is presented. The proposed methodology assumes that each feature is characterized by a probability distribution of values associated with the polydispersity of the polymers included in the training dataset. The new algorithm has two sequential steps: ranking of the features, generated by correlation analysis, and iterative subset reduction, obtained by feature redundancy analysis. A prototype of the algorithm has been implemented as a proof of concept. The method's performance has been evaluated using synthetic datasets of different sizes and by varying the cardinality of the selected feature subsets. These preliminary results allow us to conclude that the chosen mathematical representation and the proposed method are suitable for managing the uncertainty inherent to polymerization. Nevertheless, this research is work in progress, and additional experiments should be conducted in the future to assess the actual benefits and limitations of the methodology.
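The abstract describes the two-step pipeline only at a high level. The Python sketch below illustrates the generic pattern it names (correlation-based ranking, then redundancy-based subset reduction) under assumptions that are NOT taken from the paper: each sample's feature value is a discrete distribution given as hypothetical `(value, probability)` pairs, each distribution is summarized by its expected value, Pearson correlation is used for both relevance and redundancy, and `redundancy_threshold = 0.9` is an arbitrary cutoff. The names `expected_value`, `pearson` and `select_features` are illustrative only; this is not the FS4RVDD algorithm itself, and collapsing each distribution to its mean discards exactly the uncertainty the paper sets out to model.

```python
from math import sqrt
from typing import Dict, List, Tuple

# Hypothetical representation: a discrete distribution as (value, probability) pairs.
Dist = List[Tuple[float, float]]


def expected_value(dist: Dist) -> float:
    """Mean of a discrete distribution (probabilities assumed to sum to 1)."""
    return sum(v * p for v, p in dist)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(features: Dict[str, List[Dist]], target: List[float],
                    redundancy_threshold: float = 0.9) -> List[str]:
    # Assumption: summarize each sample's distribution by its expected value.
    means = {name: [expected_value(d) for d in dists] for name, dists in features.items()}
    # Step 1: rank features by |correlation| with the target, highest first.
    ranking = sorted(means, key=lambda name: abs(pearson(means[name], target)), reverse=True)
    # Step 2: walk the ranking greedily, dropping any feature that is strongly
    # correlated with a feature already kept (a simple redundancy filter).
    selected: List[str] = []
    for name in ranking:
        if all(abs(pearson(means[name], means[kept])) < redundancy_threshold
               for kept in selected):
            selected.append(name)
    return selected


# Toy synthetic data: four samples, three features.
example_features = {
    "f1": [[(0.5, 0.5), (1.5, 0.5)], [(2.0, 1.0)], [(2.5, 0.5), (3.5, 0.5)], [(4.0, 1.0)]],   # means 1, 2, 3, 4
    "f2": [[(1.0, 1.0)], [(2.0, 1.0)], [(3.0, 1.0)], [(4.0, 0.5), (6.0, 0.5)]],               # means 1, 2, 3, 5
    "f3": [[(1.0, 1.0)], [(-1.0, 1.0)], [(0.0, 0.5), (2.0, 0.5)], [(-1.0, 1.0)]],             # means 1, -1, 1, -1
}
example_target = [1.0, 2.0, 3.0, 4.0]

print(select_features(example_features, example_target))  # → ['f1', 'f3']
```

Here `f2` is ranked second (|r| ≈ 0.98 with the target) but is discarded because it is almost perfectly correlated with `f1`, while the weakly relevant but non-redundant `f3` is kept. A faithful implementation would instead compare the distributions directly rather than their means.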