RESEARCHERS
PONZONI Ignacio
Conferences and scientific meetings
Title:
FS4RVDD: A feature selection algorithm for random variables with discrete distribution
Author(s):
CRAVERO, FIORELLA; SCHUSTIK, SANTIAGO A.; MARTINEZ, MARÍA JIMENA; DIAZ, MÓNICA F.; PONZONI, IGNACIO
Place:
Cádiz
Meeting:
Conference; 17th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2018); 2018
Organizing institution:
Universidad de Cádiz
Abstract:
Feature selection is a crucial step for inferring regression and classification models in QSPR (Quantitative Structure–Property Relationship) applied to Cheminformatics. A particularly complex case of QSPR modelling occurs in Polymer Informatics, because the features under analysis require the management of uncertainty. In this paper, a novel feature selection method for addressing this special QSPR scenario is presented. The proposed methodology assumes that each feature is characterized by a probability distribution of values associated with the polydispersity of the polymers included in the training dataset. The new algorithm has two sequential steps: a ranking of the features, generated by correlation analysis, and an iterative subset reduction, obtained by feature redundancy analysis. A prototype of the algorithm has been implemented as a proof of concept. The method's performance has been evaluated using synthetic datasets of different sizes and varying the cardinality of the selected feature subsets. These preliminary results suggest that the chosen mathematical representation and the proposed method are suitable for managing the uncertainty inherent in polymerization. Nevertheless, this research is work in progress, and additional experiments should be conducted in the future to assess the actual benefits and limitations of this methodology.
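The two steps named in the abstract (correlation-based ranking, then iterative redundancy-based reduction) can be illustrated with a minimal Python sketch. This is not the authors' FS4RVDD implementation, whose details the abstract does not give. Everything specific here is an assumption made for illustration: each feature value is a list of (value, probability) pairs, each distribution is collapsed to its expected value, Pearson correlation is used for both ranking and redundancy, and the function names `expected_value`, `pearson`, and `select_features` are hypothetical.

```python
from math import sqrt


def expected_value(dist):
    """Mean of a discrete distribution given as (value, probability) pairs."""
    return sum(value * prob for value, prob in dist)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(features, target, k):
    """Illustrative two-step selection (assumed scheme, not the published one).

    features: dict mapping feature name -> list of distributions, one per sample.
    target:   list of target property values, one per sample.
    k:        desired cardinality of the selected subset.
    """
    # Assumption: summarize each distribution by its expected value.
    means = {name: [expected_value(d) for d in dists]
             for name, dists in features.items()}

    # Step 1: rank features by absolute correlation with the target.
    subset = sorted(means,
                    key=lambda name: abs(pearson(means[name], target)),
                    reverse=True)

    # Step 2: repeatedly find the most mutually correlated pair and
    # drop its lower-ranked member until only k features remain.
    while len(subset) > k:
        worst_index, worst_corr = None, -1.0
        for i in range(len(subset)):
            for j in range(i + 1, len(subset)):
                corr = abs(pearson(means[subset[i]], means[subset[j]]))
                if corr > worst_corr:
                    worst_corr, worst_index = corr, j  # j is ranked below i
        subset.pop(worst_index)
    return subset


# Toy data: four samples; "b" is nearly a copy of "a", "c" is less informative.
toy_features = {
    "a": [[(0.5, 0.5), (1.5, 0.5)], [(1.5, 0.5), (2.5, 0.5)],
          [(2.5, 0.5), (3.5, 0.5)], [(3.5, 0.5), (4.5, 0.5)]],
    "b": [[(1.1, 1.0)], [(2.0, 1.0)], [(3.2, 1.0)], [(3.9, 1.0)]],
    "c": [[(2.0, 1.0)], [(1.0, 1.0)], [(4.0, 1.0)], [(3.0, 1.0)]],
}
toy_target = [1.0, 2.0, 3.0, 4.0]
selected = select_features(toy_features, toy_target, k=2)
print(selected)
```

Collapsing each distribution to its mean discards the spread that polydispersity introduces; a representation closer to the one described in the abstract would presumably retain the full distribution when measuring correlation and redundancy.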