RESEARCHERS
MARTINEZ Maria Jimena
conferences and scientific meetings
Title:
FS4RVDD: A Feature Selection Algorithm for Random Variables with Discrete Distribution
Author(s):
CRAVERO, FIORELLA; SCHUSTIK, SANTIAGO; MARTÍNEZ, MARÍA JIMENA; DIAZ, MÓNICA FÁTIMA; PONZONI, IGNACIO
Location:
Cádiz
Meeting:
Conference; IPMU 2018: 17th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems; 2018
Abstract:
Feature selection is a crucial step for inferring regression and classification models in QSPR (Quantitative Structure-Property Relationship) applied to Cheminformatics. A particularly complex case of QSPR modelling occurs in Polymer Informatics because the features under analysis require the management of uncertainty. In this paper, a novel feature selection method for addressing this special QSPR scenario is presented. The proposed methodology assumes that each feature is characterized by a probability distribution of values associated with the polydispersity of the polymers included in the training dataset. The new algorithm has two sequential steps: a ranking of the features, generated by correlation analysis, and an iterative subset reduction, obtained by feature redundancy analysis. A prototype of the algorithm has been implemented in order to conduct a proof of concept. The performance of the method has been evaluated using synthetic datasets of different sizes and varying the cardinality of the selected feature subsets. These preliminary results suggest that the chosen mathematical representation and the proposed method are suitable for managing the uncertainty inherent to polymerization. Nevertheless, this research is work in progress, and additional experiments should be conducted in the future in order to assess the actual benefits and limitations of this methodology.
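The two-step structure described in the abstract (correlation-based ranking followed by iterative redundancy-based reduction) can be illustrated with a minimal Python sketch. This is not the FS4RVDD algorithm itself, whose details are not given here. It assumes, purely for illustration, that each feature of each sample is a discrete distribution given as (value, probability) pairs, that the distribution can be summarized by its expected value, and that both relevance and redundancy are measured by absolute Pearson correlation. The names `expected_value`, `abs_pearson` and `select_features` are hypothetical.

```python
from math import sqrt


def expected_value(distribution):
    """Mean of a discrete distribution given as (value, probability) pairs."""
    return sum(v * p for v, p in distribution)


def abs_pearson(a, b):
    """Absolute Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return abs(cov / sqrt(var_a * var_b))


def select_features(samples, target, k):
    """Return k feature indices, ordered by relevance to the target.

    samples[i][j] is the distribution of feature j for sample i.
    Assumption: each distribution is collapsed to its expected value.
    """
    n_features = len(samples[0])
    columns = [[expected_value(s[j]) for s in samples]
               for j in range(n_features)]

    # Step 1: rank features by absolute correlation with the target.
    ranking = sorted(range(n_features),
                     key=lambda j: abs_pearson(columns[j], target),
                     reverse=True)

    # Step 2: repeatedly find the most mutually correlated pair and
    # drop its lower-ranked member until only k features remain.
    selected = list(ranking)
    while len(selected) > max(k, 1):
        drop, worst = None, -1.0
        for a in range(len(selected)):
            for b in range(a + 1, len(selected)):
                r = abs_pearson(columns[selected[a]], columns[selected[b]])
                if r > worst:
                    worst, drop = r, b  # b is ranked below a
        selected.pop(drop)
    return selected
```

Collapsing each distribution to its mean discards the spread that the paper's representation is meant to capture, so this sketch only conveys the ranking-then-reduction control flow, not how the uncertainty itself is handled.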