SUPPORT STAFF
GONZALEZ Germán Alexis
Conferences and scientific meetings
Title:
pSVMtune: An R library for parallel optimization of SVM parameters
Author(s):
GERMÁN GONZÁLEZ; CRISTOBAL FRESNO; MÓNICA BALZARINI; ELMER FERNÁNDEZ; JOSÉ CROSSA
Location:
Córdoba
Meeting:
Conference; Segundo Congreso Argentino de Bioinformática y Biología Computacional; 2011
Organizing institution:
Asociación Argentina de Bioinformática y Biología Computacional
Abstract:
Current high-throughput technologies allow us to simultaneously measure and record thousands of genes, proteins or molecules that can be used to interrogate different biological scenarios. Usually such genes, proteins or molecules are examined as candidate molecular fingerprints, with the expectation of building diagnostic methods or explaining biological behavior. In current high-throughput experiments, one of the main limitations is the number of available samples, which, from a data mining point of view, leads to the “curse of dimensionality” problem (the number of variables exceeds the number of samples). In this context, Support Vector Machines (SVMs) could play an outstanding role, since their theoretical properties suggest that they are well suited to “curse of dimensionality” problems. Although SVMs are robust and provide optimal solutions, building them requires setting several user-defined parameters that are problem dependent. A common way to choose these parameters is a grid search, which evaluates many parameter combinations under a cross-validation strategy. Available tune functions for SVMs are sequential, requiring long runs (several hours or days) when the number of variables is large. Here we present a parallel algorithm to search for such parameters. It is based on the “snowfall” R library. The algorithm significantly speeds up the process. We show the performance of the “pSVMtune” R library on a genetic selection problem in which grain yield is predicted from thousands of molecular markers.
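
The parallel grid search described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pSVMtune interface, which the abstract does not describe. It assumes the "e1071" package for the SVM itself (the abstract names only "snowfall"), simulated data in place of the marker and yield data, and illustrative grid values, fold count and CPU count. The helper cv_error is hypothetical.

```r
# Hypothetical sketch: this is NOT the pSVMtune API.
library(e1071)     # assumed SVM implementation: svm(), predict()
library(snowfall)  # sfInit(), sfLibrary(), sfExport(), sfLapply(), sfStop()

set.seed(1)

# Simulated stand-in for marker data: few samples, many variables (n << p).
n <- 60
p <- 500
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- rowSums(x[, 1:5]) + rnorm(n)  # synthetic response playing the role of yield

# Grid of candidate (cost, gamma) pairs; the values are illustrative only.
grid <- expand.grid(cost = 10^(-1:2), gamma = 10^(-4:-2))

# Assign folds once so every grid point is scored on the same partition.
k <- 5
folds <- sample(rep(seq_len(k), length.out = n))

# Cross-validated mean squared error for row i of the grid.
cv_error <- function(i) {
  fold_mse <- sapply(seq_len(k), function(f) {
    test <- folds == f
    fit <- svm(x[!test, , drop = FALSE], y[!test],
               type = "eps-regression", kernel = "radial",
               cost = grid$cost[i], gamma = grid$gamma[i])
    pred <- predict(fit, x[test, , drop = FALSE])
    mean((pred - y[test])^2)
  })
  mean(fold_mse)
}

# Start the workers, load e1071 on each, and ship the shared objects to them.
sfInit(parallel = TRUE, cpus = 2)
sfLibrary(e1071)
sfExport("x", "y", "grid", "folds", "k")

# Each grid point is independent, so the rows are evaluated in parallel.
errors <- unlist(sfLapply(seq_len(nrow(grid)), cv_error))
sfStop()

results <- cbind(grid, cv_mse = errors)
best <- results[which.min(results$cv_mse), ]
print(best)
```

Because every parameter combination is scored independently of the others, the work splits cleanly across workers. Fixing the fold assignment before the search keeps the scores comparable across grid points.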