RESEARCHERS
BALZARINI Monica Graciela
Conferences and scientific meetings
Title:
pSVMtune: An R library for parallel optimization of SVM parameters
Author(s):
GONZALEZ GERMAN; CRISTÓBAL FRESNO; MÓNICA BALZARINI; JOSÉ CROSSA; ELMER A FERNÁNDEZ
Place:
Córdoba
Meeting:
Conference; 2do Congreso Argentino de Bioinformática y Biología Computacional; 2011
Organizing institution:
UCC
Abstract:
Current high-throughput technologies allow us to measure and record thousands of genes, proteins, or molecules simultaneously, which can then be used to interrogate different biological scenarios. Usually these genes, proteins, or molecules are inspected as candidate molecular fingerprints, with the expectation of building diagnostic methods or explaining biological behavior. One of the main limitations of any current high-throughput experiment is the number of available samples, which, from a data mining point of view, leads to the "curse of dimensionality" problem (the number of variables exceeds the number of samples). In this context, Support Vector Machines (SVMs) could play an outstanding role, since their theoretical properties suggest that they are well suited to "curse of dimensionality" problems. Although SVMs are robust and provide optimal solutions, building them requires setting several user-defined parameters that are problem dependent. A usual way to find these parameters is a grid search, which evaluates many parameter combinations under a cross-validation strategy. The available tune functions for SVMs are sequential and require long runs (several hours or days) when the number of variables is large. Here we present a parallel algorithm to search for such parameters, based on the "snowfall" [1] R library [2]. The algorithm significantly speeds up the process. We show its performance on a genetic selection problem in which grain yield is predicted from thousands of molecular markers [3].
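The parallel grid search described in the abstract can be sketched roughly as follows. This is not the pSVMtune code, whose interface the abstract does not describe. It is a minimal illustration that assumes the "e1071" package for the SVM itself, a radial kernel with cost and gamma as the tuned parameters, 5-fold cross-validation, two worker processes, and a small synthetic data set standing in for the marker data. The names X, y, grid and cv_error are hypothetical.

```r
library(snowfall)  # parallel helpers: sfInit, sfExport, sfLapply, sfStop
library(e1071)     # svm(); an assumption, the abstract does not name the SVM package

# Synthetic stand-in data (assumption): 60 samples, 500 "markers",
# so variables far outnumber samples as in the abstract's setting.
set.seed(1)
n <- 60
p <- 500
X <- matrix(rnorm(n * p), nrow = n)
y <- X[, 1] - 0.5 * X[, 2] + rnorm(n, sd = 0.5)

# Candidate parameter combinations (illustrative values only).
grid <- expand.grid(cost = 2^(-1:3), gamma = 2^(-12:-8))

# Evaluate one grid point: fit an SVM regression with 5-fold
# cross-validation and return the cross-validated mean squared error.
cv_error <- function(i) {
  fit <- svm(X, y, type = "eps-regression", kernel = "radial",
             cost = grid$cost[i], gamma = grid$gamma[i], cross = 5)
  fit$tot.MSE
}

sfInit(parallel = TRUE, cpus = 2)  # start two worker processes
sfLibrary(e1071)                   # load e1071 on every worker
sfExport("X", "y", "grid")         # copy the data and grid to the workers

# Each grid point is independent, so the points are spread across workers.
grid$mse <- unlist(sfLapply(seq_len(nrow(grid)), cv_error))
sfStop()

# Parameter combination with the lowest cross-validated error.
best <- grid[which.min(grid$mse), ]
print(best)
```

Because every grid point is evaluated independently, the work divides cleanly across workers, which is where the speed-up over a sequential tune function would come from. The cross-validation folds are drawn at random on each worker, so the selected combination can vary between runs of this sketch.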