RESEARCHERS
FERNANDEZ Elmer Andres
conferences and scientific meetings
Title:
pSVMtune: An R library for parallel optimization of SVM parameters
Author(s):
GONZALEZ, GERMÁN; FRESNO, CRISTOBAL; BALZARINI, MÓNICA; CROSSA, JOSE; FERNÁNDEZ, ELMER ANDRÉS
Location:
Cordoba
Meeting:
Conference; 2do Congreso Argentino de Bioinformática y Biología Computacional (2nd Argentine Congress of Bioinformatics and Computational Biology); 2011
Organizing institution:
A2B2C (www.a2b2c.org.ar)
Abstract:
Current high-throughput technologies allow us to simultaneously measure and record thousands of genes, proteins or molecules that can be used to interrogate different biological scenarios. Such genes, proteins or molecules are usually inspected as candidate molecular fingerprints, with the expectation of building diagnostic methods or explaining biological behavior. In any current high-throughput experiment, one of the main limitations is the number of available samples which, from a data mining point of view, leads to the "curse of dimensionality" problem (when the number of variables exceeds the number of samples). In this context, Support Vector Machines (SVMs) could play an outstanding role, since their theoretical properties suggest that they are well suited to "curse of dimensionality" problems. Despite being robust and providing optimal solutions, SVMs require several user-defined, problem-dependent parameters to be set when the model is built. A usual way to choose these parameters is a grid search that evaluates several parameter combinations within a cross-validation strategy. Available tuning functions for SVMs are sequential, requiring long runs (several hours or days) when the number of variables is large. Here we present a parallel algorithm to search for such parameters, based on the "snowfall" [1] R library [2]. The algorithm significantly speeds up the process. We show the performance of the "pSVMtune" R library on a genetic selection problem in which grain yield is predicted from thousands of molecular markers [3].
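The grid-search strategy described in the abstract can be illustrated with a short sketch. This is a Python analogue of the general idea, not the pSVMtune R API (which is not shown in this record): every (cost, gamma) combination is scored by k-fold cross-validation, and because combinations are independent of each other they are distributed over worker processes with `concurrent.futures`, playing roughly the role that snowfall plays in R. To stay self-contained, the sketch assumes a small kernelized Pegasos trainer (a stochastic subgradient SVM solver without a bias term) in place of a full SVM library, a synthetic two-ring classification dataset in place of marker/yield data (the abstract's application is regression, but the tuning loop has the same shape), and hypothetical helper names throughout.

```python
"""Parallel grid search over SVM (cost, gamma) with k-fold cross-validation.

Illustrative sketch only: a small kernelized Pegasos trainer stands in for a
full SVM library, and none of these names come from pSVMtune.
"""
import math
import random
from concurrent import futures
from itertools import product


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def train_pegasos(X, y, cost, gamma, iters=300, seed=0):
    """Kernelized Pegasos: stochastic subgradient training of a bias-free SVM.

    lambda = 1 / (cost * n) makes the objective equivalent to a C-SVM with C = cost.
    alpha[i] counts how often point i violated the margin when it was sampled.
    """
    rng = random.Random(seed)
    n = len(X)
    lam = 1.0 / (cost * n)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0] * n
    for t in range(1, iters + 1):
        i = rng.randrange(n)
        score = sum(alpha[j] * y[j] * K[i][j] for j in range(n)) / (lam * t)
        if y[i] * score < 1:
            alpha[i] += 1
    return alpha


def predict(alpha, Xtr, ytr, x, gamma):
    """Sign of the kernel expansion sum_j alpha_j y_j K(x_j, x)."""
    s = sum(a * yj * rbf(xj, x, gamma) for a, xj, yj in zip(alpha, Xtr, ytr) if a)
    return 1 if s >= 0 else -1


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_error(task):
    """k-fold misclassification rate for one (cost, gamma) combination."""
    X, y, cost, gamma, folds = task
    wrong = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        Xtr = [X[i] for i in train_idx]
        ytr = [y[i] for i in train_idx]
        alpha = train_pegasos(Xtr, ytr, cost, gamma)
        wrong += sum(predict(alpha, Xtr, ytr, X[i], gamma) != y[i] for i in test_idx)
    return cost, gamma, wrong / len(X)


def grid_search(X, y, costs, gammas, k=5, executor_cls=futures.ProcessPoolExecutor, workers=None):
    """Evaluate every (cost, gamma) pair; executor_cls=None runs sequentially."""
    folds = kfold_indices(len(X), k)  # same splits for every combination
    tasks = [(X, y, c, g, folds) for c, g in product(costs, gammas)]
    if executor_cls is None:
        results = list(map(cv_error, tasks))
    else:
        # Grid points are independent, so they can be farmed out to workers.
        with executor_cls(max_workers=workers) as pool:
            results = list(pool.map(cv_error, tasks))
    best = min(results, key=lambda r: r[2])
    return best, results


def make_rings(n=60, seed=1):
    """Synthetic two-class data: inner disk (+1) versus outer ring (-1)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        r = rng.uniform(0, 1) if label == 1 else rng.uniform(2, 3)
        theta = rng.uniform(0, 2 * math.pi)
        X.append((r * math.cos(theta), r * math.sin(theta)))
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_rings()
    (c, g, err), _ = grid_search(X, y, costs=[0.1, 1, 10], gammas=[0.01, 0.1, 1])
    print(f"best cost={c}, gamma={g}, CV error={err:.3f}")
```

Passing `executor_cls=None` gives the sequential baseline that a parallel run can be compared against. Worker processes are the default because pure-Python training is CPU-bound, and threads would not execute it in parallel under CPython's global interpreter lock; on platforms that start workers with `spawn`, the `if __name__ == "__main__":` guard is required.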