CIDIE   24052
CENTRO DE INVESTIGACION Y DESARROLLO EN INMUNOLOGIA Y ENFERMEDADES INFECCIOSAS
Executing Unit - UE
Conferences and scientific meetings
Title:
Parallel bootstrap consensus clustering
Author(s):
FRESNO CRISTOBAL; ELMER A FERNÁNDEZ; GENTILLI FRIAS, G. ; SAENZ MACARENA
Location:
Buenos Aires
Meeting:
Conference; Computational Biology Latin America Bioinformatics Conference; 2018
Organizing institution:
Asociacion Argentina de Bioinformatica y biologia computacional
Abstract:
Consensus clustering is a well-known approach to class discovery that has been used extensively in gene expression pattern discovery. However, because it is implemented as a serial procedure, its running time makes it impractical for current high-throughput databases and Big Omics Data approaches. Another drawback of the current implementation is the user-defined selection of variable/subject proportions, which can affect the discovered classes. The aim of this work is to improve the ConsensusClusterPlus R library implementation in order to reduce execution time, and to propose a bootstrap sampling approach that eliminates user-defined parameters. The method was evaluated on two gene expression datasets (17195 x 28 and 21770 x 105), running from 2 to 10 clusters over 800 repetitions, using 1 to 23 cores. The bootstrap sampling was also compared against a 70% sampling approach for both samples and variables, as well as against the original implementation. A redesign of the algorithm alone yielded an almost 20% speed improvement, which increased linearly up to 10 cores with a slope of 1.369e-10 per added core and then tended to plateau beyond 15 cores. Although the parallel implementation required an extra data postprocessing step, its cost was not significant compared with the benefits obtained from the main changes. The bootstrap implementation showed performance similar to the original implementation when results were compared, and the speed-up curves showed a speed improvement of up to 9-fold for both sampling alternatives, using up to 15 cores.
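
The general idea of bootstrap consensus clustering with independent repetitions can be illustrated with a minimal Python sketch. It is not the authors' R implementation and does not use the ConsensusClusterPlus API. The function names (`two_means_1d`, `one_repetition`, `consensus_matrix`), the toy one-dimensional two-cluster routine, the choice to keep each bootstrapped subject only once, and the use of threads are all assumptions made for illustration. Each repetition draws subjects with replacement, clusters the drawn subset, and returns partial count matrices; the final merge of those partial counts is one possible form of the extra postprocessing that a parallel design requires.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def two_means_1d(values, iters=20):
    """Toy 2-cluster k-means on scalars (a stand-in for any base clusterer)."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to the nearer of the two centres.
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        if group0:
            lo = sum(group0) / len(group0)
        if group1:
            hi = sum(group1) / len(group1)
    return labels


def one_repetition(data, seed):
    """One bootstrap repetition: returns (co-clustered, co-sampled) counts."""
    rng = random.Random(seed)
    n = len(data)
    # Bootstrap: draw n indices with replacement. Keeping each drawn subject
    # once is an assumption of this sketch, not a detail from the abstract.
    drawn = sorted({rng.randrange(n) for _ in range(n)})
    labels = two_means_1d([data[i] for i in drawn])
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for a, label_a in zip(drawn, labels):
        for b, label_b in zip(drawn, labels):
            sampled[a][b] += 1
            together[a][b] += int(label_a == label_b)
    return together, sampled


def consensus_matrix(data, repetitions=200, workers=4):
    """Run repetitions concurrently, then merge the partial counts."""
    n = len(data)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda s: one_repetition(data, s), range(repetitions))
        # Merge step: sum the per-repetition counts into global matrices.
        for part_together, part_sampled in partials:
            for i in range(n):
                for j in range(n):
                    together[i][j] += part_together[i][j]
                    sampled[i][j] += part_sampled[i][j]
    # Consensus = fraction of co-sampled repetitions in which i and j co-clustered.
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Two well-separated toy groups of three subjects each.
toy_data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
consensus = consensus_matrix(toy_data)
```

Threads are used here only to keep the sketch short and portable; in CPython they do not speed up CPU-bound work, so a process pool or a native parallel backend would be needed to observe speed-up curves like those reported in the abstract. Because the repetitions share no state, the same structure carries over to such backends unchanged.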