RESEARCHERS
LLERA Andrea Sabina
Conferences and scientific meetings
Title:
The impact of RNA-Seq differential expression algorithms on Over-Representation Analysis of Gene Sets.
Author(s):
RODRIGUEZ, JUAN C.; MERINO, GABRIELA A.; LAURA PRATO; LLERA ANDREA; FERNANDEZ ELMER
Meeting:
Conference; 4th International Society for Computational Biology-Latin America Bioinformatics Conference (ISCB-LA); 2016
Abstract:
Background: Transcriptome analysis is essential for elucidating the biological changes underlying a phenotype. The detection of differentially expressed (DE) genes is the starting point, and functional analysis (FA) becomes crucial for a comprehensive understanding. One of the most widely used FA methods is Over-Representation Analysis (ORA), which takes a list of candidate genes as input. The development of RNA-Seq technology and of large-scale sequencing projects such as TCGA poses new challenges for both DE gene detection (DEGD) and FA. DEGD is known to depend on the method used, which in turn affects FA. Despite this, DEGD methods have mainly been compared in terms of statistical accuracy or of the genes detected, and their impact on FA from a biological point of view has not been addressed so far. In this work we evaluate the impact of the most widely used DEGD methods for RNA-Seq data on ORA results, using the well-known TCGA breast cancer cohort. Since there is no gold standard and simulated data lack biological information, microarray (MA) data were used as the reference, because this kind of data has been widely used and analyzed in terms of DEGD and ORA.
Results: Breast cancer RNA-Seq and MA expression data were downloaded from the TCGA repository. Subjects were classified into the Basal-like, Her2, Luminal A and Luminal B subtypes by means of the PAM50 algorithm. Only subjects whose classification agreed between the RNA-Seq and MA data were used, and only genes that were reliably detected in both matrices were kept. Then, all six pairwise combinations of subtypes were compared for DEGD and subsequent ORA. In each case, the DE genes fed to the ORA were obtained with the three most commonly used DEGD methods for RNA-Seq, i.e., edgeR, DESeq2 and Voom+limma. The three edgeR gene dispersion estimation methods, i.e., common, trended and tagwise, were also compared.
The Gene Ontology gene sets were used for the ORA enrichment test. These ORA results were compared with those obtained from the DE genes detected in the MA data (evaluated with limma). To measure the similarity between the RNA-Seq and MA results, Jaccard distances, dendrograms and heatmaps were used, as shown in Fig. 1. Although every RNA-Seq method yields a percentage of enriched gene sets in agreement with MA, Voom+limma was the method that shared the highest number of enriched gene sets with MA in all six evaluated contrasts. Moreover, in every case the dendrograms show an MA/Voom+limma cluster completely separated from the other methods, i.e., this pair always had the shortest distance.
Conclusions: For every evaluated contrast, the ORA results of the Voom+limma method were the most similar to the MA results. We therefore recommend this method for DEGD on RNA-Seq data when the DE genes are to be used for ORA.
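The two calculations at the core of the comparison described in the Results, an over-representation test on a gene set and a Jaccard distance between lists of enriched gene sets, can be illustrated with a minimal Python sketch. It assumes that ORA is computed as an upper-tail hypergeometric test, which is a common formulation but is not confirmed by the abstract. The function names (`ora_pvalue`, `jaccard_distance`) and all gene, gene-set and method identifiers are hypothetical toy values, not data or results from this study.

```python
from math import comb


def ora_pvalue(de_genes, gene_set, universe):
    """Upper-tail hypergeometric p-value: probability of seeing at least the
    observed overlap between the DE list and the gene set by chance.
    Assumption: this is one common ORA formulation, not necessarily the
    authors' exact test."""
    universe = set(universe)
    de = set(de_genes) & universe      # DE genes restricted to the universe
    gs = set(gene_set) & universe      # gene-set members in the universe
    n_total, n_set, n_de = len(universe), len(gs), len(de)
    overlap = len(de & gs)
    # Sum P(X = k) for k from the observed overlap up to the largest possible.
    tail = sum(
        comb(n_set, k) * comb(n_total - n_set, n_de - k)
        for k in range(overlap, min(n_set, n_de) + 1)
    )
    return tail / comb(n_total, n_de)


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy ORA example: 10 measured genes, a 3-gene set, 3 DE genes, overlap of 2.
universe = {f"g{i}" for i in range(1, 11)}
gene_set = {"g1", "g2", "g3"}
de_genes = {"g1", "g2", "g4"}
p = ora_pvalue(de_genes, gene_set, universe)

# Toy comparison: enriched gene sets per (hypothetical) method, each compared
# against a reference list, as the abstract does with the MA results.
enriched = {
    "reference": {"GO:A", "GO:B", "GO:C"},
    "method_1": {"GO:B", "GO:C", "GO:D"},
    "method_2": {"GO:C", "GO:E", "GO:F", "GO:G"},
}
distances = {
    name: jaccard_distance(enriched["reference"], sets)
    for name, sets in enriched.items()
    if name != "reference"
}
closest = min(distances, key=distances.get)

print(f"ORA p-value: {p:.4f}")
print(f"Jaccard distances to reference: {distances}")
print(f"Closest method: {closest}")
```

In a real analysis the gene universe would be the set of genes reliably detected on both platforms, and the pairwise distance matrix over all methods would feed the hierarchical clustering behind the dendrograms.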