Title:
GOboot: towards a robust SEA analysis
Author(s):
CRISTÓBAL FRESNO; LLERA A; GIROTTI MR; VALACCO MP; LÓPEZ, JA; LAURA ZINGARETTI; LAURA PRATO; OSVALDO PODHAJCER; MÓNICA G BALZARINI; FEDERICO PRADA; ELMER A FERNÁNDEZ
Meeting:
Conference; 3° Congreso Argentino de Bioinformática y Biología Computacional (A2B2C); 2012
Abstract:
Set enrichment analysis (SEA) is the traditional approach to Gene Ontology (GO) analysis, owing to its track record and its availability in commercial and public tools and websites [1-2]. In the GO structure, each term is statistically evaluated one at a time and is declared enriched if the observed proportion of differentially expressed proteins/genes differs from the proportion expected when compared against a background reference (BR). The appropriate BR is difficult to devise, and GO results tend to depend on it: a term may or may not be enriched depending on the BR used. Here, a new method is presented to evaluate the enrichment robustness of nodes by means of bootstrap perturbations of the BR used. Each node thus receives a "power score": highly stable nodes are candidates to be explored, and spuriously enriched terms are left out of the analysis.

Methods
A resampling technique was implemented to provide a stability (power) measure of SEA, in order to evaluate the effectiveness of a given BR in identifying truly enriched terms. Simulated BRs were generated by bootstrapping a BR, trying to keep each simulated BR as close as possible to the length of the original BR (in order to introduce small perturbations in the length of both the GO members and the BR). The power value was calculated as the percentage of times a term is enriched over a large number of simulated BRs. In this sense, higher power implies greater stability of the term.

DAVID [3] was the tool chosen to test SEA in one proteomic experiment (Girotti et al., unpublished) and three microarray experiments freely available at Gene Expression Omnibus [4-6] under different BRs: the genome of the species (BR-I), the chip gene list (BR-II, if possible) and a user-defined reference (BR-III [7]). BR-III was the reference used for power calculation (although the method is not restricted to it), as it is considered the one that fulfills the statistical assumption.
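
The power calculation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: a plain one-sided hypergeometric test stands in for DAVID's own enrichment statistic, no multiple-testing correction is applied, and the bootstrap keeps the differentially expressed genes fixed while resampling the rest of the BR with replacement and collapsing duplicates. The abstract does not specify how the simulated BR length was kept close to the original, so that step is one possible reading. All function names, gene identifiers and term names are hypothetical.

```python
import random
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are term members."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def is_enriched(term_genes, de_genes, background, alpha=0.05):
    """One-sided hypergeometric test of a single term against a given BR."""
    bg = set(background)
    de = set(de_genes) & bg          # differentially expressed genes in the BR
    members = set(term_genes) & bg   # term members present in the BR
    k = len(de & members)
    if k == 0:
        return False
    return hypergeom_tail(k, len(bg), len(members), len(de)) < alpha


def bootstrap_background(background, de_genes, rng):
    """Simulated BR (assumption): keep DE genes, resample the rest with replacement."""
    de = set(de_genes)
    pool = [g for g in background if g not in de]
    resampled = rng.choices(pool, k=len(pool))
    return de | set(resampled)       # duplicates collapse, so the BR shrinks somewhat


def power_scores(terms, de_genes, background, n_boot=200, alpha=0.05, seed=0):
    """Percentage of simulated BRs under which each term is enriched."""
    rng = random.Random(seed)
    counts = {term: 0 for term in terms}
    for _ in range(n_boot):
        sim_bg = bootstrap_background(background, de_genes, rng)
        for term, genes in terms.items():
            if is_enriched(genes, de_genes, sim_bg, alpha):
                counts[term] += 1
    return {term: 100.0 * c / n_boot for term, c in counts.items()}


# Toy data (hypothetical): 200 background genes, 15 DE genes, two terms.
background = [f"g{i}" for i in range(200)]
de_genes = [f"g{i}" for i in range(10)] + [f"g{i}" for i in range(100, 105)]
terms = {
    "term_A": [f"g{i}" for i in range(20)],        # holds 10 of the 15 DE genes
    "term_B": [f"g{i}" for i in range(100, 150)],  # holds 5 of the 15 DE genes
}
scores = power_scores(terms, de_genes, background)
print(scores)
```

In this toy setting a term that concentrates most of the differentially expressed genes should score close to 100%, while a term whose overlap is near the expected proportion should score close to 0%, which mirrors the stable versus spurious distinction drawn in the text.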
Boxplots of the enriched terms of the main GO category (Biological Process) were plotted, using a Venn-diagram color pattern to contrast enrichment with typical BR selections (BR-I or BR-II).

Results
Figure 1 shows that the power boxplots of all enriched nodes (in white) lie above 40% for most datasets. Almost all nodes found in BR-III reached power values above 50%. Meanwhile, nodes that appeared enriched when bootstrapping BR-III and had previously been found by BR-I, or shared by BR-I and BR-II, showed power values below 40% in all cases. This suggests that the enriched nodes found by BR-III were highly consistent and potentially meaningful. These enriched terms were validated against the literature.

[Figure 1 legend: BR-I, BR-II, BR-III]

Discussion
The stability analysis showed that non-consensus nodes identified only with BR-I and/or BR-II are unstable, suggesting spurious enrichment. In contrast, enriched terms found by BR-III showed high power, suggesting greater "confidence" (robustness) and making these terms good candidates for further exploration. We found that "robust" terms were biologically relevant to the experimental setting [7]. In this context, the proposed tool provided additional information (power values) to guide ontology exploration and to reveal new terms blurred by the traditional approaches, assisting researchers in ontology analysis.

References
[1] P. Khatri, S. Drăghici, Ontological analysis of gene expression data: current tools, limitations, and open problems, Bioinformatics, 21, 3587-3595 (2005)
[2] D. W. Huang et al., Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists, Nucleic Acids Res., 37, 1-13 (2009)
[3] I. Rivals, L. Personnaz, L. Taing, M-C. Potier, Enrichment or depletion of a GO category within a class of genes: which test?, Bioinformatics, 23, 401-407 (2007)
[4] L. M. Packer et al., Gene expression profiling in melanoma identifies novel downstream effectors of p14ARF, Int. J. Cancer, 121, 784-790 (2007)
[5] A. Spira et al., Effects of cigarette smoke on the human airway epithelial cell transcriptome, Proc. Natl. Acad. Sci. U. S. A., 101, 10143-10148 (2004)
[6] S. McGrath-Morrow et al., Impaired lung homeostasis in neonatal mice exposed to cigarette smoke, Am. J. Respir. Cell. Mol. Biol., 38, 393-400 (2008)
[7] C. Fresno, A. S. Llera, M. R. Girotti, M. P. Valacco, J. A. López, O. L. Podhajcer, M. G. Balzarini, F. Prada, E. A. Fernández, The Multi-Reference Contrast Method: facilitating set enrichment analysis, Comput. Biol. Med., 42, 188-194 (2012)