RESEARCHERS
FERNANDEZ Elmer Andres
Conferences and scientific meetings
Title:
Ontology in genomic and proteomic experiments: Which reference list?
Author(s):
FRESNO, CRISTOBAL; LLERA, ANDREA SABINA; GIROTTI, MARÍA ROMINA; PODHAJCER, OSVALDO LUIS; BALZARINI, MÓNICA; PRADA, FEDERICO; FERNÁNDEZ, ELMER ANDRÉS
Venue:
Buzios, Brazil
Meeting:
Symposium; Brazilian Symposium on Bioinformatics (BSB'10) - International Workshop on Genomic Databases (IWGD'10); 2010
Organizing institution:
Sociedad Brasilera de Bioinformatica
Abstract:
Background
When ontology tools are applied to genomic/proteomic experiments, several requirements emerge [1]. One of them is the choice of an appropriate gene reference list (GRL) against which the experimental differentially expressed gene list (DEGL) is compared. A common strategy is to use as the GRL the whole genome of the corresponding organism (e.g., the default option in many DAVID analyses) or, for microarray experiments, the complete gene set available on the particular chip used in the experiment. This seems reasonable, but several questions arise: Should we use the complete chip set, the genome, or only the reliably detected genes? And, particularly for proteomic experiments, why use the whole genome when only a partial subset of proteins may be observed in a given experimental setting (e.g., secretome studies)? Changing the GRL changes the reference (null) proportion of the contrast distribution [2]; however, the suitability of a given GRL for determining reliably enriched ontology terms has not yet been acknowledged. Here we present an analysis of data from a microarray experiment designed to study wound healing in D. melanogaster, in which we evaluated the effect of choosing different GRLs for ontology analysis.

Materials and methods
Microarray experiments, in which only the Hemipterous gene (involved in dorsal closure during embryogenesis) was mutated, were carried out using Affymetrix Drosophila 2.0 chips. Bioconductor tools were used to build our DEGL and to identify unreliably detected genes on the chip. Gene enrichment analysis was performed with DAVID [3, 4] under three different GRLs, mapped onto Gene Ontology biological process terms [5]: DAVID's Drosophila genome (DG, 8566 genes), DAVID's Drosophila 2.0 chip (CH, 7286 genes), and our "home-made" GRL (HM, 4542 genes) containing only the reliably detected genes on the chip.
An EASE score below 0.1 was used to identify enriched terms.

Results
A total of 51 significant terms were identified across the three GRLs: 41 with DG, 28 with CH, and 33 with HM. Nineteen terms (37%) were identified in all three comparisons, 27 (53%) were shared between DG and CH, 23 (45%) between DG and HM, and 20 (39%) between CH and HM.

Conclusion
We show that the identification of enriched terms strongly depends on the selected GRL. Surprisingly, the CH-derived results were quite different from the DG results, even though the two references contain a similar number of genes. Our results suggest that the selection of an appropriate GRL remains an open issue that should be carefully considered, especially in proteomic experiments, where the experimental settings severely constrain the proteins that can potentially be observed, as well as the chance of knowing all of them in advance. Further analysis is needed to confirm our findings and to devise an appropriate strategy for gene ontology analysis.
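To make concrete how the GRL enters the enrichment test, here is a minimal sketch, not DAVID's code, of an EASE-style score: a one-tailed hypergeometric (Fisher exact) p-value computed after removing one gene from the list hits. DAVID's exact 2×2 table construction may differ in detail. The function names `upper_tail` and `ease_score`, the DEGL size, the list hits, and the per-term annotation counts are all hypothetical; only the three GRL sizes come from the abstract. The example also assumes that every DEGL gene belongs to each GRL.

```python
from math import comb


def upper_tail(pop_total: int, pop_hits: int, list_total: int, list_hits: int) -> float:
    """One-tailed hypergeometric probability P(X >= list_hits).

    X is the number of term-annotated genes among `list_total` genes drawn
    without replacement from a GRL of `pop_total` genes, of which `pop_hits`
    are annotated to the term.
    """
    if not (0 <= pop_hits <= pop_total and 0 <= list_total <= pop_total):
        raise ValueError("inconsistent counts")
    lo = max(list_hits, 0)
    hi = min(list_total, pop_hits)
    # Exact integer arithmetic; math.comb returns 0 when k > n.
    numer = sum(
        comb(pop_hits, i) * comb(pop_total - pop_hits, list_total - i)
        for i in range(lo, hi + 1)
    )
    return numer / comb(pop_total, list_total)


def ease_score(pop_total: int, pop_hits: int, list_total: int, list_hits: int) -> float:
    """EASE-style score: the upper tail after removing one gene from the list hits.

    Removing one hit makes the test more conservative than the plain p-value.
    """
    return upper_tail(pop_total, pop_hits, list_total, list_hits - 1)


# Hypothetical example: one GO term and a DEGL of 200 genes (assumed to lie in
# every GRL), 8 of which are annotated to the term. Only the GRL sizes
# (8566, 7286, 4542) come from the abstract; the term sizes are made up.
GRLS = {
    "DG": (8566, 120),  # (GRL size, GRL genes annotated to the term)
    "CH": (7286, 110),
    "HM": (4542, 90),
}
LIST_TOTAL, LIST_HITS = 200, 8

for name, (pop_total, pop_hits) in GRLS.items():
    score = ease_score(pop_total, pop_hits, LIST_TOTAL, LIST_HITS)
    print(f"{name}: EASE = {score:.4f}, enriched at 0.1: {score < 0.1}")
```

With the DEGL and its list hits held fixed, the score still varies with the GRL size and with the number of GRL genes annotated to the term. This is one way the choice of GRL can move a term across the 0.1 cutoff.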