IBYME   02675
INSTITUTO DE BIOLOGIA Y MEDICINA EXPERIMENTAL
Executing Unit (UE)
Conferences and scientific meetings
Title:
A step forward to standard operating protocols for RNA-seq data analysis
Author(s):
MERINO, GABRIELA; LA GRECA, ALEJANDRO; SORONELLAS, DANIEL; BEATO, MIGUEL; SARAGÜETA, PATRICIA; FERNANDEZ, ELMER; FRESNO, CRISTOBAL
Location:
Rosario
Meeting:
Congress; 4th Argentine Congress of Bioinformatics and Computational Biology (4CAB2C) and 4th International Conference of the Ibero-American Society for Bioinformatics (SolBio); 2013
Organizing institution:
Sociedad Iberoamericana de Bioinformática
Abstract:
Merino GA(1,2), Fresno C(1,2), La Greca A(3), Soronellas D(4), Beato M(4), Saragüeta P(3), Fernandez EA(1,2).
(1) CONICET, Argentina. (2) Bioscience Data Mining Group, Catholic University of Córdoba, Córdoba, Argentina. (3) Institute of Biology and Experimental Medicine (IByME-CONICET), Buenos Aires, Argentina. (4) Centre for Genomic Regulation-UPF, Barcelona, Spain.
Keywords: RNA-seq, SOPs, quality control, bias

Background
RNA-seq technology is emerging as a promising tool for transcriptomic analysis. In general, it involves identifying genes and evaluating an experimental condition in terms of gene expression, using the millions of reads produced by a high-throughput sequencing machine. This large output is processed in several steps, each of which requires careful consideration. Although the technique is revolutionary, it is not yet stable. Therefore, the many sources of variation, including technical and random effects, should be properly accounted for in order to reduce bias in the results [1]. In addition, functional analysis requires the most appropriate gene identifier, which depends on the functional analysis platform used. In this context, gene symbols are unstable, since they are revised over time. Hence, a combination of databases is necessary to retrieve the appropriate gene annotations. Here we propose a standard operating procedure (SOP) for RNA-seq analysis to detect and remove some sources of bias from the data. The procedure is based on well-known methodologies from microarray technology [2]. The analysis is further enriched by integrating different databases for gene symbol conversion, so that functional analysis can be performed.

Materials and methods
RNA-seq reads were obtained from a control-treatment experiment and aligned to a reference genome. The aligned reads were then summarized using the RPKM (reads per kilobase of transcript per million mapped reads) method [3]. The distributional features of each sample were inspected to detect technical bias between samples.
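As a worked example of the RPKM summarization step, the sketch below applies the definition from [3]: a gene's read count divided by its length in kilobases and by the library size in millions of mapped reads. It is a minimal Python sketch, not the pipeline used in this work; the function name `rpkm`, the toy counts and lengths, and the choice to take the library size as the sum of the supplied counts are assumptions for illustration.

```python
"""Minimal RPKM sketch: reads per kilobase of transcript per million mapped reads."""


def rpkm(counts, lengths_bp):
    """Return RPKM values for one sample.

    counts     -- mapping gene -> number of reads assigned to that gene
    lengths_bp -- mapping gene -> gene (or transcript) length in base pairs

    Assumption: the library size is the sum of the supplied counts; a
    pipeline may instead use the total number of mapped reads.
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("sample has no mapped reads")
    per_million = total_reads / 1_000_000  # library size in millions of reads
    return {
        gene: n_reads / ((lengths_bp[gene] / 1_000) * per_million)
        for gene, n_reads in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical toy numbers chosen only to keep the arithmetic simple.
    counts = {"geneA": 500, "geneB": 500, "geneC": 999_000}
    lengths = {"geneA": 1_000, "geneB": 2_000, "geneC": 5_000}
    for gene, value in rpkm(counts, lengths).items():
        print(gene, value)
```

With these toy numbers the library holds exactly one million reads, so geneA and geneB, which have the same count, differ in RPKM only through their length: 500 versus 250, because geneB is twice as long.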
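Comparing the distributions of two samples, in the microarray tradition this procedure builds on [2], is often done with per-gene M and A values: M = log2(x1/x2) is the log-ratio and A = (log2 x1 + log2 x2)/2 is the average log-expression, and plotting M against A gives an MA plot. The sketch below computes both and, as a hypothetical numeric stand-in for visual inspection, reports the median M in the lower and upper halves of genes ranked by A; without technical bias both medians should lie near 0. The helper names, and skipping genes with zero expression rather than adding a pseudocount, are assumptions.

```python
import math
import statistics


def ma_values(sample1, sample2):
    """Per-gene M (log2 ratio) and A (mean log2 expression) for two samples.

    sample1, sample2 -- mappings gene -> expression value (e.g. RPKM).
    Genes with a non-positive value in either sample are skipped, since
    their logarithm is undefined (a pseudocount is a common alternative).
    """
    m_vals, a_vals = [], []
    for gene in sample1.keys() & sample2.keys():
        x1, x2 = sample1[gene], sample2[gene]
        if x1 <= 0 or x2 <= 0:
            continue
        l1, l2 = math.log2(x1), math.log2(x2)
        m_vals.append(l1 - l2)
        a_vals.append((l1 + l2) / 2)
    return m_vals, a_vals


def median_m_by_a_half(m_vals, a_vals):
    """Median M among the lower and upper halves of genes ranked by A.

    A median M drifting away from 0 in the upper half hints at the
    intensity-dependent bias that an MA plot shows visually.
    """
    if len(a_vals) < 2:
        raise ValueError("need at least two genes")
    order = sorted(range(len(a_vals)), key=a_vals.__getitem__)
    half = len(order) // 2
    low = [m_vals[i] for i in order[:half]]
    high = [m_vals[i] for i in order[half:]]
    return statistics.median(low), statistics.median(high)


if __name__ == "__main__":
    # Hypothetical toy data: the two most expressed genes read twice as
    # high in sample 1, mimicking a bias towards high expression values.
    s1 = {"g1": 1.0, "g2": 2.0, "g3": 64.0, "g4": 128.0}
    s2 = {"g1": 1.0, "g2": 2.0, "g3": 32.0, "g4": 64.0}
    m, a = ma_values(s1, s2)
    print(median_m_by_a_half(m, a))
```

In the toy data the lower-half median M is 0 and the upper-half median M is 1, the kind of intensity-dependent shift that an MA plot displays as points bending away from M = 0 at high A.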
Visual inspection of MA plots allows bias towards high expression values to be identified. Quantile or loess normalization can be applied to correct this bias. Once the bias has been removed, differential expression analysis can be carried out according to the experimental design. Heat maps were built to explore the differentially expressed genes. In this experiment, no gene length bias affecting the differentially expressed genes was observed, so functional analysis was performed on them using the DAVID functional analysis platform [4]. Since DAVID works better with Entrez Gene identifiers, the HUGO Gene Nomenclature Committee (HGNC) resources [5] were used to map genes and verify or update their identifiers. Then, the org.Hs.eg.db annotation package, available in Bioconductor [6], was queried to obtain the appropriate gene annotation. Finally, the functional analysis was run in DAVID.

Conclusions
The workflow presented here could serve as a basis for standard operating protocols in RNA-seq sequencing and bioinformatics facilities. Preliminary results from these steps allow a quick check of the biological hypothesis underlying the experiment analyzed here.

References
[1] Oshlack A, Robinson MD, Young MD: From RNA-seq reads to differential expression results. Genome Biol 2010, 11(12):220.
[2] Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S (Eds): Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer; 2005.
[3] Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5(7):621-628.
[4] DAVID: Functional annotation bioinformatics microarray analysis [http://david.abcc.ncifcrf.gov/]
[5] HUGO Gene Nomenclature Committee [http://www.genenames.org/]
[6] Bioconductor: Open source software for bioinformatics [http://www.bioconductor.org/]
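Quantile normalization, one of the two corrections named in Materials and methods, forces every sample to share the same empirical distribution: each sample is sorted, the sorted samples are averaged rank by rank, and each value is replaced by the average for its rank. The following is a minimal pure-Python sketch, assuming a complete genes x samples matrix with no missing values; ties are broken by row order here, a simplification that library implementations may handle differently. The function name is hypothetical, and loess normalization is not shown.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    After normalization every sample (column) holds the same set of values:
    the mean, across samples, of the k-th smallest value, assigned back to
    each gene according to its rank within that sample.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # For each sample, row indices ordered from smallest to largest value
    # (ties keep their original row order because sorted() is stable).
    orders = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(columns[j][orders[j][k]] for j in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    # Write each rank mean back to the gene holding that rank in each sample.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for k, i in enumerate(orders[j]):
            result[i][j] = rank_means[k]
    return result


if __name__ == "__main__":
    # Hypothetical toy matrix: 3 genes (rows) x 2 samples (columns).
    data = [[2.0, 4.0], [6.0, 8.0], [4.0, 1.0]]
    for row in quantile_normalize(data):
        print(row)
```

With this toy matrix both samples end up holding the values 1.5, 4.0 and 7.0, only in a different gene order, which is exactly the property that removes between-sample distributional differences.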
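The identifier step described in Materials and methods (checking gene symbols against HGNC, then retrieving Entrez Gene identifiers before submitting the list to DAVID) amounts to a two-step lookup. The sketch below uses small in-memory dictionaries standing in for an HGNC export and for the symbol-to-Entrez table one would obtain from org.Hs.eg.db; the table contents, gene names and identifiers are hypothetical placeholders, not real annotations, and no real database API is called. Symbols that cannot be resolved are reported instead of being silently dropped.

```python
def map_to_entrez(symbols, hgnc_current, symbol_to_entrez):
    """Resolve gene symbols to Entrez Gene IDs in two steps.

    hgnc_current     -- hypothetical table: known symbol (approved, previous
                        or alias) -> current approved symbol, standing in
                        for an HGNC download.
    symbol_to_entrez -- hypothetical table: approved symbol -> Entrez ID,
                        standing in for an org.Hs.eg.db query.

    Returns (mapped, unresolved): a dict input symbol -> (approved symbol,
    Entrez ID) and a list of symbols that could not be mapped.
    """
    mapped, unresolved = {}, []
    for symbol in symbols:
        # Step 1: update outdated symbols to the current approved one.
        approved = hgnc_current.get(symbol)
        # Step 2: look up the Entrez ID of the approved symbol.
        entrez = symbol_to_entrez.get(approved) if approved else None
        if entrez is None:
            unresolved.append(symbol)
        else:
            mapped[symbol] = (approved, entrez)
    return mapped, unresolved


if __name__ == "__main__":
    # Placeholder tables: "OLDSYM1" plays the role of an outdated symbol.
    hgnc = {"GENEA": "GENEA", "GENEB": "GENEB", "OLDSYM1": "GENEB"}
    entrez = {"GENEA": "ENTREZ_1", "GENEB": "ENTREZ_2"}
    print(map_to_entrez(["GENEA", "OLDSYM1", "UNKNOWN"], hgnc, entrez))
```

Keeping the unresolved list makes it easy to check how many genes would be lost before the functional analysis, rather than discovering gaps in the DAVID output.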