IBIOBA - MPSP   22718
INSTITUTO DE INVESTIGACION EN BIOMEDICINA DE BUENOS AIRES - INSTITUTO PARTNER DE LA SOCIEDAD MAX PLANCK
Executing Unit (Unidad Ejecutora, UE)
Conferences and scientific meetings
Title:
A computational platform for human genome analysis and interpretation in Argentina
Author(s):
PATRICIO YANKILEVICH; MAXIMILIANO DE SOUSA SERRO; DIEGO WALLACE; DANIEL KOILE
Location:
Cambridge
Meeting:
Conference; Genome Informatics; 2014
Organizing institution:
Wellcome Trust
Abstract:
A personal genome interpretation platform is being developed to identify molecular and genetic variations within the Argentinean population. Analysis of genetic screening information will allow us to elucidate local disease pathways and identify new drug targets. Used in clinical trials, the platform will speed up trials and reduce risks by recruiting participants based on their genetic profiles; combined with the trial results, these profiles will inform therapeutic development and help identify the genetic causes of drug response and side effects. Eventually, the platform may help us better understand the genetic basis of local diseases, make more accurate diagnoses, improve our understanding of prognosis and make better treatment decisions. The platform is composed of four main components: a computer cluster, an NGS data analysis pipeline, a set of biological knowledge databases and a platform website. The platform workflow moves from reads (data) to identified variants (information) to selected risk variants associated with disease (knowledge) to an online interactive final report. The genetic risk and the probability of a positive drug response are estimated by combining population data with the individual genotype using Bayes' rule. The final report design includes integrative visualizations, visual quantitative assessments and ideograms to make the results easier to interpret for medical geneticists, researchers and clinical trial specialists. The NGS data analysis pipeline is being developed using over 15 public open source algorithms, developed by research groups from leading institutions, which conform to today's best practices in NGS data analysis. This makes the data analysis transparent and reproducible. The pipeline is designed as seven independent modules, which sequentially execute the different genome analysis tasks.
The modules are listed below, showing some of the software packages, algorithms and computational methods being used in brackets:
1. Secondary Analysis Module - QC, Alignment, Assembly (FASTX-Toolkit, BWA, SAMtools).
2. Tertiary Analysis Module - Variant Identification (SAMtools, GATK, BEDTools).
3. Variant Annotation Module - Annotation of identified variants (ANNOVAR, AnnTools, SVA).
4. Interpretation Module - Filtering and prioritization of identified variants (VEP, GATK, SIFT).
5. Final Report Module - Results, Statistics and Ideograms interactive report (D3, Processing).
6. Visualization Module - Genome and Variant visualization (cross-module - Circos, IGV).
7. RNA-Seq Analysis Module - Gene expression (additional module - TopHat, Cufflinks).
Some of the biological knowledge bases being integrated into the pipeline are: dbSNP, dbVAR, COSMIC, HGMD, HAPMAP, 1000 Genomes, NHGRI GWAS, PharmGKB, DrugBank, OMIM.
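The abstract states that risk is estimated by combining population data with the individual genotype using Bayes rule, but it does not give the model. The following is only a minimal sketch of that idea for a single genotype and a single condition. The function name `posterior_risk`, its parameters and all numeric values are hypothetical illustrations and are not taken from the platform.

```python
def posterior_risk(prior: float,
                   p_genotype_given_affected: float,
                   p_genotype_given_unaffected: float) -> float:
    """Return P(affected | genotype) using Bayes rule.

    prior                        -- population prevalence, P(affected)
    p_genotype_given_affected    -- genotype frequency among affected individuals
    p_genotype_given_unaffected  -- genotype frequency among unaffected individuals
    All three inputs are assumed to be probabilities in [0, 1].
    """
    for value in (prior, p_genotype_given_affected, p_genotype_given_unaffected):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be probabilities in [0, 1]")

    # Numerator: joint probability of being affected and carrying the genotype.
    numerator = p_genotype_given_affected * prior
    # Denominator: total probability of observing the genotype in the population.
    evidence = numerator + p_genotype_given_unaffected * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("genotype has zero probability under both groups")
    return numerator / evidence


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only: 1% prevalence, genotype
    # carried by 30% of affected and 10% of unaffected individuals.
    print(posterior_risk(0.01, 0.30, 0.10))
```

When the genotype is equally frequent in both groups, the posterior equals the population prior, which is a quick sanity check on any implementation. A real risk model would likely combine many variants and account for ancestry-specific frequencies, which this sketch does not attempt.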

