IMBICE   05372
INSTITUTO MULTIDISCIPLINARIO DE BIOLOGIA CELULAR
Executing Unit - UE
Conferences and scientific meetings
Title:
Low-cost, rapid variant discovery in diverse populations using next-generation sequencing
Author(s):
ADAMS A; COOKE T; MUZZIO M; KENNY EE; BUSTAMANTE CD
Location:
Cold Spring Harbor
Meeting:
Conference; Biology of Genomes; 2013
Abstract:
As we apply next-generation sequencing technologies to large numbers of samples from diverse human populations, it is becoming clear that the majority of variants throughout the genome are both rare and private to a given population. These rare alleles provide more information about recent demographic events, but they remain costly to discover due to the large sample sizes required. As a solution, we have applied genotyping-by-sequencing (GBS) to humans. This is an inexpensive method for de novo variant discovery at randomly distributed neutral loci throughout the genome. In this approach, a common set of DNA fragments comprising ~1.5% of the genome is generated by digestion of genomic DNA with three restriction enzymes. Library prep is then performed, followed by pooling of up to 65 samples and size selection. The multiplexed pool is then sequenced on a single lane of HiSeq, resulting in the de novo identification of up to 45,000 SNVs per sample at low cost. Our GBS approach has several advantages over previous methods. Identical restriction enzyme recognition sequences at the end of each digested fragment have previously resulted in poor cluster generation on the Illumina flowcell. To solve this problem, we used a set of enzymes that cut 10-14 bp away from their recognition sites. Another issue with many GBS methods is consistency in coverage across samples. To maximize this consistency, we apply a Caliper LabChip size-selection step to ensure all sequenced fragments are in the same size range and to increase the number of variant sites in common among samples. In addition, to ensure that each sample contributes equally to the final pooled library, we perform a barcode-specific qPCR quantitation step. This also ensures that only fragments with adaptors on both ends that can form a cluster on the flowcell are detected by the quantitation. 
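The equal-contribution pooling step described above can be illustrated with a short calculation. This is a minimal sketch, not the authors' protocol: the function name `pooling_volumes`, the target amount of 10 fmol per sample, and the example concentrations are all hypothetical. It assumes only that the qPCR step yields a concentration in nM for each barcoded library.

```python
def pooling_volumes(conc_nM, target_fmol=10.0):
    """Return the volume (in microlitres) each library must contribute.

    conc_nM maps a sample name to its qPCR-estimated concentration in nM.
    Because 1 nM equals 1 fmol per microlitre, volume = amount / concentration.
    target_fmol is a hypothetical per-sample amount, not a value from the abstract.
    """
    volumes = {}
    for sample, conc in conc_nM.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample}")
        volumes[sample] = target_fmol / conc
    return volumes


# Hypothetical concentrations, for illustration only.
example = {"sample_A": 2.0, "sample_B": 4.0, "sample_C": 5.0}
volumes = pooling_volumes(example, target_fmol=10.0)
```

More dilute libraries receive proportionally larger volumes, so every sample contributes the same molar amount to the pool.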
Finally, we are developing an error model and filtering pipeline for pruning error-prone sites from the data, such as sites that are adjacent to heterozygous restriction sites, in which only the allele containing the intact recognition sequence is cleaved during digestion. With its low cost and relatively simple library prep, we propose GBS as an affordable solution for simultaneous genotyping and de novo variant discovery without ascertainment bias in a large number of samples from diverse populations.
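The abstract does not specify the filtering pipeline, which was still in development. As one possible illustration of pruning sites adjacent to heterozygous restriction sites, the sketch below drops variant calls that lie within a fixed distance of a flagged site. The function name `filter_near_sites`, the 20 bp window, and the example coordinates are hypothetical, and positions are assumed to be on a single chromosome.

```python
from bisect import bisect_left


def filter_near_sites(variant_positions, het_site_positions, window=20):
    """Keep variants farther than `window` bp from every flagged site.

    het_site_positions holds coordinates of restriction sites assumed to be
    heterozygous in the sample; `window` is an illustrative threshold.
    """
    sites = sorted(het_site_positions)
    kept = []
    for pos in variant_positions:
        # Index of the first flagged site at or beyond pos - window.
        i = bisect_left(sites, pos - window)
        # If that site is also at or before pos + window, the variant is too close.
        if i < len(sites) and sites[i] <= pos + window:
            continue
        kept.append(pos)
    return kept


# Hypothetical coordinates, for illustration only.
variants = [100, 150, 400, 1000]
het_sites = [160, 990]
kept = filter_near_sites(variants, het_sites, window=20)
```

Sorting the flagged sites once and using binary search keeps each lookup logarithmic in the number of sites, which matters when tens of thousands of variants per sample are screened.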