RESEARCHERS
SPETALE Flavio Ezequiel
Conferences and scientific meetings
Title:
Bayesian Probabilistic Inference and the Development of a Highly Multiplex long read sequencing protocol for SARS-CoV-2 genomes
Author(s):
GARCIA LABARI IGNACIO; JOAQUÍN EZPELETA; CASAL PABLO; VICTORIA POSNER; VILLANOVA, GABRIELA V.; SOFIA LAVISTA-LLANOS; BULACIO PILAR; SPETALE FLAVIO EZEQUIEL; MURILLO JAVIER; ANGELONE LAURA; PALLETA A; REMES LENICOV F; AGUSTINA CERRI; BOLATTI ELISA MARIA; SPINELLI SILVANA; GIRI ADRIANA; SILVIA ARRANZ; TAPIA ELIZABETH
Place:
Salamanca
Meeting:
Workshop; Workshop of RiaBio Ibero-American Network on Artificial Intelligence applied to BioData; 2021
Abstract:
Background: Viral genome sequencing makes it possible to identify evolutionary relationships among viruses, monitor the validity of diagnostic tests, and investigate potential transmission chains. The objective of this work was to develop a complete protocol, from bench to bedside, for whole-genome, highly multiplexed SARS-CoV-2 sequencing. To this end, we relied on a previously reported in-silico family of barcodes (NS-watermark) designed to cope with the high error rates of long-read sequencing platforms. We used this protocol to identify the circulating variants and the evolution of SARS-CoV-2 in Santa Fe, Argentina.
Results: We built on the amplicon tiling strategy described by Quick J. et al. 2017 for rapid whole-genome virus sequencing of clinical samples and coupled it with the NS-watermark multiplex sequencing strategy described by Ezpeleta J. et al. 2017. We focused on the SARS-CoV-2 multiplex-PCR 1.5 Kb amplification protocol (2x12-plex reactions), originally designed for expensive, non-portable PacBio sequencing machines, and adapted it to the low-cost, portable MinION. We developed a multiplex sequencing protocol with barcode sets of increasing size (12, 48, and 96) drawn from a larger set of 4096, and modified the multiplex-PCR protocol to allow symmetrical double-end barcoding of amplicon samples with these rather long barcodes (36 nt). Pools of 12, 48, and 96 samples (including technical replicates) were sequenced together on the MinION sequencer. After base-calling and trimming of sequencing adapters, reads were individually deconvoluted using an approximate Bayesian inference approach to identify individual barcodes (implemented in the NS-watermark decoding software). The Bayesian inference approach allows fine control of the critical trade-off between the read recovery rate and the crosstalk rate.
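The trade-off between read recovery and crosstalk can be illustrated with a minimal sketch. This is not the NS-watermark decoder: it assumes a uniform prior over barcodes and a substitution-only error model (real long-read errors also include insertions and deletions), and the function names, the toy 8 nt barcodes, the `error_rate` value, and the `min_posterior` threshold are all hypothetical choices made for illustration.

```python
import math


def log_likelihood(observed, barcode, error_rate):
    """Log P(observed | barcode) under an i.i.d. substitution-only model (assumption)."""
    if len(observed) != len(barcode):
        raise ValueError("this sketch assumes equal-length sequences")
    mismatches = sum(o != b for o, b in zip(observed, barcode))
    matches = len(barcode) - mismatches
    # A miscalled base is one of 3 alternatives, each with probability error_rate / 3.
    return (matches * math.log(1.0 - error_rate)
            + mismatches * math.log(error_rate / 3.0))


def barcode_posterior(observed, barcodes, error_rate=0.1):
    """Posterior over barcode names for one observed barcode region, uniform prior."""
    logs = {name: log_likelihood(observed, seq, error_rate)
            for name, seq in barcodes.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(value - top) for name, value in logs.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def assign(observed, barcodes, min_posterior=0.99, error_rate=0.1):
    """Return the most probable barcode name, or None if the read is too ambiguous."""
    posterior = barcode_posterior(observed, barcodes, error_rate)
    best = max(posterior, key=posterior.get)
    return best if posterior[best] >= min_posterior else None


# Toy barcodes (hypothetical, far shorter than the 36 nt barcodes in the abstract).
toy_barcodes = {"bc01": "ACGTACGT", "bc02": "TTGGCCAA"}
confident = assign("ACGTACGA", toy_barcodes)   # one mismatch from bc01
ambiguous = assign("ACGTCCAA", toy_barcodes)   # three mismatches from each barcode
```

In this sketch, raising `min_posterior` discards more ambiguous reads (lower recovery) but makes misassignment to the wrong sample less likely (lower crosstalk); lowering it does the opposite.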
Even for 96 samples, high genome coverage (> 98% of the genome) and depth (> 30X in each 1.5 Kb amplicon fragment) were obtained. A subset of 110 complete genomes (collected March-December 2020, available at GISAID) from individuals residing in 43 localities in the south of the province of Santa Fe were classified by dynamic lineage taxonomy with the Pangolin COVID-19 Lineage Assigner and analyzed phylogenetically with IQ-TREE. The genomes obtained corresponded to 6 lineages, consistent with what was observed in other Argentine provinces during 2020. Clusters of sequences in close geographical proximity were identified, evidencing chains of viral transmission within the province of Santa Fe and/or with neighboring provinces.
Conclusions: Our results validate the multiplex sequencing methodology developed with the NS-watermark barcodes, which makes it possible to democratize genomic sequencing for the active surveillance of SARS-CoV-2 and may be extended to other emerging viruses in the future.
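The coverage and depth figures reported in the Results (> 98% of the genome, > 30X per amplicon) could be checked per sample along the following lines. This is a sketch under stated assumptions: the per-base depth vector is taken as already computed, "depth in each amplicon" is interpreted as the mean depth over the amplicon, and the function names and toy coordinates are hypothetical.

```python
def breadth_of_coverage(depths, min_depth=1):
    """Fraction of genome positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("empty depth vector")
    return sum(d >= min_depth for d in depths) / len(depths)


def amplicon_mean_depths(depths, amplicons):
    """Mean depth per amplicon; amplicons are 0-based, half-open (start, end) pairs."""
    return [sum(depths[start:end]) / (end - start) for start, end in amplicons]


def passes_qc(depths, amplicons, min_breadth=0.98, min_amplicon_depth=30):
    """True if breadth and every amplicon's mean depth exceed the thresholds."""
    breadth_ok = breadth_of_coverage(depths) > min_breadth
    depth_ok = all(mean > min_amplicon_depth
                   for mean in amplicon_mean_depths(depths, amplicons))
    return breadth_ok and depth_ok


# Toy 100-position "genome" split into two amplicons (hypothetical values).
toy_amplicons = [(0, 50), (50, 100)]
good_sample = [40] * 99 + [0]        # one uncovered position
weak_sample = [40] * 50 + [10] * 50  # second amplicon under-sequenced
```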