RESEARCHERS
GIRI Adriana Angelica
Conferences and scientific meetings
Title:
Bayesian Probabilistic Inference and the Development of a Highly Multiplex long read sequencing protocol for SARS-CoV-2 genomes
Author(s):
GARCÍA LABARI, IGNACIO; EZPELETA, JOAQUÍN; CASAL, PABLO E; POSNER, VICTORIA; VILLANOVA, G. VANINA; LAVISTA LLANOS, SOFÍA; BULACIO, PILAR; SPETALE, FLAVIO; MURILLO, JAVIER; ANGELONE, LAURA; PALETTA, ANA; REMES LENICOV, FEDERICO; CERRI, AGUSTINA; BOLATTI, ELISA M; SPINELLI, SILVANA; GIRI, ADRIANA A; ARRANZ, SILVIA; TAPIA, ELIZABETH
Location:
Salamanca
Meeting:
Workshop; I Taller de la Red Iberoamericana RiaBio sobre Inteligencia Artificial aplicada a BioData; 2021
Organizing institution:
Red Iberoamericana RiaBio sobre Inteligencia Artificial aplicada a BioData
Abstract:
Background: Viral genome sequencing makes it possible to identify evolutionary relationships among viruses, monitor the validity of diagnostic tests, and investigate potential transmission chains. The objective of this work was to develop a complete protocol, from bench to bedside, for whole-genome, highly multiplexed SARS-CoV-2 sequencing. Toward this goal, we relied on a previously reported family of barcodes designed in silico (NS-watermark) to cope with the high error rates of long-read sequencing platforms. We used this protocol to identify the circulating variants and track the evolution of SARS-CoV-2 in Santa Fe, Argentina. Results: We built on the amplicon tiling strategy described by Quick J. et al. 2017 for rapid whole-genome virus sequencing of clinical samples and coupled it with the NS-watermark multiplex sequencing strategy described by Ezpeleta J. et al. 2017. We focused on the SARS-CoV-2 multiplex-PCR 1.5 kb amplification protocol (2x12-plex reactions), originally designed for the expensive, non-portable PacBio sequencing machines, and adapted it to the low-cost, portable MinION. We developed a multiplex sequencing protocol with barcode sets of increasing size (12, 48, and 96) drawn from a larger set of 4096, and modified the multiplex-PCR protocol to allow symmetric double-end barcoding of amplicon samples with these rather long barcodes (36 nt). Pools of 12, 48, and 96 samples (including technical replicates) were sequenced together on the MinION sequencer. After base-calling and trimming of sequencing adapters, reads were individually deconvoluted using an approximate Bayesian inference approach to identify individual barcodes (implemented in the NS-watermark decoding software). The Bayesian inference approach allows fine control of the critical trade-off between the read recovery rate and the crosstalk rate.
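The trade-off between read recovery and crosstalk can be illustrated with a minimal sketch of posterior-thresholded barcode assignment. This is not the NS-watermark decoder: it assumes a substitution-only error model with a uniform prior over barcodes, whereas a real long-read decoder must also handle insertions and deletions. The function names, the `error_rate` value, and the `min_posterior` threshold are hypothetical and chosen only for illustration.

```python
import math


def log_likelihood(observed: str, barcode: str, error_rate: float) -> float:
    """Log P(observed | barcode) under an assumed substitution-only model."""
    if len(observed) != len(barcode):
        raise ValueError("this sketch assumes equal-length sequences (no indels)")
    mismatches = sum(o != b for o, b in zip(observed, barcode))
    matches = len(barcode) - mismatches
    # Assumption: an erroneous base is any of the 3 other nucleotides, equally likely.
    return matches * math.log(1 - error_rate) + mismatches * math.log(error_rate / 3)


def assign_barcode(observed, barcodes, error_rate=0.1, min_posterior=0.99):
    """Return (barcode name or None, posterior of the best candidate).

    With a uniform prior, the posterior is the normalized likelihood.
    Reads whose best posterior falls below min_posterior stay unassigned.
    """
    log_liks = {
        name: log_likelihood(observed, seq, error_rate)
        for name, seq in barcodes.items()
    }
    peak = max(log_liks.values())
    # Subtract the peak before exponentiating to avoid numerical underflow.
    weights = {name: math.exp(ll - peak) for name, ll in log_liks.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    posterior = weights[best] / total
    return (best if posterior >= min_posterior else None), posterior


# Toy barcodes (hypothetical, far shorter than the 36 nt barcodes in the text).
toy_barcodes = {"bc01": "ACGTACGTAC", "bc02": "TGCATGCATG"}
clean_call, clean_posterior = assign_barcode("ACGTACGTAC", toy_barcodes)
# This read is 5 mismatches away from each barcode, so it is ambiguous.
ambiguous_call, ambiguous_posterior = assign_barcode("ACGTAGCATG", toy_barcodes)
```

Raising `min_posterior` discards more ambiguous reads, which lowers crosstalk at the cost of read recovery; lowering it does the opposite.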
Even for 96 samples, high genome coverage (> 98% of the genome) and depth (> 30X in each 1.5 kb amplicon) were obtained. A subset of 110 complete genomes (collected March-December 2020, available at GISAID) from individuals residing in 43 localities in the south of the province of Santa Fe was classified by dynamic lineage taxonomy with the Pangolin COVID-19 Lineage Assigner and analyzed phylogenetically with IQ-TREE. The genomes obtained corresponded to 6 lineages, consistent with what was observed in other Argentine provinces during 2020. Clusters of sequences with close geographical proximity were identified, providing evidence of viral transmission chains within the province of Santa Fe and/or with neighboring provinces. Conclusions: Our results validate the multiplex sequencing methodology developed with the NS-watermark barcodes, which makes it possible to democratize genomic sequencing for the active surveillance of SARS-CoV-2 and may be extended to other emerging viruses in the future.
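The coverage and depth figures reported above (> 98% of the genome, > 30X per amplicon) could be checked per sample with a sketch like the following. The abstract does not say how these metrics were computed, so this is one plausible reading: breadth is the fraction of positions with at least `min_depth` reads, and amplicon depth is the mean depth over each amplicon interval. All function names, the half-open interval convention, and the default thresholds are assumptions made for illustration.

```python
def coverage_breadth(depths, min_depth=1):
    """Fraction of genome positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(d >= min_depth for d in depths) / len(depths)


def amplicon_mean_depths(depths, amplicons):
    """Mean depth over each amplicon, given half-open (start, end) intervals."""
    return [sum(depths[start:end]) / (end - start) for start, end in amplicons]


def passes_qc(depths, amplicons, min_breadth=0.98, min_amplicon_depth=30):
    """True if breadth and every amplicon's mean depth meet the thresholds."""
    breadth_ok = coverage_breadth(depths) >= min_breadth
    depth_ok = all(
        mean >= min_amplicon_depth
        for mean in amplicon_mean_depths(depths, amplicons)
    )
    return breadth_ok and depth_ok


# Toy 100-position "genome" split into two amplicons (hypothetical data).
toy_amplicons = [(0, 50), (50, 100)]
good_sample = [40] * 99 + [0]        # one uncovered position
weak_sample = [40] * 50 + [10] * 50  # second amplicon under-sequenced
good_result = passes_qc(good_sample, toy_amplicons)
weak_result = passes_qc(weak_sample, toy_amplicons)
```

In practice the per-position depths would come from an alignment of the demultiplexed reads against the reference genome, and the amplicon intervals from the primer scheme.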