RESEARCHERS
AMADIO Ariel Fernando
conferences and scientific meetings
Title:
Strategies for gap-closure of Thermus sp. 2.9 genome
Author(s):
NAVAS L; AMADIO AF; ZANDOMENI RO
Location:
Oro Verde
Meeting:
Conference; 3º Congreso Argentino de Bioinformática y Biología Computacional; 2012
Organizing institution:
Asociación Argentina de Bioinformática y Biología Computacional
Abstract:
Extremophile organisms are of great interest due to their potential as sources of proteins for biotechnological applications. A thermophilic bacterium was isolated from a hot spring in Salta, Argentina. Phylogenetic analysis indicated that it belongs to the genus Thermus. DNA sequencing was performed using Roche 454 technology to obtain the complete genome sequence. Two hundred and fifteen thousand unpaired reads were obtained, totaling 81,238,046 bp and providing approximately 35-40-fold coverage of the genome (estimated size 2 Mbp). Reads were assembled de novo using Newbler (v2.3), which generated 137 contigs larger than 500 nucleotides with an N50 of 39,906 bp. The G+C content of the genome was 66.7%. Different bioinformatics strategies were used to predict the collinearity between contigs in order to finish the genome. First, synteny was analyzed by comparing the contigs of isolate 2.9 with two species of the genus Thermus. A second strategy consisted of generating an optical map of the Thermus sp. 2.9 genome (OpGen, Sanger Institute) using the restriction enzyme NheI. This allowed the restriction pattern of the whole genome to be compared with the pattern generated in silico for each contig. Finally, a fosmid library (Epicentre) was generated with an insert size of 30-40 kb, and the ends of 150 clones were sequenced. Together, these approaches allowed the generation of scaffolds to order the contigs. As a result of these strategies, 95 joins were predicted. Thirty-two of them were confirmed by PCR and sequencing of the amplified products. The average size of the sequenced gaps was ~1200 bp. Currently, we have 10 scaffolds that cover 98% of the genome. Following this strategy we were able to join several contigs and order many of them. However, it is clear that obtaining a single scaffold (or ideally a single contig) is particularly complex for genomes with high GC content.
To gain additional information and obtain a finished genome, we are currently planning a mate-pair run with an insert size of ~8 kb, aiming not only to join the 10 scaffolds but also to resolve the repetitive sequences in the remaining contigs.
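
The assembly statistics and the in silico restriction step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the toy inputs are hypothetical, the N50 follows the common definition (the contig length at which the running total of length-sorted contigs reaches half the assembly), and the digest assumes a linear sequence cut at the standard NheI site G^CTAGC. With the figures reported above, 81,238,046 bp over an estimated 2 Mbp genome gives roughly 40.6-fold coverage.

```python
NHEI_SITE = "GCTAGC"   # NheI recognition sequence
NHEI_CUT_OFFSET = 1    # NheI cuts after the first G: G^CTAGC


def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


def fold_coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def nhei_fragment_lengths(sequence):
    """Fragment lengths from an in silico NheI digest of a linear sequence."""
    seq = sequence.upper()
    cuts = []
    start = seq.find(NHEI_SITE)
    while start != -1:
        cuts.append(start + NHEI_CUT_OFFSET)
        # Step by one so overlapping sites (e.g. GCTAGCTAGC) are all found.
        start = seq.find(NHEI_SITE, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Figures from the abstract: total bases and estimated genome size.
    print(f"coverage: {fold_coverage(81_238_046, 2_000_000):.1f}x")

    # Hypothetical toy inputs, for illustration only.
    toy_contig_lengths = [10, 8, 6, 4, 2]
    toy_contig = "AAGCTAGCTT"
    print("N50:", n50(toy_contig_lengths))
    print("NheI fragments:", nhei_fragment_lengths(toy_contig))
```

Fragment-length lists produced this way for each contig are the kind of pattern that could be aligned against an optical map; the alignment itself is outside the scope of this sketch.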