RESEARCHERS
NAVAS Laura Emilce
Conferences and scientific meetings
Title:
Strategies for gap-closure of Thermus sp. 2.9 genome
Author(s):
LAURA E. NAVAS; ARIEL F. AMADÍO; RUBÉN O. ZANDOMENI
Meeting:
Congress; 3er Congreso Argentino de Bioinformática y Biología Computacional (3rd Argentine Congress of Bioinformatics and Computational Biology); 2012
Abstract:
Extremophile organisms are of great interest due to their potential as
sources of proteins for biotechnological applications. A thermophilic
bacterium was isolated from a hot spring in Salta, Argentina.
Phylogenetic analysis indicated that it belongs to the genus Thermus.
DNA sequencing was performed using Roche 454 technology to obtain the
complete genome sequence. A total of 215,000 unpaired reads were
obtained, totaling 81,238,046 bp and providing approximately 35-40-fold
coverage of the genome (estimated at 2 Mbp). Reads were assembled de
novo using Newbler (v2.3), which generated 137 contigs longer than 500
nucleotides, with an N50 of 39,906 bp. The genomic G+C content was
66.7%. Different
bioinformatics strategies were used to predict the collinearity of the
contigs in order to finish the genome. First, synteny between the contigs
of isolate 2.9 and the genomes of two Thermus species was analyzed. A
second strategy consisted of generating an optical map of the Thermus
sp. 2.9 genome (OpGen, Sanger Institute) using the restriction enzyme
NheI. This allowed the restriction pattern of the whole genome to be
compared with the patterns generated in silico for each contig.
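To illustrate the in silico side of the optical-map comparison described above, here is a minimal Python sketch. It assumes each contig is treated as a linear sequence, that NheI cuts its palindromic site G^CTAGC (so scanning one strand is enough), and that the optical map is available as an ordered list of fragment sizes in bp. The function names, the 10% sizing tolerance and the exact-window matching are hypothetical choices made for illustration; a real optical-map aligner must also tolerate missing or spurious cuts.

```python
NHEI_SITE = "GCTAGC"  # NheI recognition site (palindromic)
CUT_OFFSET = 1        # NheI cuts G^CTAGC, i.e. after the first base of the site


def nhei_fragments(seq: str) -> list[int]:
    """Fragment lengths (bp) of an in silico NheI digest of a linear contig."""
    seq = seq.upper()
    cuts = []
    start = seq.find(NHEI_SITE)
    while start != -1:
        cuts.append(start + CUT_OFFSET)
        # Step by one base so that overlapping sites (e.g. GCTAGCTAGC) are found.
        start = seq.find(NHEI_SITE, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def place_contig(contig_frags: list[int], map_frags: list[int],
                 tol: float = 0.1) -> list[int]:
    """Indices in the ordered optical-map fragments where the contig fits.

    Only internal fragments are compared: the first and last fragments of a
    contig are truncated by the contig ends, so their sizes are not informative.
    A window matches if every fragment is within `tol` (relative) of the map.
    """
    inner = contig_frags[1:-1]
    if not inner:
        return []
    hits = []
    for i in range(len(map_frags) - len(inner) + 1):
        window = map_frags[i:i + len(inner)]
        if all(abs(c - m) <= tol * m for c, m in zip(inner, window)):
            hits.append(i)
    return hits
```

Comparing only the internal fragments keeps the sketch honest about what a contig can tell us: a contig with fewer than two NheI sites yields no fully bounded fragment and cannot be placed by this simple scheme.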
Finally, a fosmid library (Epicentre) with an insert size of 30-40 kb
was generated, and the ends of 150 clones were sequenced. Together, these
approaches allowed scaffolds to be built that order the contigs. As a
result, 95 joins were predicted, 32 of which were confirmed by PCR and
sequencing of the amplified products. The average size of the sequenced
gaps was ~1,200 bp. Currently, 10 scaffolds cover 98% of the genome.
These strategies allowed us to join several contigs and order many
others. However, it is clear that obtaining a single scaffold (or,
ideally, a single contig) is particularly difficult for genomes with high
G+C content. To gain additional information and obtain a finished
genome, we are currently planning a mate-paired run with an insert size
of ~8 kb, aimed not only at joining the 10 scaffolds but also at
resolving repetitive sequences in the remaining contigs.
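The sequencing and assembly figures quoted in the abstract (fold coverage, N50, G+C content) follow from simple formulas. The Python sketch below is a hypothetical illustration, not the pipeline used in this work: the 2 Mbp genome size is the estimate given in the abstract, and the contig lengths and sequences would have to come from the Newbler output, which is not reproduced here.

```python
def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    return total_bases / genome_size


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (N and other codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100 * (seq.count("G") + seq.count("C")) / acgt
```

For example, `fold_coverage(81_238_046, 2_000_000)` gives about 40.6, consistent with the approximate coverage reported for the 2 Mbp genome estimate.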