RESEARCHERS
AMADIO Ariel Fernando
Conferences and scientific meetings
Title:
Complete genome sequencing of the thermophilic bacterium Thermus sp. 2.9 using an Illumina/pyrosequencing hybrid approach
Author(s):
NAVAS L; ORTIZ EM; BENINTENDE GB; BERRETA MF; ZANDOMENI RO; AMADIO AF
Location:
Bariloche
Meeting:
Conference; V Argentinian Conference on Bioinformatics and Computational Biology; 2014
Organizing institution:
Asociación Argentina de Bioinformática y Biología Computacional
Abstract:
In this work we studied and compared different approaches for sequencing the genome of a thermophilic bacterium. We isolated the thermophilic Thermus sp. 2.9 from a hot spring in Rosario de la Frontera, Salta, Argentina. Thermophilic organisms contain genes with potential biotechnological applications, and there is also interest in studying the mechanisms involved in bacterial adaptation to their extreme natural environment. We used the Roche 454 and Illumina MiSeq platforms to generate unpaired and paired-end reads, respectively. The paired-end library was built using long jumping distance technology with a length of 8 kb. The following table summarizes the sequencing and assembly results:

             Roche 454    Illumina MiSeq    Roche 454 + Illumina MiSeq
# reads      215,557      2,139,062         2,354,619
Assembler    Newbler      MIRA              MIRA
# contigs    137          323               131
N50          39,906       17,661            79,216

The hybrid assembly with MIRA gave the best result, with the fewest contigs and the highest N50. Scaffolding was performed with BAMBUS on the contigs from the hybrid assembly. We evaluated different redundancy thresholds, that is, the minimum number of paired reads required to accept a link between two contigs as true. The best result was obtained with a minimum of 200 linked reads, which yielded seven scaffolds covering the entire bacterial chromosome. Using a previously generated optical map of the genome, we were able to order and join these scaffolds, reducing the whole chromosome to a single scaffold. Another three major scaffolds longer than 50 kb were homologous to plasmids reported for the genus, suggesting the presence of one or more plasmids in this strain.

Genome annotation was performed using the RAST server. We identified a total of 2,673 CDS, 48 tRNA and 3 rRNA gene-encoding regions. We analyzed these annotated features and found that 1,705 CDS can be associated with enzymes with defined functions.
Corresponding EC numbers were assigned to those genes, while the remaining 968 CDS were classified as hypothetical proteins. Fifty-nine genes were selected as candidates for cloning and expression; the encoded proteins have applications in the food industry and bioenergy and are of high interest because of their potential thermostability.
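
The abstract compares the assemblies by their N50 values. As a reminder of what that statistic measures, the following is a minimal Python sketch of the usual N50 definition: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length. The function name `n50` and the example contig lengths are hypothetical and are not taken from the Thermus sp. 2.9 data; the assemblers named in the abstract may report the statistic with slightly different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running sum first reaches half the total length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 80, 60, 40, 20]
print(n50(example_lengths))
```

With this definition, an assembly made of fewer, longer contigs gets a higher N50, which is why the hybrid assembly stands out in the comparison.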