IPATEC   26054
INSTITUTO ANDINO PATAGONICO DE TECNOLOGIAS BIOLOGICAS Y GEOAMBIENTALES
Executing Unit (Unidad Ejecutora, UE)
Conferences and scientific meetings
Title:
Cleaning 'Nwanted regions: A novel approach to getting rid of N content by assemblies combination
Author(s):
NICOLAS BELLORA; MARTIN MOLINE; DIEGO LIBKIND; PAULA NIZOVOY
Place:
Buenos Aires
Meeting:
Symposium; Argentine Symposium of Young Bioinformatics Researchers (2SAJIB); 2017
Organizing institution:
ISCB RSG-Argentina
Abstract:
Advances in sequencing technology allow genomes to be sequenced at decreasing cost, providing the bedrock of genome research. However, poor-quality assemblies impair the genomic predictions and inferences based upon them, hampering the search for common genes, syntenic regions, etc. Moreover, the wide availability of genomes deposited in GenBank does not imply that their sequences are correct. For these reasons, quality control of the assembly process and comparison between different algorithms and available tools cannot be neglected if a solid initial step is to be guaranteed. Combining outputs is a valid strategy to get the best of each approach. To this end, we present a Python-based script developed to reduce (N)n tracts in assemblies by merging overlapping scaffolds obtained from different assemblers. Draft genome sequences and sequencing reads of two psychrotolerant basidiomycetous yeasts used as models (Naganishia vishniacii ANT03-052 and Dioszegia cryoxerica ANT03-071) were downloaded from the JGI Genome Portal. De novo assembly of both yeast genomes was performed with SPAdes, following an approach tested in our laboratory on a variety of yeast genomes. These new scaffolds were used to improve the downloaded versions by joining and extending overlapping scaffolds and by replacing previously undefined regions. To achieve the latter, both sets of scaffolds (the older and the newly assembled) were used as input for our pipeline.
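The joining of overlapping scaffolds mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible case: two scaffolds already in the same orientation whose ends overlap by an exact match. The function name `merge_overlapping` and the `min_overlap` threshold are hypothetical and are not taken from the authors' script; real assembly data would call for mismatch-tolerant alignment and checks on both strands.

```python
from typing import Optional


def merge_overlapping(a: str, b: str, min_overlap: int = 20) -> Optional[str]:
    """Join scaffold `b` onto the end of scaffold `a` when a suffix of `a`
    exactly matches a prefix of `b` of at least `min_overlap` bases.

    Hypothetical helper: both scaffolds are assumed to be in the same
    orientation, and only exact overlaps are accepted.
    Returns the merged sequence, or None if no qualifying overlap exists.
    """
    min_overlap = max(1, min_overlap)  # k = 0 would make a[-0:] the whole string
    # Try the longest possible overlap first, shrinking down to the threshold.
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    return None


if __name__ == "__main__":
    print(merge_overlapping("ACGTACGTTTGCA", "TTGCAGGAT", min_overlap=5))
    # -> ACGTACGTTTGCAGGAT
    print(merge_overlapping("ACGTACGTTTGCA", "TTGCAGGAT", min_overlap=6))
    # -> None
```

Testing the longest overlap first is a design choice: when repetitive scaffold ends match at several lengths, it keeps the most extensive shared sequence rather than a short, possibly spurious match.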
Briefly, the pipeline consisted of a BLAT search in the new scaffolds for the regions flanking each (N)n tract, extraction of the nucleotide sequence lying between those flanking regions, and re-incorporation of the resulting fragment over the (N)n tract in the older version, generating assemblies with fewer undefined regions. By means of this approach, 57% and 43% of previously undefined regions were resolved in the Naganishia and Dioszegia assemblies, respectively. Scaffold extension and a reduction in scaffold number were achieved in the Naganishia draft genome, resulting in an assembly with fewer scaffolds than the downloaded version (31 vs. 37). Yeasts isolated from extreme environments are of interest for their ability to utilize a broad range of carbon sources and to produce considerable amounts of biotechnologically relevant metabolites. The improvements achieved in these draft genomes enable a more confident genomic study of these species, aimed at finding genes involved in those promising pathways. The pipeline presented here may also be of interest for improving larger genomes.
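The three gap-filling steps described above (locate the flanks of each (N)n tract in the new scaffolds, extract the intervening sequence, re-insert it over the tract) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: exact substring matching on both strands stands in for the BLAT search, and the names `fill_gaps`, `revcomp`, `n_content` and the `flank_len` parameter are hypothetical.

```python
import re
from typing import List, Tuple

# Complement table for A/C/G/T/N in both cases.
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a nucleotide sequence."""
    return seq.translate(_COMP)[::-1]


def n_content(seq: str) -> float:
    """Fraction of positions in `seq` that are undefined (N)."""
    return sum(base in "Nn" for base in seq) / len(seq) if seq else 0.0


def fill_gaps(old_seq: str, new_seqs: List[str], flank_len: int = 50) -> Tuple[str, int, int]:
    """Patch (N)n tracts in `old_seq` with sequence taken from `new_seqs`.

    For each tract, the `flank_len` bases on either side are searched for
    (exactly, on both strands; a stand-in for BLAT) in the new scaffolds.
    If both flanks are found in the expected order, the sequence between
    them replaces the tract.
    Returns (patched_sequence, number_of_tracts, number_resolved).
    """
    tracts = list(re.finditer(r"[Nn]+", old_seq))
    patched = old_seq
    resolved = 0
    # Right-to-left, so the coordinates of tracts not yet processed stay valid.
    for m in reversed(tracts):
        left = old_seq[max(0, m.start() - flank_len):m.start()]
        right = old_seq[m.end():m.end() + flank_len]
        if not left or not right:
            continue  # tract at a scaffold end: no anchor on one side
        fill = None
        for new in new_seqs:
            for cand in (new, revcomp(new)):  # try both orientations
                i = cand.find(left)
                if i == -1:
                    continue
                j = cand.find(right, i + len(left))
                if j == -1:
                    continue
                segment = cand[i + len(left):j]
                if "N" not in segment.upper():  # new assembly must be defined here
                    fill = segment
                    break
            if fill is not None:
                break
        if fill is not None:
            patched = patched[:m.start()] + fill + patched[m.end():]
            resolved += 1
    return patched, len(tracts), resolved


if __name__ == "__main__":
    old = "AAACCCGGGNNNNNTTTGGGCCC"
    new = "TTAAACCCGGGATCGATTTGGGCCCAA"
    patched, total, done = fill_gaps(old, [new], flank_len=6)
    print(patched, total, done)
    # -> AAACCCGGGATCGATTTGGGCCC 1 1
```

The returned counts give the fraction of tracts resolved (`done / total`), the same kind of figure reported above for the Naganishia and Dioszegia assemblies, and `n_content` can be compared before and after patching.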