IPATEC   26054
INSTITUTO ANDINO PATAGONICO DE TECNOLOGIAS BIOLOGICAS Y GEOAMBIENTALES
Executing Unit - UE
Conferences and scientific meetings
Title:
Nwanted regions: A novel approach to get rid of N content by genome assemblies combination
Author(s):
NICOLAS BELLORA; PAULA NIZOVOY; MARTÍN MOLINÉ; DIEGO LIBKIND
Location:
CABA
Meeting:
Symposium; 2nd Argentine Symposium of Young Bioinformatics Researchers (2SAJIB); 2017
Organizing institution:
Universidad Nacional de San Martín
Abstract:
Advances in sequencing technology allow genomes to be sequenced at reduced cost, providing the bedrock of genome research. However, poor-quality assemblies impair genomic predictions and the inferences based upon them, hampering the search for common genes, syntenic regions, etc. Moreover, the large number of genomes deposited in GenBank does not imply that their sequences are correct. For these reasons, quality control of the assembly process and comparison between different algorithms and available tools cannot be neglected if a solid initial step is to be guaranteed. Combining outputs is a valid strategy to get the best of each approach. To this end, we present a Python-based script developed to reduce (N)n tracts in assemblies by merging overlapping scaffolds obtained from different assemblers. Draft genome sequences and sequencing reads of two psychrotolerant basidiomycetous yeasts used as models (Naganishia vishniacii ANT03-052 and Dioszegia cryoxerica ANT03-071) were downloaded from the JGI Genome Portal. De novo assembly of both yeast genomes was performed with SPAdes, following an approach tested in our laboratory on a variety of yeast genomes. These new scaffolds were used to improve the downloaded versions by joining and extending overlapping scaffolds and replacing previously undefined regions. To achieve the latter, both sets of scaffolds (older and newly assembled) were used as input for our pipeline. Briefly, it consisted of a BLAT search for the regions flanking (N)n tracts in the new scaffolds, extraction of the nucleotide sequences delimited by those flanking regions, and re-incorporation of the resulting fragments over the (N)n tracts in the older versions, generating assemblies with a lower content of undefined regions. By means of this approach, 57% and 43% of previously undefined regions were resolved in the Naganishia and Dioszegia assemblies, respectively.
Scaffolds were also extended and their number reduced in the Naganishia draft genome, resulting in an assembly with 7 fewer scaffolds than the downloaded version (37 vs 31 units). Yeasts isolated from extreme environments are of interest for their ability to utilize a broad range of carbon sources and to produce considerable amounts of biotechnologically relevant metabolites. The improvements achieved in these draft genomes enable a more confident genomic study of these species, aimed at finding genes involved in those promising pathways. The pipeline presented here, tested on complex genome assemblies of ca. 20 Mb, may also be of interest for improving larger genomes.
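
The gap-filling step described in the abstract (locate the regions flanking each (N)n tract, find them in the new scaffolds, and write the sequence between them over the tract) can be illustrated with a minimal Python sketch. This is not the authors' script. It assumes exact substring matching in place of the BLAT search, a fixed flank length, and same-strand matches only, and it does not join or extend scaffolds. All function names and the `flank` parameter are hypothetical.

```python
import re

# Runs of undefined bases, i.e. (N)n tracts.
N_TRACT = re.compile(r"N+")


def find_n_tracts(seq):
    """Return (start, end) spans of every run of N in seq (end exclusive)."""
    return [m.span() for m in N_TRACT.finditer(seq.upper())]


def fill_n_tracts(old_scaffold, new_scaffolds, flank=20):
    """Replace N tracts in old_scaffold using sequence from new_scaffolds.

    Assumption: exact substring search stands in for BLAT, so flanks
    must match perfectly and on the same strand.
    Returns (filled_sequence, number_of_N_bases_resolved).
    """
    old = old_scaffold.upper()
    pieces = []
    cursor = 0
    resolved = 0
    for start, end in find_n_tracts(old):
        left = old[max(0, start - flank):start]
        right = old[end:end + flank]
        fill = None
        # Only use full-length flanks that contain no undefined bases.
        usable = len(left) == flank and len(right) == flank
        if usable and "N" not in left + right:
            for new in new_scaffolds:
                new = new.upper()
                i = new.find(left)
                if i == -1:
                    continue
                j = new.find(right, i + flank)
                if j == -1:
                    continue
                candidate = new[i + flank:j]
                # Accept only fragments that are fully defined.
                if "N" not in candidate:
                    fill = candidate
                    break
        pieces.append(old[cursor:start])
        if fill is None:
            pieces.append(old[start:end])  # keep the tract unresolved
        else:
            pieces.append(fill)
            resolved += end - start
        cursor = end
    pieces.append(old[cursor:])
    return "".join(pieces), resolved


if __name__ == "__main__":
    old = "ACGTACGTAC" + "NNNNN" + "TTGGCCAATT"
    new = "GG" + "ACGTACGTAC" + "GATTACA" + "TTGGCCAATT" + "CC"
    filled, resolved = fill_n_tracts(old, [new], flank=10)
    total_n = old.count("N")
    print(filled)
    print(f"{100 * resolved / total_n:.0f}% of undefined bases resolved")
```

A real run would read both assemblies from FASTA files and use an aligner such as BLAT to tolerate mismatches and reverse-strand hits, which exact matching cannot do.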