IFAB   27864
INSTITUTO DE INVESTIGACIONES FORESTALES Y AGROPECUARIAS BARILOCHE
Executing Unit (UE)
Conferences and scientific meetings
Title:
Complete chloroplastic and mitochondrial genomes of native tree species and strategies toward end-to-end chromosomal assembly
Author(s):
ESTRAVIS-BARCALA, M.; GUTIERREZ, R.; MOYANO, T.; BELLORA, N.; ARANA, M.V.
Meeting:
Conference; Congreso Conjunto SAIB-SAMIGE 2020; 2020
Abstract:
Nothofagus pumilio (common name: lenga) is the most abundant tree of the southern temperate forests of Argentina and Chile. It is a key ecological species, distributed across a wide latitudinal and altitudinal range. Despite its economic and ecological importance, genomic resources for N. pumilio and for the genus as a whole are scarce. This bi-national initiative aims to sequence and assemble the complete genome of N. pumilio, which will be the first such resource for a native tree of Argentina and Chile. As a first step toward this goal, total DNA was extracted from buds collected from a single individual on the Argentina-Chile border at Monte Tronador. Paired-end (PE) and mate-pair (MP) Illumina libraries with different insert sizes (350 and 550 bp for PE, and 5 kb for MP) were constructed and sequenced. Each of the PE and MP libraries yielded around 40X coverage of the estimated haploid genome (790 Mb, as measured by flow cytometry), and more than 88% of Illumina reads had a Phred score greater than 30. To assemble the organellar genomes, reads were mapped to reference plant cpDNA or mtDNA, and several of the mapped reads were chosen as seeds. This strategy takes advantage of the high sequence conservation of cpDNA and mtDNA among plant species. Each genome was then assembled separately by an overlap-extension method. In this work we present the N. pumilio cpDNA (150,390 bp) and mtDNA (354,003 bp). Both genomes were annotated against reference species and contain all rRNA and tRNA genes, in addition to all the expected protein-coding genes found in most plant species. Moreover, the cpDNA has the typical structure of two inverted repeats (IRA and IRB) separating a large single-copy (LSC) region and a small single-copy (SSC) region. In parallel, a preliminary de novo assembly of the total genome was performed using Redundans (with all PE reads as input) and Opera (for scaffolding with MP reads). This assembly yielded 13,140 scaffolds longer than 2,000 bp, totaling 360,574,575 bp.
About 90% of PE and MP reads were used in the assembly. A total of 2326 eukaryotic BUSCOs were searched for in the genome assembly, of which 2040 (87.7%) were complete and only 168 (7.2%) were missing. Moreover, 90% of reads from a previous N. pumilio transcriptomic study mapped uniquely to the new assembly. However, the assembly is about half the size expected from flow cytometry. These results suggest that we were able to capture virtually all coding, high-complexity regions of the genome, but that many repetitive or otherwise low-complexity regions are being collapsed or incorrectly assembled. In this work we discuss possible future steps toward completing the genome assembly, mainly Hi-C for end-to-end chromosome structure and PacBio HiFi sequencing for resolving repetitive regions. These newly assembled genomes constitute the first genomic resources for N. pumilio and will be useful for population and biochemical studies in this species and its relatives.
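The completeness figures quoted in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the authors' pipeline: the variable and function names are illustrative, and the fragmented BUSCO count is derived here on the assumption that every BUSCO falls into one of the three standard categories (complete, fragmented, missing).

```python
# Figures taken from the abstract.
EXPECTED_GENOME_BP = 790_000_000  # haploid genome size estimated by flow cytometry (790 Mb)
ASSEMBLY_BP = 360_574_575         # total length of scaffolds longer than 2,000 bp
BUSCO_TOTAL = 2326                # BUSCOs searched for in the assembly
BUSCO_COMPLETE = 2040
BUSCO_MISSING = 168


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of the expected genome that the preliminary assembly covers.
assembled_pct = percent(ASSEMBLY_BP, EXPECTED_GENOME_BP)

# BUSCO completeness as reported in the abstract.
complete_pct = percent(BUSCO_COMPLETE, BUSCO_TOTAL)
missing_pct = percent(BUSCO_MISSING, BUSCO_TOTAL)

# Assumption: whatever is neither complete nor missing is fragmented.
busco_fragmented = BUSCO_TOTAL - BUSCO_COMPLETE - BUSCO_MISSING

print(f"Assembly covers {assembled_pct}% of the expected genome size")
print(f"BUSCO complete: {complete_pct}%, missing: {missing_pct}%")
print(f"BUSCO fragmented (derived): {busco_fragmented}")
```

The contrast between a high BUSCO completeness and an assembly spanning less than half of the expected genome size is what motivates the interpretation that the missing sequence is mostly repetitive rather than coding.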