RESEARCHERS
SOMOZA Gustavo Manuel
conferences and scientific meetings
Title:
Phylogenetic context, whole genome sequencing, assembly and annotation of a new model species with Temperature-dependent Sex Determination.
Author(s):
DANIELA CAMPANELLA; ELISABET CALER; JASON MILLER; HERNÁN LORENZI; JUAN I. FERNANDINO; NICOLE VALENZUELA; GUSTAVO M. SOMOZA; GUILLERMO ORTÍ
Location:
Shenzhen
Meeting:
Conference; 8th International Conference on Genomics; 2013
Abstract:
1. Objectives
To obtain a complete genomic sequence, assembly, and annotation of the genome of pejerrey (Odontesthes bonariensis, Atherinopsidae), a fish with temperature-dependent sex determination (TSD). To perform comparative genomic analyses with a closely related species (Oryzias latipes, medaka) to identify conserved features and syntenic regions, and with other species (fishes and turtles) that also exhibit TSD, to gain insight into the evolution of candidate genes associated with this trait.

2. Methods
Genomic DNA was obtained from a male specimen from an inbred line of pejerrey to construct two paired-end libraries (200 bp and 300 bp fragments) and a mate-pair library (3 kbp fragments) for shotgun sequencing on the Illumina platform. De novo assemblies were performed with SOAPdenovo v.1.05 and ALLPATHS-LG. MUMmer and DAGchainer analyses against medaka chromosomes were performed to assess conservation and synteny. Structural and functional annotations of the draft assembly were performed using JCVI pipelines to identify genes (GeneZilla, Augustus, SNAP) and paralogous families. Comparative genomic analyses with other fish species that exhibit some TSD (tongue sole, Cynoglossus semilaevis) or no TSD (medaka), and with a turtle species with TSD (Chrysemys picta), are underway to analyze candidate genes associated with sex determination pathways. Phylogenetic analyses of these genes are being conducted to assess the degree of convergence or independent evolution of functional and structural domains.

3. Results
A total of 1005.8 million reads were obtained for the three libraries combined, of which 37% were used in the final assembly, with the largest fraction contributed by the 200 bp library (43%). The best de novo assembly was obtained with ALLPATHS-LG, with a total length of 870 Mb grouped in 31,274 scaffolds (107,428 contigs), an N50 of 60,945, and an estimated genome coverage of 40X. The genome size estimated by ALLPATHS-LG was 998 Mb. PROmer alignments of each medaka chromosome (ASM31367v1) with the pejerrey scaffolds revealed a high degree of genome conservation between these species: all 24 medaka chromosomes have regions of high (close to 100%) similarity with pejerrey. In the pejerrey genome we predicted 51,846 genes. We identified 2,988 gene families, which included 16.8% of the known Pfam domains and 130 novel domains. The most expanded gene families included reverse transcriptases, protein kinases, and 7-transmembrane receptors. When transposable elements were excluded, other highly represented gene families included integrases, Myb/SANT-like DNA-binding domains, and several zinc finger motifs. Some gene families had members that are possibly related to TSD (FGF, 3-beta-HSD), and the degree of expansion detected varied greatly. Candidate genes for sex determination pathways compared among genomes included aromatase, Sf1, Wt1, Sox9, and Dax.

4. Conclusion
The preliminary draft assembly, obtained from short reads only, produced excellent results considering the modest amount of data used. Improved assembler technology for short reads is a key resource for analyzing complex genomes, suggesting that future efforts of this nature can contribute significantly to the growth of genomic databases. Complementary evidence (transcriptomics, linkage mapping) is necessary for validation, but comparative genomic analyses suggest that optimal results may not be difficult to achieve. Analyses of candidate genes for TSD are underway and have already identified important components of the genomic underpinning of sex determination in vertebrates.
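
For readers unfamiliar with the N50 statistic reported in the Results, the following is a minimal Python sketch of how it is conventionally computed: the sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the assembly length. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the pejerrey assembly or from the output of any assembler named in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point division.
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths, not real assembly data.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))  # prints 80
```

In the toy example the total length is 300, so the halfway mark of 150 is crossed when the second-longest scaffold (80) is added to the first (100).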