
CUBES and CUBACR, script packages for the analysis of evolutionary traits of codon bias.

Mauricio Javier Lozano 1 *, José Luis López 1, María Laura Fabre 1, and Antonio Lagares 1

1 IBBM - Instituto de Biotecnología y Biología Molecular, CONICET, CCT-La Plata, Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, calles 47 y 115, 1900-La Plata, Argentina.

Key words: genomics, core genes, singletons, plasmidome, codon usage.

* Corresponding Author.

BACKGROUND
The balance between mutational biases and natural selection generates a wide range of GC contents and codon usage biases in prokaryote genomes. Synonymous codons are selected to optimize translation of genes with different functions and expression levels, and to preserve the fitness of the cell. The choice of such optimal codons produces intragenomic codon usage heterogeneities. The analysis of core-gene sets with increasing ancestries in bacteria revealed an increased degree of adaptation of the most ancestral genes to the translational machinery. This adaptation could be a consequence of codon selection for better translation accuracy or efficiency. One way to attempt to differentiate those effects is to compare the codon usage of conserved vs. non-conserved genetic regions. Here we present bioinformatic tools that can perform several codon usage analyses on sets of genes belonging to different core-genomes.

RESULTS
We present the CUBES and CUBACR scripts. CUBES is a set of scripts written in bash, perl and R, which can be used to calculate the modal codon usage frequencies for sets of progressively more ancestral core-genomes, and to perform the correspondence analysis of relative synonymous codon usage (RSCU). Additional scripts in this package can be used to calculate the adaptation index s-tAI and the GC3 content, and to generate a distance tree based on the tRNA gene content.
Finally, two plots are created: one showing the change in the codon use frequencies (CUF) for each codon and the corresponding absolute adaptiveness (w), and the other a histogram of the difference in CUF for the initial and the most ancestral cores and the putative highly expressed genes (PHE). CUBACR, a complementary package, contains scripts to analyze the codon usage of conserved and variable regions for highly and lowly expressed genes supplied by the user.

CONCLUSIONS
The scripts presented here provide a way to massively analyze the evolutionary traits of codon usage of core-gene sets with increasing ancestries. Such analysis can help to understand how specific biases have operated to improve translation, and to what extent selection for efficiency, accuracy, or both is shaping codon usage in the prokaryote tree of life.
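
Two of the quantities named in the Results, RSCU and GC3 content, have simple standard definitions that a short example can make concrete. The sketch below is not taken from CUBES or CUBACR (which are written in bash, perl and R); it is a minimal Python illustration under stated assumptions: the standard genetic code, in-frame coding sequences, RSCU defined as a codon's observed count divided by the mean count of its synonymous codons, and GC3 computed over all complete, unambiguous codons including stop codons. All function and variable names here are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Group sense codons by the amino acid they encode (stop codons are left out).
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def iter_codons(seq):
    """Yield complete, unambiguous in-frame codons of a coding sequence."""
    seq = seq.upper().replace("U", "T")
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in CODON_TABLE:  # skips codons containing N or other symbols
            yield codon


def rscu(seqs):
    """Return {codon: RSCU} pooled over a set of coding sequences.

    RSCU = observed count * number of synonymous codons / total count
    for that amino acid. Amino acids absent from the input are omitted.
    """
    counts = Counter(c for s in seqs for c in iter_codons(s))
    values = {}
    for family in SYNONYMS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


def gc3(seqs):
    """Fraction of G or C at the third codon position, pooled over seqs."""
    thirds = [c[2] for s in seqs for c in iter_codons(s)]
    if not thirds:
        raise ValueError("no complete codons found")
    return sum(b in "GC" for b in thirds) / len(thirds)


# Toy gene: ATG CTG CTG CTG CTA GCC GCA TAA
example = "ATGCTGCTGCTGCTAGCCGCATAA"
values = rscu([example])
print(values["CTG"], values["CTA"])  # 4.5 1.5 (Leu has six synonymous codons)
print(gc3([example]))                # 0.625
```

An RSCU above 1 means a codon is used more often than expected under uniform synonymous usage, and below 1 less often. In an analysis like the one described here, the same functions would be applied separately to each gene set (for example each core-genome or the PHE genes) so that the resulting values can be compared across sets.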