RODRIGUEZ Gustavo Ruben
CleanBSequences: an efficient curator of biological sequences in R
POZZI, FLORENCIA I.; GREEN, GISELA Y.; BARBONA, IVANA G.; RODRÍGUEZ, GUSTAVO R.; FELITTI, SILVINA A.
MOLECULAR GENETICS AND GENOMICS
This work presents a new method and tool to solve a common problem of molecular biologists and geneticists who use molecular markers in their scientific research and developments: curation of sequences. Omic studies conducted by molecular biologists and geneticists usually involve the use of molecular markers. AFLP, cDNA-AFLP, and MSAP are examples of markers that render information at the genomics, transcriptomics, and epigenomics levels, respectively. These three types of molecular markers use adaptors that are the template for PCR amplification. The sequences of the adaptors have to be eliminated for the analysis of the results. Since a large number of sequences are usually obtained in these studies, this clean-up of the data could demand long time and work. To automate this work, an R package, named CleanBSequences, was created that allows the sequences to be curated massively, quickly, without errors and can be used offline. The curating is performed by aligning the forward and/or reverse primers or ends of cloning vectors with the sequences to be removed. After the alignment, new subsequences are generated without biological fragments not desired by the user, i.e., sequences needed by the techniques. In conclusion, the CleanBSequences tool facilitates the work of researchers, reducing time, effort, and working errors. Therefore, the present tool would respond to the problems related to the curation of sequences obtained from the use of some types of molecular markers. In addition to the above, being an open source, CleanBSequences is a flexible tool that has the potential to be used in future improvements to respond to new problems.