IICAR   25568
INSTITUTO DE INVESTIGACIONES EN CIENCIAS AGRARIAS DE ROSARIO
Executing Unit - UE
Conferences and scientific meetings
Title:
DEVELOPMENT OF NEW FUNCTION TO IMPROVE AND UPDATE THE R PACKAGE: CleanBSequences.
Author(s):
POZZI, FLORENCIA I.; FELITTI, SILVINA A.
Location:
Buenos Aires
Meeting:
Conference; 1st Latin American Congress of Women in Bioinformatics and Data Science; 2020
Abstract:
Omic studies conducted by molecular biologists and geneticists usually involve the use of molecular markers. AFLP, cDNA-AFLP and MSAP are examples of markers that provide information at the genomic, transcriptomic and epigenomic levels, respectively. These three molecular markers rely on adaptors that serve as the template for PCR amplification. The adaptor sequences must be removed before the results can be analyzed. Because these studies usually yield a large number of sequences, cleaning the data can demand considerable time and work. To automate this task, an R package named CleanBSequences was created; it curates sequences in bulk, quickly and without errors, and can be used offline. Curation is performed by aligning the forward and reverse primers, or the ends of cloning vectors, against the sequences to be cleaned. After the alignment, new subsequences are generated that lack the fragments the user does not want, i.e. the sequences required only by the techniques. Using the package to curate sequences from the cDNA-AFLP technique revealed flaws and possible improvements, which can be incorporated into a new version that updates the one currently available on CRAN. The objective of this work was to improve one of the functions of the package: "TwoPrimerRemove". As a result, a new function, "TPRDNAString", was developed in R and is superior to the previous one. The new function not only cleans sequences obtained by cDNA-AFLP in a free, correct and automated way, but also accounts for possible sequencing errors by allowing mismatches in the alignment (a 100% match is no longer required for a positive alignment). In addition, the new function displays the alignment between the primers and the DNA sequence and saves the results in a FASTA file for subsequent analysis.
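The curation steps described above (locate both primers while tolerating a few mismatches, keep only the fragment between them, write the result as FASTA) can be illustrated with a minimal sketch. It is written in Python rather than R and is not the implementation of "TwoPrimerRemove" or "TPRDNAString". The function names, the example primers and the example read are all hypothetical. The sketch also assumes that mismatches are substitutions only (no insertions or deletions) and that the reverse primer appears in the read as its reverse complement.

```python
from typing import Dict, Optional

# Translation table used to complement a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(seq: str, primer: str, max_mismatch: int = 0,
                start: int = 0) -> Optional[int]:
    """Return the first index >= start where primer matches seq with at most
    max_mismatch substitutions, or None if no such site exists."""
    for i in range(start, len(seq) - len(primer) + 1):
        window = seq[i:i + len(primer)]
        mismatches = sum(a != b for a, b in zip(window, primer))
        if mismatches <= max_mismatch:
            return i
    return None


def trim_primers(seq: str, fwd: str, rev: str,
                 max_mismatch: int = 1) -> Optional[str]:
    """Return the fragment between the forward primer and the reverse
    complement of the reverse primer, or None if either is not found.
    (Hypothetical helper; assumes the read is in forward orientation.)"""
    seq = seq.upper()
    fwd_at = find_primer(seq, fwd.upper(), max_mismatch)
    if fwd_at is None:
        return None
    insert_start = fwd_at + len(fwd)
    rev_site = reverse_complement(rev.upper())
    rev_at = find_primer(seq, rev_site, max_mismatch, insert_start)
    if rev_at is None:
        return None
    return seq[insert_start:rev_at]


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Format a name -> sequence mapping as FASTA text."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Made-up primers and read; the forward primer site carries one substitution.
FWD, REV = "GACTGC", "GATGAG"
READ = "TTGACAGCAAACCCGGGTTTCTCATCGG"

cleaned = trim_primers(READ, FWD, REV, max_mismatch=1)
strict = trim_primers(READ, FWD, REV, max_mismatch=0)
fasta_text = to_fasta({"read1": cleaned}) if cleaned is not None else ""
```

With one mismatch allowed, the read is trimmed to the fragment between the two primer sites. With `max_mismatch=0` the same read is rejected, which mirrors the limitation of requiring a 100% match.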