RESEARCHERS
RUYBAL Paula
Conferences and scientific meetings
Title:
MLST-pipeline: a web-based workflow for the analysis of Multilocus Sequence Typing schemes
Author(s):
PRINCIPI DARIO; DELFINO SANTIAGO; GUILLEMI ELIANA; RUYBAL PAULA; WILKOWSKY SILVINA; FARBER MARISA
Meeting:
Conference; 1st International Conference of Bioinformatics SOIBIO; 2010
Abstract:
Multilocus sequence typing (MLST) is an unambiguous procedure for characterizing isolates of microbial species using the sequences of internal fragments of seven house-keeping genes. Internal fragments of approximately 450-500 bp are used for each gene, since fragments of this size can be accurately sequenced on both strands with an automated DNA sequencer. For each house-keeping gene, the different sequences present within a species are assigned as distinct alleles and, for each isolate, the alleles at the seven loci define the allelic profile or sequence type (ST) (1). MLST has been used successfully to study population genetics and to reconstruct the micro-evolution of epidemic bacteria, fungi and protozoa. Highly reproducible data, together with the availability of low-cost sequencing services, make MLST a powerful tool. However, the labor-intensive manual processing of raw sequence data files and the subsequent downstream analysis have hampered the application of the methodology. To overcome these limitations, we present MLST-pipeline, a web-based tool that takes the data from raw trace files through to the assignment of the correct ST. The pipeline accepts raw chromatogram trace files from both strands, named in a standardized way so that the trace files to be assembled together (forward and reverse strands) can be identified. The system offers a first, user-customizable analysis stage comprising base calling and cleaning with PHRED (2), and clustering and assembly with CAP3 (3). In the following stage, an alignment step is performed in which the sequence start site is determined by a user-defined primer and the sequence end site by a user-defined sequence length. Finally, ST assignment is performed by in-house Perl scripts, and isolate genotypes are stored in an MSQL database. In addition, the workflow is flexible enough to allow users to load assembled sequences manually. The pipeline can be accessed at http://bioinformatica.inta.gov.ar/mlst_pipeline
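The post-assembly steps described above (pairing forward and reverse traces by file name, trimming the consensus between a primer and a fixed length, numbering distinct sequences as alleles, and mapping the seven-locus allelic profile to an ST) can be sketched in Python. This is a minimal, hypothetical illustration and not the authors' Perl scripts: the naming convention `<isolate>_<locus>_<F|R>.ab1`, the names `pair_traces`, `trim_to_window` and `AlleleRegistry`, and the locus names are assumptions. PHRED and CAP3 are not called; the sketch starts from an already assembled consensus, and an exact string match for the primer stands in for the real alignment step.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical naming convention: <isolate>_<locus>_<F|R>.ab1
TRACE_NAME = re.compile(r"^(?P<isolate>[^_]+)_(?P<locus>[^_]+)_(?P<strand>[FR])\.ab1$")


def pair_traces(filenames):
    """Group trace files into (forward, reverse) pairs keyed by (isolate, locus)."""
    pairs = defaultdict(dict)
    for name in filenames:
        m = TRACE_NAME.match(name)
        if m is None:
            continue  # skip files that do not follow the naming convention
        pairs[(m["isolate"], m["locus"])][m["strand"]] = name
    # Keep only loci for which both strands are present.
    return {key: (d["F"], d["R"]) for key, d in pairs.items() if "F" in d and "R" in d}


def trim_to_window(consensus: str, primer: str, length: int) -> Optional[str]:
    """Start right after the primer match (assumption) and cut to a fixed length.

    Exact matching is a simplification of the pipeline's alignment step.
    Returns None if the primer is absent or the sequence is too short.
    """
    start = consensus.find(primer)
    if start < 0:
        return None
    start += len(primer)
    window = consensus[start:start + length]
    return window if len(window) == length else None


class AlleleRegistry:
    """Number distinct sequences per locus as alleles, and distinct profiles as STs."""

    def __init__(self, loci):
        self.loci = list(loci)
        self.alleles = {locus: {} for locus in self.loci}  # locus -> {sequence: allele no.}
        self.sts = {}  # allelic profile (tuple of allele numbers) -> ST number

    def allele(self, locus: str, sequence: str) -> int:
        table = self.alleles[locus]
        if sequence not in table:
            table[sequence] = len(table) + 1  # a new distinct sequence becomes a new allele
        return table[sequence]

    def sequence_type(self, sequences_by_locus):
        """Return (allelic profile, ST) for one isolate's trimmed sequences."""
        profile = tuple(self.allele(locus, sequences_by_locus[locus]) for locus in self.loci)
        if profile not in self.sts:
            self.sts[profile] = len(self.sts) + 1  # a new profile becomes a new ST
        return profile, self.sts[profile]


if __name__ == "__main__":
    loci = ["atpA", "dnaA", "gyrB", "recA", "rpoB", "sucA", "tufA"]  # hypothetical loci
    registry = AlleleRegistry(loci)
    isolate_1 = {locus: "ACGTACGT" for locus in loci}
    isolate_2 = dict(isolate_1, recA="ACGTACGA")  # differs at a single locus
    print(registry.sequence_type(isolate_1))  # ((1, 1, 1, 1, 1, 1, 1), 1)
    print(registry.sequence_type(isolate_2))  # ((1, 1, 1, 2, 1, 1, 1), 2)
```

In this sketch allele and ST numbers are assigned in order of first appearance; in a working MLST scheme they would instead be looked up in, or registered with, a curated allele and profile database, which is the role the abstract gives to the MSQL database.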
Validation using bacterial (Anaplasma marginale) and protozoan (Babesia) datasets revealed complete agreement between the results generated by the manual and automated workflows.
References
(1) Maiden, M.C.J., et al. (1998) Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. USA, 95, 3140-3145.
(2) Ewing, B., et al. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res., 8, 175-185.
(3) Huang, X. and Madan, A. (1999) CAP3: A DNA sequence assembly program. Genome Res., 9, 868-877.
Acknowledgments
This work was supported by projects INTA-AEBIO 245711 and ANPCyT PICT 1634.