
MLST-pipeline: a web-based workflow for the analysis of Multilocus Sequence Typing schemes

D. Perez Principi¹, S. Delfino¹, E. Guillemi¹, P. Ruybal¹, S. Wilkowsky¹, M. Farber¹

¹Instituto de Biotecnología, INTA, N. Repeto y Las Cabañas S/N, Hurlingham, Buenos Aires, Argentina
mfarber@cnia.inta.gov.ar

Multilocus sequence typing (MLST) is an unambiguous procedure for characterizing isolates of a microorganism species using the sequences of internal fragments of seven house-keeping genes. Internal fragments of approximately 450-500 bp are used for each gene, as these can be accurately sequenced on both strands with an automated DNA sequencer. For each house-keeping gene, the different sequences present within a species are assigned as distinct alleles and, for each isolate, the alleles at the seven loci define the allelic profile or sequence type (ST) (1). MLST has been used successfully to study the population genetics and reconstruct the micro-evolution of epidemic bacteria, fungi and protozoa. Highly reproducible data, together with the availability of low-cost sequencing services, make MLST a powerful tool. However, the manually intensive steps of processing the raw sequence data files and performing the downstream analysis have hampered the application of the methodology.

To overcome these limitations we present MLST-pipeline, a web-based tool that takes the user from raw sequence data to the final ST. The pipeline accepts raw chromatogram trace files from both strands, named in a standardized way that identifies which trace files (forward and reverse strand) must be assembled together. In the first stage, the system offers a user-customizable analysis comprising base calling and cleaning with PHRED (2) and clustering and assembling with CAP3 (3). In the following stage an alignment step is performed, taking the sequence start site from a user-defined primer and the sequence end site from a user-defined sequence length.
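The trimming step and the allelic-profile definition above can be illustrated with a minimal Python sketch. It is not the pipeline's own code, which is not shown in this abstract. The function names, the toy two-locus scheme, and the choice to start the window at the first base of the primer match are all assumptions made for illustration; a real scheme would use seven loci and a curated allele database.

```python
from typing import Optional


def trim_locus(contig: str, primer: str, length: int) -> Optional[str]:
    """Return the `length`-base window that starts at the primer match.

    Assumption: the window begins at the first base of the primer.
    Returns None if the primer is absent or the contig is too short.
    """
    contig, primer = contig.upper(), primer.upper()
    start = contig.find(primer)
    if start == -1 or start + length > len(contig):
        return None
    return contig[start:start + length]


def assign_st(
    trimmed: dict[str, Optional[str]],
    loci: list[str],
    alleles: dict[str, dict[str, int]],
    profiles: dict[tuple[int, ...], int],
) -> Optional[int]:
    """Translate each trimmed sequence to an allele number, then look up the ST.

    Returns None when a sequence is an unknown allele or the profile is new.
    """
    profile = []
    for locus in loci:
        number = alleles.get(locus, {}).get(trimmed.get(locus))
        if number is None:
            return None
        profile.append(number)
    return profiles.get(tuple(profile))


# Hypothetical toy scheme with two loci (illustration only).
LOCI = ["locusA", "locusB"]
ALLELES = {
    "locusA": {"ACGTAC": 1, "ACGTTC": 2},
    "locusB": {"GGCATT": 1},
}
PROFILES = {(1, 1): 10, (2, 1): 11}

contigs = {"locusA": "ttACGTTCgg", "locusB": "aGGCATTcc"}
primers = {"locusA": "ACG", "locusB": "GGC"}

trimmed = {locus: trim_locus(contigs[locus], primers[locus], 6) for locus in LOCI}
st = assign_st(trimmed, LOCI, ALLELES, PROFILES)
print(trimmed, st)
```

Passing the locus order explicitly keeps the profile tuple aligned with the order used in the profile table, which is what makes two isolates with the same alleles map to the same ST.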
Finally, ST assignment is performed by in-house Perl scripts, and isolate genotypes are stored in a MSQL database. In addition, the workflow is flexible enough to let the user load already assembled sequences manually. The pipeline can be accessed at http://bioinformatica.inta.gov.ar/mlst_pipeline. Validation with bacterial (Anaplasma marginale) and protozoan (Babesia) datasets showed complete agreement between the results generated by the manual and automated workflows.

References
1. Maiden, M.C.J., et al. (1998) Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. USA, 95, 3140-3145.
2. Ewing, B., et al. (1998) Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res., 8, 175-185.
3. Huang, X. and Madan, A. (1999) CAP3: A DNA sequence assembly program. Genome Res., 9, 868-877.

Acknowledgments
This work was supported by projects INTA-AEBIO 245711 and ANPCyT PICT 1634.
