IIB   20738
INSTITUTO DE INVESTIGACIONES BIOLOGICAS
Executing Unit (Unidad Ejecutora) - UE
Conferences and scientific meetings
Title:
HMMer CTTer: Function assignation using HMMer profiles and reliable cut offs
Author(s):
BONDINO HG; REVUELTA MV; BRUN M; ARJEN TEN HAVE
Location:
Córdoba
Meeting:
Congress; 2nd Congress of the Asociación Argentina de Bioinformática y Biología Computacional; 2011
Organizing institution:
A2B2C
Abstract:
Background
Function assignment of proteins based on their coding sequence is a major challenge in genomics research. Similarity-based tools such as BLAST are constrained by mathematical limits of detection; hence, modules that increase either specificity (PHI-BLAST) or sensitivity (PSI-BLAST) have been developed. These are difficult to combine, and PSI-BLAST is very sensitive to bad seeds. Pfam and other profile alignment collections provide tools for fast and sensitive screening of genomic datasets. Function assignment based on Pfam data is, however, strongly impeded by the biological aspects of homology, orthology and paralogy. Furthermore, even though PfamA profiles are based on high-quality annotated datasets, PfamA searches often yield false positives, whereas certain true positive sequences are missed. Especially when working with superfamilies, false positives and false negatives impede a fast and robust identification of the sequences of interest and thereby impede in silico function assignment. It is therefore important to develop new tools or procedures to improve function assignment.
Results
A phylogenetic analysis of plant alpha-crystallin domain proteins was used as the initial dataset. HMMer profiles were constructed and used to screen 17 plant proteomes, finding 29 classes. Previous publications found only 9 classes. A sudden drop in Score is used to indicate the "subfamily-threshold". In this way, out of the 824 protein coding sequences (PCS), only one PCS yielded a false positive. No true positive PCSs were missed.
This dataset was then used for:
1) The development of variables useful for predicting the behaviour of HMMer profiles when confronted with different datasets.
2) The development of more sensitive HMMer profiles based on the inclusion of predicted ancestral sequences.
3) The testing of the procedure on other superfamilies, such as the UDP-Glucosyl Transferases and the Pepsin-like Aspartic Proteases, in order to validate the approach.
Conclusions
We have started the development of a flexible supervised procedure, referred to as the Cut-off Threshold Tool (CTTer), which, based on a proper classification and the construction of class-specific HMMer profiles, provides a specific and sensitive tool for the fast identification and classification of protein coding sequences from large genome databases.
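
The Results mention that a sudden drop in Score marks the "subfamily-threshold". The abstract does not say how that drop is detected, so the following Python sketch is only one plausible reading: it ranks the scores from a profile search and places the cut-off in the middle of the largest gap between consecutive scores. The function name `score_drop_cutoff` and the example scores are hypothetical and are not taken from the CTTer procedure or from any real search output.

```python
from typing import List, Tuple


def score_drop_cutoff(scores: List[float]) -> Tuple[float, int]:
    """Locate the largest drop between consecutive ranked scores.

    Returns (cutoff, n_above), where cutoff is the midpoint of the largest
    gap and n_above is the number of scores that lie above that cutoff.
    Assumption: the "sudden drop" is taken to be the single largest gap.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to locate a drop")

    # Rank scores from best to worst, as in a profile search hit list.
    ranked = sorted(scores, reverse=True)

    # Difference between each score and the next lower one.
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]

    # Position of the largest gap; hits up to and including it are kept.
    i_max = max(range(len(gaps)), key=gaps.__getitem__)

    cutoff = (ranked[i_max] + ranked[i_max + 1]) / 2
    return cutoff, i_max + 1


# Hypothetical bit scores for one class-specific profile (illustrative only).
example_scores = [312.4, 305.1, 298.7, 290.2, 121.6, 118.3, 95.0]
cutoff, n_above = score_drop_cutoff(example_scores)
print(f"cutoff = {cutoff:.1f}, sequences above cutoff = {n_above}")
```

In a supervised setting such as the one described, a cut-off proposed this way would still be checked against the phylogenetic classification, since the largest gap need not coincide with the true subfamily boundary.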