ICYTE   26279
INSTITUTO DE INVESTIGACIONES CIENTIFICAS Y TECNOLOGICAS EN ELECTRONICA
Executing Unit (UE)
Congresses and scientific meetings
Title:
A secondary cutoff threshold for improved HMMERCTTER protein superfamily classification
Author(s):
AGUSTÍN AMALFITANO; ARJEN TEN HAVE; JUAN VERON; MARCEL BRUN; NICOLAS STOCCHI
Location:
Prague
Meeting:
Conference; 25th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 16th European Conference on Computational Biology (ECCB); 2017
Organizing institution:
International Society for Computational Biology (ISCB)
Abstract:
A Secondary Cutoff Threshold for Improved HMMERCTTER Protein Superfamily Classification.
Amalfitano A, Veron J, Stocchi N, ten Have A, Benavente M, Brun M
Background
Pfam and TIGRFAM are HMMER profile databases used for function assignment in complete proteomes. To increase specificity, they use trusted thresholds rather than HMMER's more sensitive thresholds, which reduces sensitivity. The HMMER Cutoff Threshold Tool (HMMERCTTER) clusters the training sequences of a superfamily into monophyletic clusters with 100% precision and recall (P&R). These clusters are then used to classify new sequences while keeping 100% P&R. Classification is iterative: at each step, the profile is updated to include the accepted sequences. Unfortunately, for certain complex or divergent superfamilies, this results in poor coverage.
Results
To increase HMMERCTTER coverage, we developed a less restrictive cutoff for classifying sequences in the "twilight zone" of similarity. We redefined the classification stage as a three-step method. In the first, fully adaptive step, sequences are processed one by one. Sequences that are classified into only one group are added to that group, updating both its profile and its threshold, but only if 100% P&R is maintained. In the second, semi-adaptive step, added sequences modify only the group threshold, again provided that 100% P&R is maintained. In the third, optional step, the remaining unclassified sequences are classified based on their scores, without changing the profiles or thresholds of the groups. This achieves 100% coverage at the cost of reduced P&R.
Conclusions and perspectives
This method is expected to improve coverage considerably. Results will be compared with those of the previous method on high-fidelity datasets.
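The three-step classification stage described in the abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the HMMERCTTER implementation. The `score` function is a toy percent-identity stand-in for HMMER profile scores. A group's cutoff is assumed to be its lowest member score, and it is treated as giving 100% P&R only when every out-of-group sequence scores strictly below it. In step 2, a query that passes no threshold is assumed to be tried against its best-scoring group. The names `Group`, `perfect_cutoff` and `classify` are hypothetical.

```python
# Hypothetical sketch of the three-step classification; all names are illustrative.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

def score(profile: List[str], seq: str) -> float:
    """Toy stand-in for an HMMER profile score: mean percent identity of `seq`
    to the equal-length, pre-aligned sequences the profile is built from."""
    ident = [sum(a == b for a, b in zip(p, seq)) / len(p) for p in profile]
    return 100.0 * sum(ident) / len(ident)

@dataclass
class Group:
    name: str
    profile: List[str]  # sequences used to build the group profile
    members: List[str] = field(default_factory=list)  # all sequences assigned
    threshold: float = 0.0

    def __post_init__(self) -> None:
        if not self.members:
            self.members = list(self.profile)

def outgroup(groups: List[Group], g: Group) -> List[str]:
    return [s for h in groups if h is not g for s in h.members]

def hits(groups: List[Group], seq: str) -> List[Group]:
    return [g for g in groups if score(g.profile, seq) >= g.threshold]

def perfect_cutoff(profile: List[str], members: List[str],
                   others: List[str]) -> Optional[float]:
    """Cutoff giving 100% P&R on the current sets, or None if none exists.
    Assumption: cutoff = lowest in-group score, valid only if every
    out-group sequence scores strictly below it."""
    low = min(score(profile, s) for s in members)
    high = max((score(profile, s) for s in others), default=float("-inf"))
    return low if low > high else None

def classify(groups: List[Group], queries: List[str],
             final_step: bool = True) -> Dict[str, Optional[str]]:
    for g in groups:  # training clusters are assumed to admit a 100% P&R cutoff
        cut = perfect_cutoff(g.profile, g.members, outgroup(groups, g))
        if cut is None:
            raise ValueError(f"group {g.name} has no 100% P&R cutoff")
        g.threshold = cut
    pending = list(queries)

    # Step 1 (fully adaptive): a query above the threshold of exactly one group
    # joins its profile and the threshold is recomputed, kept only if 100% P&R
    # holds. The query is below every other threshold, so other groups stay intact.
    changed = True
    while changed:
        changed = False
        for seq in list(pending):
            hit = hits(groups, seq)
            if len(hit) != 1:
                continue
            g = hit[0]
            cut = perfect_cutoff(g.profile + [seq], g.members + [seq],
                                 outgroup(groups, g))
            if cut is not None:
                g.profile.append(seq)
                g.members.append(seq)
                g.threshold = cut
                pending.remove(seq)
                changed = True

    # Step 2 (semi-adaptive): profiles are frozen; a query hitting at most one
    # group may lower the threshold of its (assumed) best-scoring group, again
    # only if 100% P&R holds.
    changed = True
    while changed:
        changed = False
        for seq in list(pending):
            hit = hits(groups, seq)
            if len(hit) > 1:
                continue  # ambiguous: left for step 3
            g = hit[0] if hit else max(groups, key=lambda h: score(h.profile, seq))
            cut = perfect_cutoff(g.profile, g.members + [seq], outgroup(groups, g))
            if cut is not None:
                g.members.append(seq)
                g.threshold = cut
                pending.remove(seq)
                changed = True

    result = {s: next((g.name for g in groups if s in g.members), None)
              for s in queries}
    if final_step:
        # Step 3 (optional): leftovers go to their best-scoring group without
        # touching profiles or thresholds: 100% coverage, P&R not guaranteed.
        for seq in pending:
            result[seq] = max(groups, key=lambda h: score(h.profile, seq)).name
    return result
```

With the toy groups A = [AAAA, AATT] and B = [CCCC, CCGG] (both with an initial cutoff of 75), the query AAAT scores 75 against A and is accepted in step 1. AACC scores 50 against A and is admitted in step 2, lowering A's cutoff to 50. TGGG scores 25 against B, which ties with the out-group sequence AACC, so it is only assigned to B in the optional step 3. The sketch keeps the steps as separate loops so that the trade-off the abstract describes is visible: each later step relaxes what may change (profile, then threshold only, then nothing) in exchange for coverage.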