RESEARCHERS
TEN HAVE Arjen
Conferences and scientific meetings
Title:
HMMER Performance Optimization for Protein Superfamily Classification with Reliable Cut-off
Author(s):
AGUSTIN AMALFITANO; NICOLÁS STOCCHI; ARJEN TEN HAVE; BRUN M
Place:
Buenos Aires
Meeting:
Presentation; 4th International Society of Computational Biology-Latin America Conference; 2016
Organizing institution:
ISCB/A2B2C
Abstract:
Background: HMMER detects homologs in sequence datasets and is used in function assignment for complete proteomes. However, because it is designed for high sensitivity, the use of inclusion thresholds selects non-homologs. Pfam and TIGRFAM use trusted thresholds to prevent low specificity, which results in reduced sensitivity. Based on phylogeny, the HMMER Cut-off Threshold Tool (HMMERCTTER) groups superfamily training sequences into clusters with 100% Precision & Recall (P&R). These are used to classify target sequences iteratively while updating the profiles with novel sequences accepted at 100% P&R. Unfortunately, coverage can be poor, depending on superfamily complexity and the HMMER scoring system. We tested the effect of HMMER settings on HMMERCTTER performance and investigated the possibility of including column weighting, based on the fact that cluster-determining positions (CDPs) contribute significantly to clustering.

Results: Using a complex case with many clusters with poor P&R, and with classification optimization as the objective, we tested hmmbuild settings using the HMMER score for classification and ROC/AUC as performance criteria. In addition, we defined measures to analyze subfamily sequence proximity (compactness) as well as the distance between subfamilies (separateness). Surprisingly, changes in the BLOSUM setting have erratic effects on performance, and in at least one case the default BLOSUM62 led to a 40% AUC drop. Furthermore, we will show the proof of principle of column weighting.

Conclusions: HMMER settings can have a severe impact on score, which affects the P&R of HMMERCTTER. We will discuss how BLOSUM settings and column weighting, which is not part of HMMER, might be implemented in HMMERCTTER.

Supported by CONICET and AGENCIA.
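As an illustration of the ROC/AUC criterion mentioned in the Results, the following is a minimal sketch of how an AUC could be computed from per-sequence scores and known subfamily membership. It is not the authors' implementation: the function name `roc_auc`, the example scores, and the labels are all hypothetical, and the rank-based (Mann-Whitney) formulation is assumed here only because it needs no external library.

```python
def roc_auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen member
    outscores a randomly chosen non-member (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical bit scores of five sequences against one subfamily profile,
# with True marking sequences known to belong to that subfamily.
example_scores = [310.2, 295.7, 120.4, 101.5, 90.3]
example_labels = [True, True, False, True, False]
print(f"AUC = {roc_auc(example_scores, example_labels):.3f}")
```

An AUC of 1.0 means that a score cut-off exists that separates members from non-members perfectly, which corresponds to the 100% P&R condition described above. Lower values indicate overlapping score distributions.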