RESEARCHERS
TEN HAVE Arjen
Conferences and scientific meetings
Title:
Improved HMMERCTTER Classification Performance: A Secondary Cut-off Threshold for Reliable Protein Superfamily Classification.
Author(s):
AMALFITANO, A; VERON, J; STOCCHI, N; BENAVENTE, M; TEN HAVE, A; BRUN, M
Place:
Prague
Meeting:
Conference; ISMB/ECCB 2017; 2017
Organizing institution:
International Society for Computational Biology
Abstract:
Background
Pfam and TIGRFAM are HMMER profile databases used in function assignment of complete proteomes. They use trusted thresholds rather than HMMER's overly sensitive inclusion thresholds to increase specificity. This results in reduced sensitivity. HMMER Cut-off Threshold Tool (HMMERCTTER) clusters a superfamily's training sequences into monophyletic clusters with 100% Precision & Recall (P&R), i.e. clusters that detect their members with a higher HMMER score than non-members. These are used to classify new sequences while imposing 100% P&R on all groups. Classification is iterated: in each step the profile is updated by including accepted sequences. Iteration stops when conflicting sequence identifications occur. Unfortunately, for certain complex or divergent superfamilies this results in poor coverage. In order to classify more sequences, we developed new approaches to add a second, less restrictive cut-off for classification of sequences in the "twilight zone" of similarity.

Results
We redefined the classification as a three-step method. In the first, fully adaptive step, sequences are accepted one by one and both profile and threshold are updated until no more sequences with a score above any threshold are detected. In the second, semi-adaptive step, we only modify the thresholds of the groups and maintain the 100% P&R condition. This step terminates when only conflicting sequence identifications occur. Conflicting sequences are finally classified by means of distance measures, thus achieving 100% coverage at the cost of reducing the P&R.

Conclusions
The final classification of HMMERCTTER will be compared with final high-fidelity trees. Results will be presented and discussed.
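
The 100% P&R condition and the notion of a conflicting identification can be illustrated with a minimal sketch. This is not HMMERCTTER's implementation: the function names `has_perfect_pr` and `assign`, the use of `>=` against the cut-off, and all score values are assumptions made for illustration, and the scores are invented numbers rather than HMMER output.

```python
from typing import Dict, List


def has_perfect_pr(member_scores: List[float], nonmember_scores: List[float]) -> bool:
    """Return True when every member outscores every non-member.

    Under that condition a cut-off placed between the lowest member score
    and the highest non-member score separates the group with 100%
    precision and recall.
    """
    if not member_scores:
        return False
    if not nonmember_scores:
        return True
    return min(member_scores) > max(nonmember_scores)


def assign(scores: Dict[str, float], thresholds: Dict[str, float]) -> str:
    """Assign one sequence from its per-group scores (hypothetical helper).

    Returns the group name when exactly one group's cut-off is met,
    "conflict" when several are met, and "unclassified" when none is.
    """
    hits = [group for group, score in scores.items() if score >= thresholds[group]]
    if len(hits) == 1:
        return hits[0]
    return "conflict" if hits else "unclassified"


if __name__ == "__main__":
    # Invented example scores, for illustration only.
    print(has_perfect_pr([120.0, 95.5], [40.2, 60.0]))
    cutoffs = {"A": 90.0, "B": 80.0}
    print(assign({"A": 100.0, "B": 85.0}, cutoffs))
```

In this toy setting, a sequence reported as "conflict" corresponds to the case the abstract resolves in its final step by means of distance measures.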