RESEARCHERS
TEN HAVE Arjen
Conferences and scientific meetings
Title:
SwissProtCluster: The New Protein Superfamily Database for Reliable Function Assignation by HMMERCTTER
Author(s):
STOCCHI, N; AMALFITANO, A; TEN HAVE, A; BRUN, M
Location:
Prague
Meeting:
Conference; ISMB/ECCB 2017; 2017
Organizing institution:
International Society for Computational Biology
Abstract:
Background
HMMER databases, like Pfam, are used for sequence function assignment. They use trusted cut-offs to obtain specificity at the cost of reduced sensitivity. The HMMER Cut-off Threshold Tool (HMMERCTTER) consists of two parts. HMMERCTTER_Clust identifies monophyletic clusters with 100% precision and recall (P&R), i.e. clusters whose profile scores every cluster sequence higher than any non-cluster sequence. HMMERCTTER_Class then classifies target sequences using the identified clusters. HMMERCTTER_Class can also use any other sequence clustering, provided it consists only of 100% P&R clusters. Therefore, we developed a 100% P&R HMMER-cluster database based on UniProtKB-SwissProt, providing a reliable tool for function assignment of complete proteomes. Here we report the construction of the single-domain database.

Results
Single-domain sequences were grouped based on family annotation codes and tested for 100% P&R. SwissProtCluster_1D.v1 contains 4143 groups of at least four sequences, totaling 69518 sequences, as well as 5871 ungrouped sequences. Of these groups, 3853 show 100% P&R; the remaining 290 groups were processed by a script that removes outliers until the group reaches 100% P&R. Ungrouped sequences were clustered into new groups using a combination of CD-Hit and HMMERCTTER. SwissProtCluster_1D.v2 covers 86% of the UniProtKB-SwissProt single-domain sequence space. Sequences from small groups (n<4) were clustered using homologs from UniProt_RP15.

Conclusions
UniProtKB-SwissProt contains sequences with incorrect family codes, as well as protein families that are described by more than a single protein family code. Performance will be tested by a comparison with Pfam using UniProt_RP75 as benchmark.
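
The 100% P&R criterion and the outlier-removal step described in the abstract can be illustrated with a minimal Python sketch. It is not the HMMERCTTER code: the function names, the `min_size` parameter, and the rule of dropping the lowest-scoring member first are assumptions made for illustration, since the abstract does not say how the script selects outliers. The sketch also assumes that each sequence already has a single profile score (for example an HMMER bit score) and keeps those scores fixed, whereas a real pipeline would presumably rebuild the profile and rescore after each removal.

```python
from typing import Dict, List, Set, Tuple


def has_full_precision_recall(scores: Dict[str, float], members: Set[str]) -> bool:
    """Return True if every member scores strictly higher than every non-member.

    `scores` maps sequence IDs to a profile score; `members` is the group
    under test. Both names are hypothetical, not part of HMMERCTTER.
    """
    member_scores = [s for name, s in scores.items() if name in members]
    other_scores = [s for name, s in scores.items() if name not in members]
    if not member_scores:
        return False  # an empty group cannot be a valid cluster
    if not other_scores:
        return True   # nothing to confuse the group with
    return min(member_scores) > max(other_scores)


def trim_outliers(
    scores: Dict[str, float], members: Set[str], min_size: int = 4
) -> Tuple[Set[str], List[str]]:
    """Drop the lowest-scoring member until the group is 100% P&R.

    Assumed rule: the worst-scoring member is treated as the outlier.
    The group is never reduced below `min_size`; the caller should
    re-check the result with has_full_precision_recall().
    """
    kept = set(members)
    removed: List[str] = []
    while len(kept) > min_size and not has_full_precision_recall(scores, kept):
        worst = min(kept, key=lambda name: scores[name])
        kept.remove(worst)
        removed.append(worst)
    return kept, removed


# Toy data: member "e" scores below non-member "x", so the group fails at first.
toy_scores = {"a": 120.0, "b": 110.0, "c": 100.0, "d": 95.0,
              "e": 40.0, "x": 60.0, "y": 20.0}
toy_group = {"a", "b", "c", "d", "e"}
trimmed, outliers = trim_outliers(toy_scores, toy_group)
```

In the toy data, the group initially fails the test because one member scores below a non-member; removing that member leaves four sequences that all outscore every remaining sequence.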