RESEARCHERS
TEN HAVE Arjen
Conferences and scientific meetings
Title:
SwissProt Select: The New Protein Superfamily Database for Reliable Function Assignation.
Author(s):
STOCCHI, N; AMALFITANO, A; BRUN, M; TEN HAVE, A
Place:
Buenos Aires
Meeting:
Conference; 4th International Society for Computational Biology Latin America Conference; 2016
Organizing institution:
International Society for Computational Biology
Abstract:
Background
HMMERCTTER identifies and classifies the members of a protein superfamily with 100% precision and recall (P&R) based on a representative training set of sequences, leaving some sequences unclassifiable as orphans. Precision is the fraction of recovered sequences that are relevant, while recall is the fraction of all relevant sequences that are recovered. Using multiple filters on the SwissProt database and a modified version of HMMERCTTER that measures the P&R of several families at the same time, we created SwissProt Select. Here we report the automatic identification and classification of superfamilies based on annotation levels and multiple filters, including P&R.
Results
Applying the initial filters yielded 69561 proteins grouped into 10139 families, of which 8061 superfamilies had 100% P&R; these form the first version of SwissProt Select. For later versions, two cleaning methods were applied to the remaining 2078 families that did not reach 100% P&R: (1) identifying relationships between families and joining them into superfamilies above a given threshold; and (2) identifying sets of sequences that do not belong to a family and removing them. After applying these methods iteratively and automatically (in the order given), followed by a further analysis with HMMERCTTER, XX new superfamilies with 100% P&R were obtained, covering ~95% of the data. The families that could not be recovered form a separate part of the database, called SwissProt Orphans.
Conclusion
This study is the first step of a larger effort to generate the database, for which further improvements to HMMERCTTER are necessary.
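To make the 100% P&R criterion concrete, the following Python sketch computes precision and recall for a single family and splits families into those that meet the criterion and those that do not. It assumes the set of recovered sequence IDs (for example, those a family profile scores above its cutoff) and the set of true family members are already known; the function names `precision_recall` and `split_by_pr` are hypothetical, and this is not HMMERCTTER's own code.

```python
def precision_recall(recovered: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of recovered sequence IDs against the true family members."""
    true_positives = len(recovered & relevant)
    # Precision: fraction of recovered sequences that are relevant.
    precision = true_positives / len(recovered) if recovered else 0.0
    # Recall: fraction of all relevant sequences that were recovered.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def split_by_pr(families: dict[str, tuple[set[str], set[str]]]) -> tuple[list[str], list[str]]:
    """Partition family IDs into those with 100% P&R and those without (hypothetical helper)."""
    perfect, imperfect = [], []
    for family_id, (recovered, relevant) in families.items():
        precision, recall = precision_recall(recovered, relevant)
        (perfect if precision == 1.0 and recall == 1.0 else imperfect).append(family_id)
    return perfect, imperfect


if __name__ == "__main__":
    # Each value is (recovered IDs, true member IDs); the data are illustrative only.
    families = {
        "FAM_A": ({"P1", "P2"}, {"P1", "P2"}),
        "FAM_B": ({"P1", "P2", "Q9"}, {"P1", "P2", "P3"}),
    }
    print(precision_recall(*families["FAM_B"]))  # (0.6666666666666666, 0.6666666666666666)
    print(split_by_pr(families))  # (['FAM_A'], ['FAM_B'])
```

In this toy example, FAM_B recovers one unrelated sequence (Q9) and misses one member (P3), so it falls among the families that would still need cleaning.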
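The two cleaning methods described in the Results can be sketched as follows: joining related families into superfamilies above a threshold is modeled here as a union-find merge over pairwise family scores, and removing sequences that do not belong to a family is modeled as a simple score cutoff. Both the pairwise relationship score and the cutoffs (`threshold`, `min_score`) are hypothetical placeholders, since the text does not specify which metric or values were used.

```python
from collections import defaultdict


def merge_families(families, scores, threshold):
    """Join families whose pairwise relationship score meets `threshold` (union-find).

    families: iterable of family IDs
    scores: dict mapping (family_a, family_b) -> relationship score (hypothetical metric)
    Returns a sorted list of superfamilies, each a sorted list of family IDs.
    """
    parent = {f: f for f in families}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(list)
    for f in parent:
        groups[find(f)].append(f)
    return sorted(sorted(g) for g in groups.values())


def remove_outliers(members, member_scores, min_score):
    """Keep only sequences scoring at least `min_score` against their family (hypothetical criterion).

    Sequences with no score are treated as not belonging and are dropped.
    """
    return {seq for seq in members if member_scores.get(seq, float("-inf")) >= min_score}


if __name__ == "__main__":
    pairs = {("A", "B"): 0.9, ("B", "C"): 0.2, ("C", "D"): 0.8}
    print(merge_families(["A", "B", "C", "D"], pairs, 0.5))  # [['A', 'B'], ['C', 'D']]
    print(remove_outliers({"s1", "s2", "s3"}, {"s1": 50.0, "s2": 10.0}, 20.0))  # {'s1'}
```

Union-find makes merging transitive: if A–B and B–C both pass the threshold, A, B and C end up in one superfamily even when A–C alone would not, so in practice the choice of threshold would control how aggressively families are joined.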