RESEARCHERS
TEN HAVE Arjen
Conferences and scientific meetings
Title:
The alignment of protein superfamily sequences Part I: Identifying Cluster Specific Subsequences (CSS) in Protein Families.
Author(s):
REVUELTA MV; ORTS F; PAGNUCO IA; BRUN M; TEN HAVE A
Location:
Bahía Blanca
Meeting:
Conference; 6to Congreso Argentino de Bioinformática y Biología Computacional; 2015
Organizing institution:
Asociación Argentina de Bioinformática y Biología Computacional
Abstract:
Background
Multiple Sequence Alignments (MSAs) are crucial tools in protein bioinformatics. Despite recent advances such as those achieved by MAFFT and Promals3D, MSA construction for complex superfamilies remains problematic and requires rigorous manual correction. Here we present two posters that describe our attempts to develop a protocol and, subsequently, software aimed at aligning many (>500) sequences of complex protein superfamilies. Part of the problem is the presence of Cluster Specific Subsequences (CSS), defined as subsequences that are present in a subfamily but absent from the rest of the superfamily. The objective of this work was to design and test a pipeline that addresses the CSS problem and identifies functionally important regions in protein subfamilies.
Results
We designed and are currently testing a CSS detection pipeline on two different protein family datasets: Aspartic Proteinases and Sedolisins. The pipeline is based on a null model consisting of the deltaIC values for 10000 samples of N random sequences drawn from a given alignment. A deltaIC score is the difference in mean Information Content (IC) of a given window before and after realignment. Regions that show a large improvement are putative CSS and, when compared with the null model, should fall beyond the 95th or 99th percentile of the null distribution. Conserved regions are unlikely to show IC differences, since they are maintained across the MSA of the whole (super)family. For the Aspartic Proteinases set, preliminary analyses show that the pipeline identifies several previously described CSS [1] and confirms a new CSS for an AP subfamily that consists solely of fungal phytopathogen sequences.
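The deltaIC comparison against a null model can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the IC formula, so a Shannon-based column IC weighted by gap occupancy is assumed here, and all function names (`column_ic`, `mean_ic`, `delta_ic`, `null_delta_ics`, `empirical_tail`) are hypothetical. The `realign` argument stands in for a real aligner such as MAFFT; the toy `left_justify` function exists only to make the example runnable.

```python
import math
import random
from collections import Counter


def column_ic(column, alphabet_size=20):
    """Assumed IC formula: (log2(alphabet) - Shannon entropy) * non-gap fraction."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    occupancy = total / len(column)
    return (math.log2(alphabet_size) - entropy) * occupancy


def mean_ic(rows):
    """Mean column IC of an aligned block (list of equal-length strings)."""
    columns = list(zip(*rows))
    if not columns:
        return 0.0
    return sum(column_ic(col) for col in columns) / len(columns)


def delta_ic(rows, start, end, realign):
    """Mean IC of the window [start, end) after realignment minus before."""
    window = [row[start:end] for row in rows]
    before = mean_ic(window)
    # The aligner receives the ungapped window subsequences.
    after = mean_ic(realign([w.replace("-", "") for w in window]))
    return after - before


def null_delta_ics(alignment, sample_size, start, end, realign,
                   n_samples=10000, seed=0):
    """deltaIC values for random samples of `sample_size` sequences."""
    rng = random.Random(seed)
    return [
        delta_ic(rng.sample(alignment, sample_size), start, end, realign)
        for _ in range(n_samples)
    ]


def empirical_tail(observed, null_scores):
    """Fraction of null scores that are at least as large as `observed`."""
    return sum(s >= observed for s in null_scores) / len(null_scores)


def left_justify(seqs):
    """Toy stand-in for a real aligner: pad sequences on the right with gaps."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, "-") for s in seqs]
```

Under these assumptions, a window from a cluster would be flagged as a putative CSS when `empirical_tail(observed, null_scores)` is at most 0.05 or 0.01, with `sample_size` set to the number of sequences in that cluster.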
The Sedolisins case is currently under study.
Conclusion
The deltaIC score is a valuable tool for detecting cluster specific subsequences in protein superfamily MSAs. Future improvements should focus on selecting and testing IC formulas and on choosing the right clustering.
References
[1] Metcalf P, Fusek M. Two crystal structures for cathepsin D: the lysosomal targeting signal and active site. EMBO J. 1993 Apr;12(4):1293-1302.