IIB   20738
INSTITUTO DE INVESTIGACIONES BIOLOGICAS
Unidad Ejecutora - UE
Conferences and scientific meetings
Title:
Structure-Function Prediction of Highly Variable Sub-sequences of Protein Subfamilies
Author(s):
REVUELTA MV; ARJEN TEN HAVE
Location:
Oro Verde
Meeting:
Congress; 3rd Congress of the Asociación Argentina de Bioinformática y Biología Computacional; 2012
Organizing institution:
Asociación Argentina de Bioinformática y Biología Computacional
Abstract:
Background
Protein families consist of homologous, often functionally related, proteins that have a similar 3D structure. A key aspect of protein families is that they contain paralogues, which allows for functional diversification and the evolution of subfamilies. One of the aims of Structure-Function Prediction studies is the identification of Subfamily or Specificity Determining Positions (SDPs), sites or residues specific for certain functional aspects or subfamily classification. The identification of SDPs is a hot topic in Bioinformatics and can be achieved by various methods based on either evolutionary tracing (ET) or mutual information (MI), both of which depend on multiple sequence alignments (MSAs) and homology. Interestingly, MSAs also reveal sub-sequences that are not conserved throughout the complete superfamily and, hence, are not truly homologous. Current ET or MI SDP identification methods do not identify these Subfamily or Specificity Determining Sub-sequences (SDSs), some of which could be very important for protein function. We set out to develop methodology for the identification and subsequent analysis of SDSs using A1 Aspartic Proteinases (APs) as a case study. APs form a well-studied protein family with a number of well-described, functionally important loops such as the Nepenthesin-specific loop and the Plant Specific Insert. The analysis will be used for functional prediction but also as the foundation of a more general SDS-identification and analysis procedure.

Results
A multiple sequence alignment of 710 AP sequences from 107 completely sequenced eukaryotic genomes was constructed based on known hallmarks and available structural information. Non-homologous or otherwise poorly aligned sub-sequences were removed and a phylogenetic tree was constructed. The tree shows the existence of eleven different AP subfamilies, whereas the MSA trimming identified 12 stretches with high variability. Six of these were described by Metcalf & Fusek (1993) as variable loops that cover the binding cleft, are rather mobile or distorted in structures and are supposedly involved in substrate specificity. The other six SDSs are more remote from the binding cleft but also appear solvent exposed.

Once identified, the SDSs require bio-computational analysis. The sub-sequences were analyzed for length, subfamily conservation and sequence characteristics. The length of each of the 12 highly variable sub-sequences was determined using a PERL script and analyzed in R in order to find significant differences between subfamilies. Subfamily conservation was analysed by realignment of the 12 SDS regions for the 11 identified subfamilies. Reliable alignments were obtained for some but not all 131 datasets. Comparison of reliable cluster-specific SDS-alignments was hampered by a low information content. All sequences were analysed using a number of bio-computational methods in order to detect putative physicochemical and/or biological fingerprints.

Conclusion
MSA trimming software can be used for the identification of SDSs. Ten out of 12 SDSs identified in the AP superfamily show statistically significant differences throughout the superfamily classification. A number of SDS-cluster alignments are reliable, which suggests these SDSs are functionally constrained within certain subfamilies. Other SDS-cluster alignments are not reliable and require a tree-guided iterative alignment optimization, which is currently being developed. Comparison of SDSs is hampered by lack of clear homology, and alternative strategies are being developed for comparative analysis. Most SDSs are relatively hydrophilic, confirming that SDSs are solvent exposed. A number of Prosite patterns with a high probability of occurrence were identified and will be statistically analysed.

Reference
Metcalf P, Fusek M: T EMBO 1993, 12(4):1293-1302
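
Illustrative sketch (not the authors' pipeline): the abstract reports that MSA trimming flagged highly variable stretches and that a PERL script measured the length of each stretch per sequence. The Python below shows one possible way to do both steps on a toy alignment. It swaps in a simple per-column Shannon entropy cut-off in place of the unnamed trimming software; the function names, the threshold of 1.0 bit, the minimum stretch length and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; gaps count as a symbol."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def variable_stretches(msa, threshold=1.0, min_len=3):
    """Return half-open (start, end) column ranges whose entropy exceeds threshold.

    threshold and min_len are hypothetical defaults, not values from the abstract.
    """
    ncols = len(msa[0])
    flags = [column_entropy([seq[i] for seq in msa]) > threshold
             for i in range(ncols)]
    stretches, start = [], None
    # A trailing False closes a stretch that runs to the last column.
    for i, flag in enumerate(flags + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                stretches.append((start, i))
            start = None
    return stretches


def sds_lengths(msa, stretch):
    """Ungapped length of the given column range in every aligned sequence."""
    start, end = stretch
    return [len(seq[start:end].replace("-", "")) for seq in msa]


# Toy alignment (invented): conserved flanks around one variable loop.
toy_msa = [
    "MKVLAGT--DSW",
    "MKVLSRPQ-DSW",
    "MKVL-E---DSW",
    "MKVLHKNCYDSW",
]

stretches = variable_stretches(toy_msa)
lengths = [sds_lengths(toy_msa, s) for s in stretches]
```

Grouping the resulting per-sequence lengths by subfamily would give the input for the kind of between-subfamily comparison the abstract describes as having been done in R.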
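
Illustrative sketch (method assumed): the abstract concludes that most SDSs are relatively hydrophilic but does not say how hydrophilicity was scored. One common choice is the mean Kyte-Doolittle hydropathy (often called GRAVY), where negative values indicate a hydrophilic sequence. The snippet below assumes that scale; the function name `gravy` and the example peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy index per amino acid (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(subsequence):
    """Mean hydropathy of a sub-sequence, ignoring gaps and unknown symbols.

    Returns None when no scorable residue is present.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in subsequence.upper()
              if aa in KYTE_DOOLITTLE]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Invented example peptides: a charged loop-like stretch and an aliphatic one.
hydrophilic_score = gravy("DEK")
hydrophobic_score = gravy("ILV")
```

A negative mean score for an SDS would be consistent with solvent exposure, although a sequence-based score of this kind is only indirect evidence compared with structural data.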