CONICET | Institutes and Human Resources Search

Background: Repeat proteins are made up of tandem arrays of structural motifs, each 20-40 amino acids long, that pack in a linear fashion to produce elongated, one-dimensional architectures. These proteins exhibit continuous hydrophobic cores and extended solvent-accessible surfaces. In contrast to globular proteins, repeat proteins are stabilized solely by contacts within repeats or between adjacent repeats, with no direct contacts between distant parts of the polypeptide chain. Because of their topological simplicity, these proteins are a useful model for studying the relationships among protein folding, dynamics and function. The ankyrin repeat, which consists of ~33 amino acid residues, is one of the most widespread protein motifs in nature. Proteins composed of tandem copies of this motif constitute the Ankyrin Repeat Protein family. Ankyrin Repeat Proteins (ARPs) are widely distributed in nature. Their "biological function" is usually described as mediating specific protein-protein interactions, with a versatility of recognition comparable to that of antibodies.

Methods: We have built a relational database for ARPs in which we store all available sequences and structures, along with information from several other sources. We calculated various structural and sequence parameters for these proteins and curated experimental data on equilibrium protein folding from public databases and the literature. Because ankyrin repeats are highly divergent, defining the repeating unit from sequence alone is not a trivial problem; as a consequence, many of these proteins are incompletely annotated in domain databases such as PFAM and SCOP. In previous work, we developed an algorithm to analyze periodicities in protein structures and to define structural repetitions.
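
To make the idea of "analyzing periodicities" concrete, the following is a minimal sketch of period detection by autocorrelation. It is not the algorithm referred to above, whose details are not given here. It assumes a hypothetical per-residue numeric profile (any structural or sequence-derived property sampled along the chain) and searches only lags within the 20-40 residue range quoted for repeat motifs. The names `autocorrelation`, `dominant_period` and `profile` are illustrative.

```python
from math import pi, sin


def autocorrelation(profile, lag):
    """Normalized autocorrelation of a numeric profile at a given lag."""
    n = len(profile)
    if lag <= 0 or lag >= n:
        return 0.0
    mean = sum(profile) / n
    variance = sum((x - mean) ** 2 for x in profile)
    if variance == 0:
        return 0.0
    covariance = sum(
        (profile[i] - mean) * (profile[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def dominant_period(profile, min_lag=20, max_lag=40):
    """Return the lag in [min_lag, max_lag] with the highest autocorrelation."""
    upper = min(max_lag, len(profile) - 1)
    if upper < min_lag:
        raise ValueError("profile is too short for the requested lag range")
    return max(range(min_lag, upper + 1), key=lambda lag: autocorrelation(profile, lag))


# Synthetic, hypothetical profile: six copies of a 33-residue periodic signal.
# Real profiles would be derived from a structure or sequence, not generated.
profile = [sin(2 * pi * i / 33) for i in range(198)]
period = dominant_period(profile)
print(period)
```

On real, divergent repeats the autocorrelation peak would be broader and noisier than in this synthetic case, which is one reason a structure-based definition of the repeating unit can be preferable to a purely sequence-based one.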
Having annotated all the repetitions in all the available ARP structures, we calculated several measures from both sequences and structures and analyzed how they correlate with one another and with the behaviours observed in different members of this family.

Conclusions: We have gathered information on ARPs from different, heterogeneous sources and put it into a relational database to facilitate the exploration and mining of these data. We have annotated all the repetitions in the available ARP structures using a structure-based algorithm, increasing the coverage of repeat annotation relative to that obtained with sequence-based methods. We performed a structural analysis of this protein family and explored the contributions of different structural and sequence measures to individual repeats, the array of repeats and the polypeptide sequences.
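
As an illustration of how such a relational database might be organized, here is a minimal sketch using SQLite. The actual schema is not described in this abstract, so every table name, column name and row below is a hypothetical placeholder rather than the authors' design or real data. The sketch stores proteins, their structures and the repeat units annotated on each structure, tagged by annotation method, so that structure-based and sequence-based coverage can be compared with a single query.

```python
import sqlite3

# Hypothetical schema: one protein has many structures, one structure has many repeats.
SCHEMA = """
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    accession  TEXT UNIQUE NOT NULL,
    sequence   TEXT NOT NULL
);
CREATE TABLE structure (
    structure_id INTEGER PRIMARY KEY,
    protein_id   INTEGER NOT NULL REFERENCES protein(protein_id),
    pdb_code     TEXT NOT NULL,
    chain        TEXT NOT NULL
);
CREATE TABLE repeat_unit (
    repeat_id    INTEGER PRIMARY KEY,
    structure_id INTEGER NOT NULL REFERENCES structure(structure_id),
    start_res    INTEGER NOT NULL,
    end_res      INTEGER NOT NULL,
    method       TEXT NOT NULL CHECK (method IN ('structure', 'sequence'))
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder (not real) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO protein VALUES (1, 'EXAMPLE_1', 'PLACEHOLDER')")
    conn.execute("INSERT INTO structure VALUES (1, 1, 'XXXX', 'A')")
    conn.executemany(
        "INSERT INTO repeat_unit (structure_id, start_res, end_res, method) "
        "VALUES (?, ?, ?, ?)",
        [(1, 1, 33, "structure"), (1, 34, 66, "structure"), (1, 34, 66, "sequence")],
    )
    conn.commit()
    return conn


def repeats_per_method(conn, accession):
    """Count annotated repeats for one protein, grouped by annotation method."""
    rows = conn.execute(
        """
        SELECT r.method, COUNT(*)
        FROM repeat_unit AS r
        JOIN structure AS s ON s.structure_id = r.structure_id
        JOIN protein AS p ON p.protein_id = s.protein_id
        WHERE p.accession = ?
        GROUP BY r.method
        """,
        (accession,),
    ).fetchall()
    return dict(rows)


conn = build_demo_db()
print(repeats_per_method(conn, "EXAMPLE_1"))
```

Keeping the annotation method as a column on each repeat, rather than in separate tables, is one simple way to compare coverage between methods; other layouts would work equally well.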