IQUIBICEN   23947
INSTITUTO DE QUIMICA BIOLOGICA DE LA FACULTAD DE CIENCIAS EXACTAS Y NATURALES
Executing Unit (Unidad Ejecutora, UE)
Conferences and scientific meetings
Title:
Protein repeats from first principles
Author(s):
TURJANSKI, PABLO; PARRA, R. GONZALO; ESPADA, ROCÍO; BECHER, VERÓNICA; FERREIRO, DIEGO
Place:
Buenos Aires
Meeting:
Workshop; Workshop Internacional Programa Raíces (MINCyT) - "La matemática como herramienta para entender la biología / la biología como fuente de problemas matemáticos"; 2015
Organizing institution:
Instituto de Cálculo - FCEyN - UBA
Abstract:
Proteins are a small portion of all the sequences that can be formed by combining the 20 amino acids. Although protein sequences are indistinguishable from random amino acid chains in their composition, one distinctive feature of proteins is that they are able to fold into specific three-dimensional structures [1].

There is a group of molecules inside the protein universe whose amino acid sequences differ from random ones, since specific patterns are observable. These molecules, namely repeat proteins, are composed of tandem copies of structural motifs of similar amino acid stretches that usually fold into elongated structures [2]. They are present in 14% of known protein sequences, with specific functions generally associated with higher organisms and their pathogens [3]. Several repeat protein families have been described, based on the occurrence of an elemental repetition of specific length and structural composition, e.g. Ankyrins, Leucine Rich, Heat, TPR, Armadillo and Beta-Propellers, among others. The modular nature of repeat proteins is advantageous for dissecting sequence-structure-function relationships, given that interactions within the polypeptide chain remain local in space, in contrast to globular families such as globins or immunoglobulins, where long-range interactions lead to intricate topologies.

If we focus on families, how are these families defined? What is a protein family? Although defining protein families is a major problem in molecular biology, their definition is based on sophisticated strategies that make use of subjective choices of substitution matrices, similarity functions, sequence alignments, hidden Markov models and others. This non-objective way of defining a protein family leads to fuzzy limits between families.
This is evident in the Pfam database [4], where fine-tuning of sequence detection parameters using ad hoc profiles is needed to generate non-overlapping clusters of sequences that constitute the families.

In this work we mathematize the notion of repeat protein families. We start by giving a mathematical definition of a repeat relative to a set of sequences. Instead of the usual approaches, which quantify occurrences inside a given sequence, we consider a quantification relative to an explicitly defined set of sequences.

The usual methods for detecting repeats in repeat protein families are based on alignments, mismatches and distance functions, always using an implicit concept of a protein family in order to derive the parameters to be used in each case [5].

Our contributions in this work are the following:

- We provide a mathematical definition of a repeat relative to an explicitly given set. Thus, instead of requiring that a protein repeat have two or more occurrences inside a given sequence, we ask for one occurrence in the given sequence and at least one other occurrence in any of the other members of the set. Repeats are blocks that occur, with perfect matching, in an explicitly defined set of proteins.
- Our definition allows a very efficient computational treatment. We compute repeats in time O(n log n), where n is the size of the family set (number of amino acids). The same computational complexity applies to the quantification of the number and length of the repeats.
- We formally define the concept of a protein family: a set X of proteins is a family if the sequence corresponding to each protein in X possesses the same quantification of repeats relative to the set X.
- We propose a method to decide whether an arbitrary protein belongs to a family.
- We tested the notion using 173 target sequences from three repeat protein families (Ankyrins, Dehalogenase and WD), showing that they are repetitive not on their own but in their family context.
- Our experiments show that all families, including the globular ones, are repetitive under our concept; it just happens that some are more repetitive than others.

References

[1] Donald B. Wetlaufer. Nucleation, rapid folding, and globular intrachain regions in proteins. Proceedings of the National Academy of Sciences, 70(3):697-701, 1973.
[2] Miguel A. Andrade, Chris P. Ponting, Toby J. Gibson, and Peer Bork. Homology-based method for identification of protein repeats using statistical significance estimates. Journal of Molecular Biology, 298(3):521-537, 2000.
[3] Edward M. Marcotte, Matteo Pellegrini, Todd O. Yeates, and David Eisenberg. A census of protein repeats. Journal of Molecular Biology, 293(1):151-160, 1999.
[4] Alex Bateman, Lachlan Coin, Richard Durbin, Robert D. Finn, Volker Hollich, Sam Griffiths-Jones, Ajay Khanna, Mhairi Marshall, Simon Moxon, Erik L. L. Sonnhammer, David J. Studholme, Corin Yeats, and Sean R. Eddy. The Pfam protein families database. Nucleic Acids Research, 32(suppl 1):D138-D141, 2004.
[5] Hong Luo and Harm Nijveen. Understanding and identifying amino acid repeats. Briefings in Bioinformatics, 15(4):582-591, 2014.
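
Illustrative sketch (not part of the original abstract): the definition of a repeat relative to a set, as stated in the contributions above, can be tried out on toy data with a few lines of Python. This is a naive, quadratic-time illustration and not the O(n log n) method the authors describe, whose details the abstract does not give. The function name `repeats_relative_to_set`, the `min_len` threshold and the toy sequences are assumptions introduced here for illustration only.

```python
def repeats_relative_to_set(target_idx, sequences, min_len=3):
    """Return the blocks of sequences[target_idx] (length >= min_len) that
    also occur, with perfect matching, in at least one OTHER member of the set.

    Naive sketch: min_len is a hypothetical threshold, not from the abstract.
    """
    target = sequences[target_idx]
    others = [s for i, s in enumerate(sequences) if i != target_idx]
    found = set()
    for start in range(len(target)):
        for end in range(start + min_len, len(target) + 1):
            block = target[start:end]
            if any(block in other for other in others):
                found.add(block)
            else:
                # Any longer block from this start contains `block` as a
                # prefix, so it cannot occur in another member either.
                break
    return found


# Toy set of sequences (made up, not real proteins).
toy_set = ["ACDEFG", "XXCDEYY", "QQQ"]
shared = repeats_relative_to_set(0, toy_set)
```

In this toy set, the first sequence has no internal repetition, yet it shares a block with the second one, so it counts as repetitive in the context of the set rather than on its own. A suffix-array or suffix-tree approach would be the natural way to approach the stated O(n log n) bound, but that choice is a guess and is not confirmed by the abstract.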