
Predicting protein structure from the amino acid sequence alone is a long-standing challenge in computational biology. In this procedure, known as ab initio protein structure prediction, one usually starts with an extended protein chain and tries to minimize an energy (or objective) function by gradually modifying the structure. During this process, most of the time is spent generating protein conformations that lack any secondary structure. Clearly, a better starting point for ab initio prediction would be to know beforehand all (or an approximation to all) backbone conformations that a protein of the required size can adopt. In that case, the problem reduces to picking from this repository the backbone that best fits the sequence under study, which is similar to the threading technique of structure prediction.

The aim of this ongoing project is to generate a complete repository of protein backbones through an exhaustive enumeration of the conformational space. The a priori number N of possible conformations is reduced by applying geometric rules derived from a statistical analysis of the Protein Data Bank.

Two programs are being developed to address this goal: a program that massively generates backbones through a combinatorial approach under geometric constraints, and a clusterer program that groups structures into clusters by means of a Root Mean Square Deviation[0] (RMSD) dissimilarity function and selects a representative backbone from each cluster, thus reducing the database size.

The backbone generation program is a distributed application running on Message Passing Interface (MPI) middleware and implementing a collaborative load-balancing approach.
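
The combinatorial generation under geometric constraints can be illustrated with a minimal, single-process sketch. It is not the project's program: the coarse torsion states in `STATES` and the rule set `FORBIDDEN_PAIRS` are hypothetical stand-ins for the rules derived from the Protein Data Bank, and the MPI distribution is left out. It only shows how pruning a partial chain as soon as it breaks a rule shrinks the enumerated space.

```python
from typing import Iterator, List, Tuple

# Hypothetical coarse states per residue, as (phi, psi) angles in degrees.
# These values are illustrative assumptions, not taken from the project.
STATES = {
    "alpha": (-60.0, -45.0),
    "beta": (-120.0, 130.0),
    "left": (60.0, 45.0),
}

# Hypothetical rule standing in for a PDB-derived geometric constraint:
# pairs of consecutive states that are never allowed.
FORBIDDEN_PAIRS = {("left", "left"), ("alpha", "left")}


def enumerate_backbones(length: int) -> Iterator[Tuple[str, ...]]:
    """Yield every state sequence of the given length that obeys the rules."""

    def extend(prefix: List[str]) -> Iterator[Tuple[str, ...]]:
        if len(prefix) == length:
            yield tuple(prefix)
            return
        for state in STATES:
            # Prune: drop the whole subtree as soon as a rule is violated.
            if prefix and (prefix[-1], state) in FORBIDDEN_PAIRS:
                continue
            prefix.append(state)
            yield from extend(prefix)
            prefix.pop()

    yield from extend([])


if __name__ == "__main__":
    unconstrained = len(STATES) ** 3
    constrained = sum(1 for _ in enumerate_backbones(3))
    print(f"{constrained} of {unconstrained} sequences survive the rules")
```

Because each subtree of the depth-first search is independent, a distributed version could hand different prefixes to different workers, which is one place where load balancing would matter.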
The clusterer program is an implementation of a parallel center-based clustering algorithm[1] (similar to k-means[2]) over this set of backbones. It uses a cutoff parameter to determine the clusters, and it minimizes the RMSD of each comparison by rotating one of the structures to achieve optimal superimposition of the pair[3].
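
The two ingredients of the clustering step, RMSD after optimal superimposition and cutoff-based assignment to cluster centers, can be sketched as follows. This is a serial illustration under stated assumptions, not the project's parallel implementation: the superimposition uses the standard Kabsch (SVD-based) procedure, which may differ from the method cited in [3], and the assignment is a simple greedy "leader" pass in which the first structure that fits no existing cluster becomes a new center. The function names `superimposed_rmsd` and `cluster_by_cutoff` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def superimposed_rmsd(a, b):
    """RMSD between two (n, 3) coordinate sets after optimal superimposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Remove translation by centering both structures on their centroids.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Kabsch: the SVD of the covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(a.T @ b)
    # Force a proper rotation (determinant +1), never a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a @ rotation.T - b
    return float(np.sqrt((diff ** 2).sum() / len(a)))


def cluster_by_cutoff(structures, cutoff):
    """Greedy center-based clustering (assumed variant, not the cited one).

    Returns (centers, labels): indices of the representative structures and,
    for each structure, the position of its cluster in `centers`.
    """
    centers = []
    labels = []
    for i, structure in enumerate(structures):
        for k, center in enumerate(centers):
            if superimposed_rmsd(structure, structures[center]) <= cutoff:
                labels.append(k)
                break
        else:
            # No existing center is close enough: open a new cluster.
            centers.append(i)
            labels.append(len(centers) - 1)
    return centers, labels


if __name__ == "__main__":
    bent = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]], dtype=float)
    quarter_turn = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
    moved = bent @ quarter_turn.T + np.array([5.0, -2.0, 3.0])
    line = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]], dtype=float)
    print(cluster_by_cutoff([bent, moved, line], cutoff=0.3))
```

In this sketch the rotated and translated copy joins the cluster of the original structure, since superimposition removes rigid-body differences, while the straight chain opens a cluster of its own. The result of a greedy pass depends on the input order; a k-means-like scheme would typically refine the centers over several iterations.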