FELLOWSHIPS
FENOY Luis Emilio
conferences and scientific meetings
Title:
NetPhosPan: a pan specific predictor for phosphorylation site prediction
Author(s):
EMILIO FENOY; MORTEN NIELSEN
Location:
Capital Federal
Meeting:
Conference; ISCB-Latin America Conference 2016 and 7th CA2BC; 2016
Organizing institution:
International Society for Computational Biology (ISCB) and Asociación Argentina de Bioinformática y Biología Computacional (A2B2C)
Abstract:
Post-translational modifications such as phosphorylation are a common mechanism for controlling the dynamic behavior and decision processes of eukaryotic cells. Of the 23,000 proteins encoded by the human genome, two-thirds have been demonstrated to be phosphorylated. Decades of studies and the recent use of high-throughput experiments employing mass spectrometry have identified thousands of in vivo phosphorylation sites. This information is available through public databases like Phospho.ELM and Phosphosite, yet only a fraction of these phosphorylation events have been attributed to a specific kinase. This fact, and the highly variable structure of the kinases, have challenged previous attempts to develop accurate predictors. Despite the variations in sequence and structure, most kinases feature a single highly related catalytic domain, with conserved regions that interact directly with the side chains of the amino acid sequence surrounding the phosphosite. In this work, we developed a pan-specific method for phosphorylation site prediction based on artificial neural networks, exploring different approaches to improve its accuracy. We first developed a phosphosite prediction tool, based on Artificial Neural Networks (ANN), trained on phosphosite sequences obtained from several public databases. These data were enriched with information about chemical and structural features previously identified as relevant. The resulting tool improved on the performance of existing methods of phosphosite prediction. We next extended the method to be pan-specific by including part of the sequence of the kinase's catalytic domain, as a pseudo-sequence, in the training input, which allows predictions for all kinases and improves performance. This pseudo-sequence is constructed from positions that are conserved among the majority of kinase sequences and could play a key role in the recognition and phosphorylation of the given peptide.
This kind of approach has proven highly accurate in other ligand-substrate models, such as affinity prediction between the MHC and its epitope. Among its advantages is the possibility of making predictions for altered versions of the catalytic domain, allowing inference about the effects of mutations in kinases. The central role of protein kinases in orchestrating cell signaling networks makes this family of proteins of particular interest in the global investigation of perturbed and diseased systems. The development of new tools will lead to a better understanding of the regulation of kinase activity and to the identification of kinase targets and their roles in a biological system, providing a foundation for further development of drugs for clinical use. In future work we expect to introduce the Long Short-Term Memory (LSTM) architecture of ANNs, which can handle inputs of variable length, allowing the method to read the complete catalytic domain instead of aligning it and selecting positions.
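
The pan-specific input described in the abstract, a peptide window around the candidate phosphosite combined with a pseudo-sequence taken from the kinase catalytic domain, could be encoded roughly as in the sketch below. This is only an illustration under stated assumptions: the one-hot encoding, the 9-residue window, the toy aligned domain, and the selected positions are all hypothetical and are not the values or the encoding used in NetPhosPan.

```python
# Illustrative sketch only. The alphabet, window length, aligned domain, and
# pseudo-sequence positions are assumptions, not the ones used in NetPhosPan.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"
ALPHABET = AMINO_ACIDS + GAP


def one_hot(sequence):
    """Encode a sequence as a flat list, one block of len(ALPHABET) values per residue."""
    encoded = []
    for residue in sequence:
        block = [0.0] * len(ALPHABET)
        block[ALPHABET.index(residue)] = 1.0
        encoded.extend(block)
    return encoded


def pseudo_sequence(aligned_domain, positions):
    """Pick the residues found at the selected columns of an aligned catalytic domain."""
    return "".join(aligned_domain[i] for i in positions)


def encode_example(peptide_window, aligned_domain, positions):
    """Concatenate the encoded peptide window and kinase pseudo-sequence into one input vector."""
    kinase_part = pseudo_sequence(aligned_domain, positions)
    return one_hot(peptide_window) + one_hot(kinase_part)


# Hypothetical data: a 9-residue window centred on a serine and a toy aligned domain.
window = "RRRASLPGA"
domain = "GKGSFG-VYKA"
positions = [1, 4, 6, 8]

features = encode_example(window, domain, positions)
```

Because the kinase enters the input only through its pseudo-sequence, changing a residue at one of the selected positions of `domain` changes the encoded vector, which is how a scheme of this kind could score mutated catalytic domains.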