FELLOWSHIPS
ILCIC Andres Alejandro
Conferences and scientific meetings
Title:
Has the protein folding problem been solved with artificial intelligence? An epistemological survey of novel simulation techniques in structural bioinformatics
Author(s):
POLZELLA, MARÍA SILVIA; LODEYRO, PENÉLOPE; ILCIC, ANDRÉS A.
Place:
Buenos Aires
Meeting:
Congress; 17th International Congress on Logic, Methodology and Philosophy of Science and Technology (CLMPST); 2023
Organizing institution:
Division of Logic, Methodology and Philosophy of Science and Technology
Abstract:
Proteins perform many essential functions for life. The classical dogma states that structure determines function; thus, knowing a protein's structure should enable us to determine its function. Additionally, the thermodynamic hypothesis of protein folding states that the native secondary and tertiary structures are implicit in the amino acid sequence itself. Therefore, the native structure of a protein could, in principle, be determined from its amino acid sequence (Anfinsen et al., 1961).
Research into the three-dimensional structure of proteins has been conducted for over 50 years. Initially, X-ray crystallography was used to address this problem, followed by the addition of cryo-electron microscopy (cryo-EM), nuclear magnetic resonance spectroscopy (NMR), and, more recently, cryo-electron tomography (cryo-ET). These techniques, however, require a great deal of time and resources.
As available computational power has increased, various numerical simulation methods have been developed to address this issue. However, these methods have yet to reach an acceptable level of accuracy and efficiency.
After the completion of the Human Genome Project, protein folding gained renewed interest. By 2020, the structures of around 170,000 proteins, nucleic acids, and complex assemblies had been determined (RCSB Protein Data Bank, n.d.), though this is only a small fraction of the billions of known protein sequences.
In recent years, the open availability of high-quality, very large datasets, such as the Protein Data Bank (PDB), combined with the increased computational power obtained by new architectures (e.g. GPUs, parallel computing) and advances in machine learning methods, has changed the way scientific computing is done in many fields. Molecular biology is no exception, as exemplified by the recent success at CASP14 (Critical Assessment of Methods of Protein Structure Prediction, Round 14) in 2020 of software based on machine learning techniques, such as DeepMind’s AlphaFold (Jumper et al., 2021; Tunyasuvunakool et al., 2021).
In this talk, we identify and discuss some epistemological issues arising from recent AI approaches that claim to have cracked the protein folding problem and to extrapolate their folded protein models beyond the theoretical knowledge, experimental evidence, and computer simulation benchmarks available. This is a particularly sensitive topic in drug design, as some proteins can fold into several possible conformations with different effects or functions, some of which can endanger life, such as prions. We find that some aspects of the methods employed give grounds for reservation and should be taken into account when making knowledge claims based on results obtained by such means.
References
Anfinsen, C. B., Haber, E., Sela, M., & White, F. H. (1961). The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proceedings of the National Academy of Sciences, 47(9), 1309–1314.
Jumper, J., Evans, R., Pritzel, A. et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589.
RCSB Protein Data Bank. (n.d.). PDB Statistics: Overall Growth of Released Structures Per Year. https://www.rcsb.org/stats/growth/growth-released-structures
Tunyasuvunakool, K., Adler, J., Wu, Z. et al. (2021). Highly accurate protein structure prediction for the human proteome. Nature, 596, 590–596.