RESEARCHERS
DOPAZO Hernan Javier
conferences and scientific meetings
Title:
Functional prediction of nsSNPs in the human genome
Author(s):
HERNÁN DOPAZO
Place:
Valencia, Spain
Meeting:
Workshop; Tercer Curso de Genética Humana; 2008
Organizing institution:
Sociedad Española de Genética. Universitat Pompeu Fabra
Abstract:
Functional prediction of nsSNPs in the human genome
Hernán J. Dopazo
Comparative Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia.
Since the early work of J. B. S. Haldane on sickle-cell anemia, biologists have recognized the power of natural selection acting on genetic variation and its association with human disease. Further developments demonstrated that most of the genetic changes occurring in a population do not affect the phenotype or, more accurately, the reproductive capacity (fitness) of the genotypes carrying the genetic variants (Kimura, 1983).
Recently, 3.1 million single nucleotide polymorphisms (SNPs) were found in the human genome (IHMC, 2007), and a major goal of biomedical research is to understand the role of common genetic variants in susceptibility to common diseases in human populations. A worldwide survey of genetic variation in genes associated with common human diseases concluded that SNPs occur at a frequency of 1 per 346 bp and are roughly equally divided between synonymous and nonsynonymous changes (Cargill et al., 1999). Because approximately two-thirds of random mutations in coding sequences alter an amino acid, the fact that nsSNPs comprise only one half of these SNPs implies strong selection against amino acid-altering changes. The force of selection is also evident when nsSNPs causing non-conservative amino acid substitutions are compared with those causing conservative changes. Non-conservative nsSNPs represent only 36% of all nsSNPs, whereas randomly distributed mutations would be expected to produce a higher proportion (52%) of non-conservative changes (Cargill et al., 1999). Currently, the NCBI SNP database (dbSNP, build 127) holds 5,689,286 validated human SNPs, of which 78,845 are nonsynonymous coding SNPs (nsSNPs). This means that only about 1% of the validated human SNPs are likely to affect gene function.
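The proportions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the abstract. The helper `relative_retention` is a hypothetical name introduced here for illustration, and the odds-ratio reading rests on a simplifying assumption that is not stated in the abstract: that the comparison class (synonymous or conservative changes) is effectively neutral.

```python
# Counts quoted in the abstract for dbSNP build 127.
validated_snps = 5_689_286
nonsynonymous_snps = 78_845

# Share of validated SNPs that are nonsynonymous coding SNPs (roughly 1.4%).
fraction_ns = nonsynonymous_snps / validated_snps
print(f"nsSNP share of validated SNPs: {fraction_ns:.2%}")


def relative_retention(observed_share, expected_share):
    """Odds of the observed share divided by the odds of the expected share.

    Hypothetical helper: a value below 1 means the class is under-represented
    relative to the random-mutation expectation. Reading it as a survival
    fraction ASSUMES the comparison class is neutral.
    """
    observed_odds = observed_share / (1 - observed_share)
    expected_odds = expected_share / (1 - expected_share)
    return observed_odds / expected_odds


# Nonsynonymous changes: one half observed versus two-thirds expected.
ns_retention = relative_retention(0.5, 2 / 3)

# Non-conservative changes among nsSNPs: 36% observed versus 52% expected.
nonconservative_retention = relative_retention(0.36, 0.52)

print(f"nonsynonymous vs. synonymous retention: {ns_retention:.2f}")
print(f"non-conservative vs. conservative retention: {nonconservative_retention:.2f}")
```

Under that assumption, both ratios come out near one half, which is one way to quantify the "strong selection" described in the text.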
One of the most important questions in human genetics is which of these genetic variants are functionally relevant for human health. In other words, which of this 1% of genetic variants are targets of selective, rather than neutral, evolutionary processes in the human genome. Beyond the theoretical interest of this enquiry, such predictions would help prioritize the genotyping of SNPs likely to be associated with disease in classical genotype-phenotype association studies in human populations.
The earliest studies on the functional prediction of human nsSNPs were pioneered by Sunyaev et al. (2000, 2001), Chasman & Adams (2001), Wang & Moult (2001), Miller & Kumar (2001), Saunders & Baker (2002) and Santibañez Koref et al. (2003). Sunyaev et al. (2000) estimated that around 70% of disease-causing mutations occur at structurally and functionally important sites with well-defined properties, such as solvent accessibility below 5%, location in beta-strands, active sites or disulphide bonds, or evolutionary conservation. Moreover, they found that most allelic variants map to the same structurally and functionally important regions of proteins, suggesting that many of them probably have negative effects on the phenotype. In a subsequent study, Sunyaev et al. (2001) estimated that around 20% of human nsSNPs affect protein function and that an average human genotype carries about 2,000 such nsSNPs. They observed that the majority of disease-causing mutations were at low frequencies in human populations (1%-20%), which they considered a validation of their method. Wang & Moult (2001) found that by far the largest proportion (83%) of disease nsSNPs affect protein stability, 5% map to binding sites, and approximately 10% correspond to cases where their 3D structural model gives a false negative result.
They predicted that 70% of the nsSNPs studied in hypertension, cardiovascular, endocrine, and neuropsychiatric diseases are neutral polymorphisms, while the remaining 30% affect the stability of the protein. Alternatively, Chasman & Adams (2001) used a combination of statistical methods to define structural and evolutionary parameters significantly associated with disease. Using the known effects of about 6,000 mutations in the Lac repressor and T4 lysozyme proteins, they estimated that around 26-32% of nsSNPs have deleterious effects on human protein function.
While most of these studies focused mainly on structural parameters of proteins, Miller & Kumar (2001) explicitly studied the role of evolutionary conservation in the functional prediction of nsSNPs, emphasizing the risk of relying on concepts such as conservation profiles and similarity cutoff percentages used in the previous models. They pointed out that evolutionary data cannot be treated as independent observations for statistical purposes, because they share a non-random structure of dependence defined by the historical relationships among species. That is, approaches based on similarity could overestimate the variability of a given site if an identical residue appears in multiple species due to phylogenetic constraints. Therefore, using an explicit method of phylogenetic reconstruction on 7 human disease proteins, Miller & Kumar (2001) demonstrated that disease-associated human nsSNPs are over-abundant at the amino acid positions most conserved throughout the long-term history of metazoans. In contrast, human polymorphic replacement mutations and silent mutations were found to be randomly distributed across sites with respect to the level of conservation of amino acid sites within genes. They concluded that disease-causing amino acid changes are those not observed among species, probably because they are not accepted by natural selection over long-term evolutionary time.
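The non-independence argument can be illustrated with a toy alignment column. The sketch below is not the method of Miller & Kumar (2001). It swaps in Fitch parsimony as a simple stand-in for "an explicit method of phylogenetic reconstruction". The tree, the species set, and the residues are all hypothetical.

```python
def fitch_changes(tree, residues):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples.

    Returns (possible ancestral residues, minimum number of substitutions).
    """
    if isinstance(tree, str):  # leaf: a species name
        return {residues[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_changes(left, residues)
    right_set, right_changes = fitch_changes(right, residues)
    shared = left_set & right_set
    if shared:  # children agree on at least one residue: no extra change
        return shared, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def naive_mismatches(residues, reference="human"):
    """Count species whose residue differs from the reference, ignoring the tree."""
    return sum(1 for aa in residues.values() if aa != residues[reference])


# Hypothetical species tree and two hypothetical alignment columns.
tree = ((("human", "chimp"), "mouse"), ("chicken", "frog"))
clustered = {"human": "A", "chimp": "A", "mouse": "A", "chicken": "V", "frog": "V"}
scattered = {"human": "A", "chimp": "V", "mouse": "A", "chicken": "V", "frog": "A"}

for name, column in (("clustered", clustered), ("scattered", scattered)):
    _, changes = fitch_changes(tree, column)
    print(name, "naive mismatches:", naive_mismatches(column), "tree changes:", changes)
```

Both columns contain the same number of residues that differ from the human one, so a similarity count scores them identically. On the tree, the first column needs a single substitution on one ancestral branch, while the second needs two independent ones. Species are not independent observations, which is the caveat raised in the text.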
In the same vein, an explicit statistical phylogenetic model was developed by Santibañez Koref et al. (2003). The method gives the probability that a given mutation is pathological, taking into account the evolutionary conservation and the variability associated with each protein. Although the method was an outstanding contribution to the fields of comparative genomics and human health, the need to calibrate the model for each human protein is a major drawback for its use in large-scale analysis. Saunders & Baker (2002) evaluated the behaviour of alternative variables in the functional prediction of nsSNPs. They concluded that prediction is better with a combination of evolutionary and structural variables than with a single kind of variable on its own. They emphasized that, when fewer than five to ten homologs are available, the prediction of deleterious mutations should include structural information, which suggests that evolutionary data are more informative than structural data when a large number of sequences is used for prediction. Finally, Arbiza et al. (2006) introduced, and Capriotti et al. (2007) extended, an explicit evolutionary measure of selective pressure at the codon level as a direct functional predictor of nsSNPs. Using maximum likelihood (ML) models to estimate the well-established ratio (ω) of nonsynonymous to synonymous rates of evolution in mammals, they concluded that codons with ω