RESEARCHERS
VILLALBA Pamela Victoria
Conferences and scientific meetings
Title:
Simultaneous single-marker approach indirectly accounts for cryptic population stratification in genome-wide association studies: a case study in Eucalyptus globulus
Author(s):
CAPPA, EP; VILLALBA, PV; GARCIA, MN; ACUÑA, CV; MARCUCCI POLTRI, SN
Location:
Concepción
Meeting:
Conference; IUFRO Tree Biotechnology Conference; 2017
Organizing institution:
IUFRO
Abstract:
Genome-wide association studies (GWAS) have become a common approach in plant breeding. However, cryptic population structure can cause spurious associations if it is not properly accounted for. Single-marker mixed models, which fit each marker sequentially as a fixed effect, have been shown to handle substructure effects by explicitly modeling the population structure with sophisticated techniques (e.g., a Bayesian clustering algorithm, giving the Q matrix, or principal coordinate analysis, giving the P matrix). However, these methods are computationally demanding and/or rely on estimating the correct number of subpopulations [1]. Fitting all markers simultaneously as random effects implicitly accounts for any cryptic substructure, since each marker effect is estimated conditional on the effects of all other markers (e.g., [2]). Nevertheless, this phenomenon has not been illustrated in forest trees. We compared six GWAS mixed models that included or omitted population structure (Q or P matrix) and/or family structure (kinship matrix, K) for six growth and wood traits in a Eucalyptus globulus population (n = 303) genotyped with the 7,680 DArT marker array. These models fitted the marker effects sequentially as fixed effects, using TASSEL, and simultaneously as random effects, using a Bayesian LASSO. The best-fitting model for each trait was determined with the extended Bayesian Information Criterion (EBIC, [3]). The resulting EBIC values ranged from 578.6 to 822.0 for the sequential models and from 391.7 to 804.5 for the simultaneous models. For the simultaneous approach, model comparison showed that the mixed models without any structure (2 traits) or with only family structure (K matrix, 4 traits) had the best fit (lowest EBIC values); that is, none of the best models for the six traits analyzed included population structure (neither Q nor P matrix).
Therefore, fitting all markers simultaneously indirectly accounts for population stratification, thus preventing spurious associations. Moreover, this simple simultaneous approach does not require an estimate of the number of underlying substructures.
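
The EBIC-based model comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one commonly used form of the criterion, EBIC = -2 log L + k log n + 2 gamma log C(p, k), where k is the number of selected markers, n the sample size and p the number of candidate markers. The abstract does not state which form or which gamma was used. The function names, the model labels and the log-likelihood values in the usage note are hypothetical; only n = 303 and p = 7,680 come from the abstract.

```python
import math


def log_binomial(p, k):
    """Natural log of the binomial coefficient C(p, k), computed via lgamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(log_likelihood, n_samples, n_selected, n_candidates, gamma=1.0):
    """Extended BIC under the assumed form:
    -2 log L + k log n + 2 * gamma * log C(p, k).
    With gamma = 0 this reduces to the ordinary BIC.
    """
    bic = -2.0 * log_likelihood + n_selected * math.log(n_samples)
    return bic + 2.0 * gamma * log_binomial(n_candidates, n_selected)


def best_model(scores):
    """Return the label of the model with the lowest EBIC."""
    return min(scores, key=scores.get)
```

A usage example with made-up log-likelihoods and marker counts for three of the model types (no structure, K only, Q + K) would be `scores = {"none": ebic(-250.0, 303, 3, 7680), "K": ebic(-240.0, 303, 3, 7680), "Q+K": ebic(-239.0, 303, 5, 7680)}` followed by `best_model(scores)`. The lowest value identifies the preferred model, matching the "lowest EBIC" rule stated in the abstract.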