Variability analysis to determine population genetic structure (PGS) in germplasm collections is a crucial step in forming core collections for the conservation and use of genetic resources, as well as in association studies. From an algorithmic point of view, both statistical and bioinformatics methods can be used to infer genetic structure. In this work, we used simulated and real data to compare six alternative procedures for estimating genetic structure. Across scenarios characterized by different levels of genetic divergence (Fst) and numbers of populations (3 and 5), we compared hierarchical clustering (UPGMA and Ward), the Ward method applied to the principal components deemed significant by the Tracy-Widom statistic (PCA+Ward), K-means clustering, relative position self-organizing maps (RP-Q-SOM), and a Bayesian method. The methods were compared using the proportion clustering error (PCE). The results showed that the RP-Q-SOM algorithm, the Bayesian method implemented in the software STRUCTURE, and the non-hierarchical K-means clustering method performed best (PCE<0.52) when the level of divergence was low (Fst<0.10). The other algorithms performed poorly (PCE>0.55). Classification error increased with the number of populations, even at similar levels of differentiation. Although the Bayesian algorithm is one of the most widely used methods for inferring genetic structure, our results suggest that RP-Q-SOM performs better (PCE=0.30) in low-differentiation scenarios. The non-hierarchical clustering method was also competitive (PCE=0.51). In the PCA+Ward method, the number of significant components decreased as Fst increased. Additionally, we processed a public maize molecular marker data set comprising 627 lines. These lines had previously been clustered into eight groups using STRUCTURE and 511 SNPs. RP-Q-SOM found three genetic groups, compared with the eight groups found by STRUCTURE.
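
The abstract does not define the proportion clustering error, so the following is only a minimal sketch under one plausible assumption: that the error is the fraction of individuals left misassigned after cluster labels are matched one-to-one to the true populations in the most favorable way. The function name `clustering_error` and the toy labels are hypothetical and are not taken from the study.

```python
from itertools import permutations

_UNUSED = object()  # placeholder for a label with no counterpart


def clustering_error(true_labels, pred_labels):
    """Fraction of items misassigned under the best one-to-one label matching.

    Assumed definition for illustration; the study's exact PCE formula
    is not given in the abstract.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty")

    # Distinct labels, in order of first appearance.
    true_ids = list(dict.fromkeys(true_labels))
    pred_ids = list(dict.fromkeys(pred_labels))

    # Pad the shorter side so that every candidate mapping is one-to-one.
    size = max(len(true_ids), len(pred_ids))
    targets = true_ids + [_UNUSED] * (size - len(true_ids))
    sources = pred_ids + [_UNUSED] * (size - len(pred_ids))

    # Brute force over all matchings: fine for 3 or 5 populations,
    # but an assignment solver would be needed for many clusters.
    best_hits = 0
    for perm in permutations(targets):
        mapping = dict(zip(sources, perm))
        hits = sum(
            1 for t, p in zip(true_labels, pred_labels) if mapping[p] == t
        )
        best_hits = max(best_hits, hits)

    return 1 - best_hits / len(true_labels)


# Toy example: three populations of three individuals, one misassigned.
true_pops = [0, 0, 0, 1, 1, 1, 2, 2, 2]
inferred = [2, 2, 2, 0, 0, 1, 1, 1, 1]
print(round(clustering_error(true_pops, inferred), 3))  # 0.111
```

Matching labels before counting errors matters because clustering methods number their groups arbitrarily; without it, a perfect partition with permuted labels would be scored as entirely wrong.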
