FERNANDEZ Elmer Andres
Conferences and scientific meetings
Title:
Algorithms for population structure inference of molecular marker profiles
Author(s):
ANDRE PEÑA MALAVERA; ELMER A FERNÁNDEZ; BALZARINI, MÓNICA
Location:
Córdoba
Meeting:
Congress; 2do Congreso Argentino de Bioinformática y Biología Computacional (2nd Argentine Congress of Bioinformatics and Computational Biology); 2011
Organizing institution:
A2B2C - Universidad Católica de Córdoba
Abstract:
Background
In genetic studies, it is of interest to identify the underlying genetic structure of a set of individuals. When subgroups of individuals differ systematically in the allele frequencies of markers, a genetic structure arises which, if not taken into account, increases the risk of detecting spurious associations between markers and the phenotype of interest. Several statistical and bioinformatics algorithms are used to group individuals from marker data. Among these are hierarchical clustering algorithms (UPGMA and Ward), non-hierarchical K-means clustering [1], self-organizing maps (SOM), a type of neural network [2], a method based on Markov chains (STRUCTURE) [3], and Ward hierarchical clustering preceded by an eigenanalysis (ea-ward), which uses Euclidean distances computed on the statistically significant principal components (PCs) [4].

Methods
We compared these algorithms in their ability to correctly classify the molecular profiles into the populations to which they were assigned in the simulation, and we report the classification error. The comparison was carried out under different scenarios defined by the number of populations and the level of separability. Four scenarios were simulated with the program QMSim: two, with 3 and 5 populations, whose separability was generated after 100 generations from the founding population (level 1), and two with lower separability induced by only 10 generations of breeding (level 2).

Results
Under strong population structure, i.e., groups with very different molecular profiles (separability level 1), all algorithms except the one based on eigenanalysis classified the profiles correctly in 99% of cases. When working only with the significant PCs, the clustering produced classification errors of 22% and 17% for 3 and 5 populations, respectively. At separability level 2, the eigenanalysis-based algorithm identified the most significant components for classification.
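The abstract reports a classification error but does not state how clusters, whose IDs are arbitrary, are matched to the simulated populations. The sketch below assumes one common convention, not confirmed by the authors: the error is 1 minus the fraction of individuals that agree under the best one-to-one relabeling of cluster IDs onto population IDs. The function name `classification_error` and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import permutations


def classification_error(true_pops, clusters):
    # Hypothetical helper (assumed definition): error = 1 - best agreement over
    # all one-to-one relabelings of cluster IDs onto population IDs, divided by
    # the number of individuals.
    if len(true_pops) != len(clusters):
        raise ValueError("label vectors must have the same length")
    if not true_pops:
        raise ValueError("label vectors must not be empty")
    pops = sorted(set(true_pops))
    clus = sorted(set(clusters))
    # Pad the shorter list with None so surplus clusters or populations can
    # stay unmatched; a pairing involving None contributes zero matches.
    size = max(len(pops), len(clus))
    pops += [None] * (size - len(pops))
    clus += [None] * (size - len(clus))
    # counts[(c, p)] = number of individuals from population p placed in cluster c.
    counts = Counter(zip(clusters, true_pops))
    # Brute force over all mappings: at most 5! = 120 for 3 or 5 populations.
    best = max(
        sum(counts[(c, p)] for c, p in zip(clus, perm))
        for perm in permutations(pops)
    )
    return 1 - best / len(true_pops)


if __name__ == "__main__":
    # Toy example: 3 populations, 9 individuals; cluster IDs are arbitrary.
    true_pops = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    clusters = [2, 2, 2, 0, 0, 1, 1, 1, 1]
    print(round(classification_error(true_pops, clusters), 3))  # 0.111
```

Brute force is adequate for the 3 and 5 populations simulated here; with many more groups, an assignment solver such as `scipy.optimize.linear_sum_assignment` could replace the permutation search.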
Across all algorithms, the average classification error was 60%. At separability level 2, the algorithm with the lowest error, for both 3 and 5 populations, was the one based on Markov chains.

Conclusion
The algorithms performed similarly when the level of separability was lower, and the Bayesian algorithm facilitates visualization of the clustering because it shows each individual's percentage of membership in each group. The computational times of the algorithms in this study indicated that SOM was the most efficient. A fair comparison among software packages is not easy, as the programs have subtly different purposes and outputs. A barplot displays each individual's probability of belonging to each population.

References