IABIMO   27858
INSTITUTO DE AGROBIOTECNOLOGIA Y BIOLOGIA MOLECULAR
Executing Unit (UE)
Congresses and scientific meetings
Title:
A new feature selection approach for genomic prediction methods
Author(s):
VILLALBA, PV; ACUÑA, CV; MARCÓ, M; AGUIRRE, NC; MARTINEZ, MC; GARCIA, MN; OBERSCHELP, J; RIVAS, JG; MARCUCCI POLTRI, SN; HARRAND, L; HOPP, HE
Venue:
Virtual Congress
Meeting:
Congress; First Latin American Congress of Women in Bioinformatics and Data Science (Virtual Edition 2020); 2020
Abstract:
Genomic selection (GS) is based on the simultaneous estimation of the effects of all available markers across the genome to predict individual breeding values. GS uses genetic markers covering the whole genome so that every quantitative trait locus (QTL) is in linkage disequilibrium with at least one marker. However, it is reasonable to assume that not all markers contribute to the trait of interest, and that eliminating irrelevant and redundant markers will yield more accurate models. Reducing dimensionality also makes it possible to keep only the markers linked to QTLs that are directly or indirectly involved in the expression of the trait, as well as markers that capture dominance or epistatic effects. In addition, more compact models have greater generalization ability. We propose a new feature selection approach based on a meta-analysis of the marker effects estimated by intrinsically different GS methodologies: linear regression (ridge regression BLUP, RR-BLUP), Bayesian linear regression (Bayes LASSO), and a non-parametric method (Random Forest). We evaluated the performance of this feature selection method in two plant species: a simulated Zea mays L. dataset comprising 1250 doubled haploid (DH) lines genotyped for 1117 SNPs and one quantitative trait (Wimmer et al., 2012), and a real Eucalyptus grandis dataset comprising 131 full-sib individuals genotyped with 2378 DArT and 74 SSR markers and phenotyped for three quantitative traits (García, 2013). Compared with standard GS methodologies, this approach achieved higher accuracy (Pearson correlation between predicted and observed phenotypes). The gain in accuracy was most evident for low-heritability traits, which is especially relevant for traits that are difficult or expensive to measure. Therefore, this strategy could improve GS in plants.
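The abstract does not state how the per-method marker effects are combined, so the following is only a minimal sketch under explicit assumptions, not the authors' procedure. Markers are ranked by absolute effect within each method, the ranks are averaged (a simple stand-in for the meta-analysis), the top k markers are kept, a model is refitted on them, and accuracy is measured as the Pearson correlation between predicted and observed phenotypes, as in the abstract. Only ridge regression (the model behind RR-BLUP, here with an arbitrary fixed penalty `lam` rather than one derived from variance components) is implemented. Single-marker regression is a placeholder for Bayes LASSO and Random Forest. All function names, `lam`, `k`, and the simulated data are hypothetical.

```python
import numpy as np


def ridge_effects(X, y, lam=1.0):
    """Ridge-regression marker effects (the model behind RR-BLUP), using a
    fixed, hypothetical penalty lam instead of variance-component estimates."""
    Xc = X - X.mean(axis=0)          # center marker columns
    yc = y - y.mean()                # center phenotypes (absorbs the intercept)
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


def marginal_effects(X, y):
    """Single-marker regression slopes; a placeholder for a second GS method."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    ss = (Xc ** 2).sum(axis=0)
    ss[ss == 0] = 1.0                # monomorphic markers get a zero slope
    return (Xc.T @ yc) / ss


def rank_by_effect(effects):
    """Rank markers by absolute effect within one method (0 = largest)."""
    order = np.argsort(-np.abs(effects), kind="stable")
    ranks = np.empty(len(effects))
    ranks[order] = np.arange(len(effects))
    return ranks


def select_markers(effect_sets, k):
    """Average per-method ranks (assumed combination rule) and keep the k best."""
    mean_rank = np.mean([rank_by_effect(e) for e in effect_sets], axis=0)
    return np.sort(np.argsort(mean_rank, kind="stable")[:k])


def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Fit ridge on the training set and predict test phenotypes."""
    b = ridge_effects(X_train, y_train, lam)
    return (X_test - X_train.mean(axis=0)) @ b + y_train.mean()


def pearson_accuracy(pred, obs):
    """Accuracy as the Pearson correlation between predicted and observed values."""
    return float(np.corrcoef(pred, obs)[0, 1])


# Hypothetical toy data: 200 individuals, 50 markers coded 0/1/2,
# three causal markers plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
beta = np.zeros(50)
beta[[3, 17, 29]] = [1.0, -0.8, 0.6]
y = X @ beta + rng.normal(0.0, 1.0, size=200)

# Select markers on the training set only, then evaluate on held-out data.
X_tr, y_tr, X_te, y_te = X[:150], y[:150], X[150:], y[150:]
effect_sets = [ridge_effects(X_tr, y_tr), marginal_effects(X_tr, y_tr)]
selected = select_markers(effect_sets, k=5)
pred = ridge_predict(X_tr[:, selected], y_tr, X_te[:, selected])
acc = pearson_accuracy(pred, y_te)
print("selected markers:", selected.tolist(), "accuracy:", round(acc, 3))
```

Averaging ranks rather than raw values lets effects on different scales be combined, so Bayes LASSO posterior mean effects or Random Forest importance scores could be appended to `effect_sets` in the same way. Marker selection is done on the training set only. Selecting on the full dataset would leak test information and inflate the reported accuracy.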