INPA   24560
UNIDAD EJECUTORA DE INVESTIGACIONES EN PRODUCCION ANIMAL
Executing Unit - UE
Articles
Title:
Quality Control of Genotypes Using Heritability Estimates of Gene Content at the Marker
Author(s):
NATALIA FORNERIS; ANDRÉS LEGARRA; ZULMA G. VITEZICA; SHOGO TSURUTA; IGNACIO AGUILAR; IGNACY MISZTAL; CANTET, R. J. C.
Journal:
GENETICS
Publisher:
GENETICS SOC AM
Reference:
Place: Bethesda; Year: 2015; vol. 199; pp. 675-681
ISSN:
0016-6731
Abstract:
Quality control filtering of single-nucleotide polymorphisms (SNPs) is a key step when analyzing genomic data. Here we present a practical method to identify low-quality SNPs, meaning markers whose genotypes are wrongly assigned for a large proportion of individuals, by estimating the heritability of gene content at each marker, where gene content is the number of copies of a particular reference allele in the genotype of an animal (0, 1, or 2). If there is no mutation at the marker, gene content has an additive heritability of 1 by construction. The method uses restricted maximum likelihood (REML) to estimate the heritability of gene content at each SNP and also builds a likelihood-ratio test statistic to test for zero error variance in genotyping. As a by-product, estimates of the allele frequencies of markers in the base population are obtained. Using simulated data with 10% permutation error (4% actual error) in genotyping, the method had a specificity of 0.96 (4% of correct markers are rejected) and a sensitivity of 0.99 (1% of wrong markers are accepted) if markers with heritability lower than 0.975 are discarded. Checking of Mendelian errors resulted in a lower sensitivity (0.84) for the same simulation. The proposed method is further illustrated with a real data set with genotypes from 3534 animals genotyped for 50,433 markers from the Illumina PorcineSNP60 chip and a pedigree of 6473 individuals; those markers underwent very little quality control. A total of 4099 markers with P-values lower than 0.01 were discarded based on our method, with associated estimates of heritability as low as 0.12. Unlike other techniques, our method uses all information in the population simultaneously, can be used in any population with marker and pedigree records, and is simple to implement using standard software for REML estimation. Scripts for its use are provided.
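The definitions used in the abstract can be made concrete with a small Python sketch. This is not the authors' script. REML estimation of heritability, which needs the pedigree relationship matrix and mixed-model software, is not reproduced here, so per-marker heritabilities are treated as given inputs. The function names `gene_content`, `lrt_pvalue`, and `screen_markers` are hypothetical. `lrt_pvalue` assumes the 50:50 mixture of χ²₀ and χ²₁ that is commonly used when a variance is tested at its boundary of zero. The abstract does not state which reference distribution the authors used.

```python
import math
from typing import List, Sequence, Tuple


def gene_content(genotype: str, ref_allele: str) -> int:
    """Copies (0, 1 or 2) of the reference allele in a diploid call such as 'AB'."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele call, got {genotype!r}")
    return genotype.count(ref_allele)


def lrt_pvalue(lrt_stat: float) -> float:
    """P-value for H0: genotyping error variance = 0.

    ASSUMPTION: the null distribution is a 50:50 mixture of chi2_0 and chi2_1,
    the usual choice when a variance is tested on the boundary of its space.
    """
    if lrt_stat <= 0.0:
        return 1.0
    # P(chi2_1 > x) = erfc(sqrt(x / 2)); the chi2_0 point mass adds nothing for x > 0.
    return 0.5 * math.erfc(math.sqrt(lrt_stat / 2.0))


def screen_markers(
    h2: Sequence[float], is_wrong: Sequence[bool], threshold: float = 0.975
) -> Tuple[List[bool], float, float]:
    """Reject markers whose gene-content heritability is below `threshold`.

    The rule is scored against known (e.g. simulated) labels, using the
    abstract's definitions: sensitivity = share of wrong markers rejected,
    specificity = share of correct markers kept.
    """
    rejected = [h < threshold for h in h2]
    wrong_rejected = sum(r and w for r, w in zip(rejected, is_wrong))
    correct_kept = sum((not r) and (not w) for r, w in zip(rejected, is_wrong))
    n_wrong = sum(is_wrong)
    n_correct = len(is_wrong) - n_wrong
    sensitivity = wrong_rejected / n_wrong if n_wrong else float("nan")
    specificity = correct_kept / n_correct if n_correct else float("nan")
    return rejected, sensitivity, specificity


if __name__ == "__main__":
    print(gene_content("AB", "A"))  # 1
    # Toy heritability estimates (made up for illustration) and true error labels.
    h2 = [1.0, 0.99, 0.96, 0.40, 0.12, 0.98]
    wrong = [False, False, False, True, True, False]
    rejected, sens, spec = screen_markers(h2, wrong)
    print(rejected)    # [False, False, True, True, True, False]
    print(sens, spec)  # 1.0 0.75
    print(round(lrt_pvalue(5.4119), 4))  # 0.01
```

In the toy data, one correct marker (h² = 0.96) falls below the 0.975 cutoff and is rejected. This is the kind of false rejection that keeps specificity below 1. The threshold rule and the P < 0.01 test correspond to the two discard criteria mentioned in the abstract, for simulated and real data respectively.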