RAMALLO Virginia
About the letter “Comments on the article, ‘Software for Y-Haplogroup Predictions, a Word of Caution’”
Place: Berlin; Year: 2010, vol. 125, pp. 905-906
Dear Sirs: The letter by Dr. Athey entitled “Comments on the article, ‘Software for Y-haplogroup Predictions, a Word of Caution’” discusses a paper we published in the present Journal [1]. Though at first glance his opinion may seem reasonable, a careful analysis reveals a series of mistakes on Dr. Athey's part.
In that letter, one of Dr. Athey's main concerns is the number of STRs that were analyzed. He comments that using only seven markers should be avoided, explaining that increasing that number raises the probability of assignment, and stating that “With the addition of a sufficient number of markers, the prediction probability for the correct haplogroup can be ‘driven’ past 99% in nearly all cases, and this almost always occurs by the point where 20 markers have been used”. Nonetheless, he does not refer to any validation study, so said 99% can only refer to the probability of assignment that his own software provides. What is more, Dr. Athey seems unaware that our paper examined this software's predictive values at different cut-off points, up to a 95% probability of assignment, and still obtained inadequate predictive values at that cut-off. Twelve of our haplotypes received probabilities of assignment to the R1b haplogroup of between 99.6% and 100% in the H. Predictor, although none of those samples belongs to said haplogroup. Within the R haplogroup, one haplotype gave a 99.8% probability for an erroneous haplogroup (E1b1b), while fourteen gave probabilities between 99.4% and 100% of belonging to the R1b haplogroup and were considered correct predictions. Note that in these cases the predicted haplogroup follows the nomenclature of the software, and the probability was not pooled by major branches. If the seven markers of the minimal haplotype were not a proper set, Dr. Athey's software should not provide such high probabilities of assignment in these cases, since doing so misleads the user. Unfortunately, Dr. Athey's remark that 7 Y-STRs are too few appears only in his letter. Neither the two papers he authored [2, 3], available at the H. Predictor's website, nor the software instructions explain that the user should employ more than 7 markers; otherwise we would not have attempted to use it. Moreover, a simple glance at recent literature in which the H. Predictor is employed clearly shows that researchers do not use such large numbers of Y-STRs when they rely on it; for instance, Salas et al. [4] use the same 7 Y-STRs we did, plus DYS 385, and Petrejcikova et al. [5] use only 12 Y-STRs.
His statement “…indeed, the seven-marker dataset apparently resulted from a study carried out over five years ago and published in 2005” shows a misunderstanding of our paper, since in it we clearly stated that “Haplogroups were determined in a previous report”, with the corresponding citation of a paper that includes only haplogroup information, without Y-STR data. Turning to his enumeration of papers in which large numbers of Y-STRs were analyzed, Dr. Athey does not realize that, for a validation study, we needed to employ an independent dataset, one that he had not used to calibrate his software. This is quite difficult when, on the H. Predictor's webpage, the only information about the samples used for calibration remains the 2005 paper; although it mentions several updates, it does not make explicit which haplotypes were included, or from what sources other than Y search. When Dr. Athey comments that the Q1a3a haplogroup (nomenclature from Karafet et al. [6]; QM3 in the nomenclature preceding the cited paper) “probably occurs only in the Native American population in Argentina”, he contradicts the literature regarding said haplogroup, which is the most frequent in all Native Americans and is also present in admixed populations (a few classic papers are [7, 8, 9]).
Concerning the four haplotypes that he does not find trustworthy: first, four wrongly assigned haplotypes would not significantly change our likelihood ratio results or the uncertainty coefficient calculations, given that the whole sample comprised 119 haplotypes. Second, his search on Y search is not enough to invalidate our SNP results, especially when one considers that Dr. Athey employed this database to calibrate his software [2], so any attempt to use it to question our work is nothing more than circular reasoning. Furthermore, one of the haplotypes he questions (H77) was assigned by his software to the same haplogroup we defined by SNP.
It is standard procedure in statistics to introduce new samples to verify a given set of classifying tests. Further validation analyses, employing samples and populations that were not used to calibrate the H. Predictor, will allow even more precise estimates of its predictive value.
Yours sincerely, Marina Muzzio, Virginia Ramallo, Josefina M.B. Motti, M. Rita Santos, Jorge S. López Camelo, Graciela Bailliet.
The authors of this letter are also the authors of “Software for Y-Haplogroup predictions: a word of caution”, published in this journal.