
Originally published as Genetics Published Articles Ahead of Print on December 15, 2008.

Genetics, Vol. 181, 701-710, February 2009, Copyright © 2009
doi:10.1534/genetics.108.094060

Correcting Estimators of θ and Tajima's D for Ascertainment Biases Caused by the Single-Nucleotide Polymorphism Discovery Process

* Departament de Ciències de la Salut i de la Vida, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain, † Departments of Integrative Biology and Statistics, University of California, Berkeley, California 94720-3140 and ‡ Department of Biology, University of Copenhagen, 2100 Kbh Ø, Copenhagen, Denmark

1 Corresponding author: Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Spain.
E-mail: anna.ramirez@upf.edu

Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the two-step process of SNP discovery followed by SNP genotyping. Compared with directly sequenced data, the final genotyped data are biased toward an excess of common alleles, so standard methods of population genetic analysis cannot be applied to them without correction. Here we derive corrected estimators of the fundamental population genetic parameter θ = 4Nₑμ (Nₑ, effective population size; μ, mutation rate), one based on the average number of pairwise differences and one based on the number of segregating sites. We also derive the variances and covariances of these estimators and provide a corrected version of Tajima's D statistic. Reanalyzing a human genomewide SNP data set, we find substantial differences between the results obtained with and without ascertainment bias correction.
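The abstract does not state the corrected formulas, so the following is only a sketch of the standard, uncorrected quantities that the article sets out to correct. These are θ estimated from the average number of pairwise differences (θπ), Watterson's estimator from the number of segregating sites S (θW = S/a1, with a1 = Σ 1/i for i = 1, ..., n - 1), and Tajima's D with the usual variance constants of Tajima (1989). It is not the authors' corrected estimator. The function names standard_tajimas_d and discovery_probability are hypothetical. The toy discovery scheme in the second function is also an assumption: the discovery panel is taken to be a random subsample of d of the n genotyped chromosomes, and a SNP is kept only if it is polymorphic in that panel. Under that assumption the probability of discovery rises with allele frequency. This is one way to see why ascertained data carry an excess of common alleles, and why the uncorrected D is pushed upward.

```python
import math
from typing import Sequence, Tuple


def standard_tajimas_d(counts: Sequence[int], n: int) -> Tuple[float, float, float]:
    """Uncorrected theta_pi, theta_W and Tajima's D for directly sequenced data.

    counts: derived-allele count at each segregating site (1 <= k <= n - 1).
    n: number of sampled chromosomes.
    Sketch only; this applies no ascertainment correction.
    """
    # With n = 3 every site gives theta_pi == theta_W, so D needs n >= 4.
    if n < 4:
        raise ValueError("Tajima's D needs at least four chromosomes")
    if any(not 1 <= k <= n - 1 for k in counts):
        raise ValueError("every site must be segregating in the sample")
    s = len(counts)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")

    # theta_pi: mean number of pairwise differences over the n(n-1)/2 pairs;
    # a site with derived count k differs in k * (n - k) pairs.
    theta_pi = sum(k * (n - k) for k in counts) / (n * (n - 1) / 2)

    # Watterson's theta_W: S divided by the harmonic number a1.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    theta_w = s / a1

    # Tajima's (1989) constants for the variance of theta_pi - theta_W.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    d = (theta_pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return theta_pi, theta_w, d


def discovery_probability(k: int, n: int, d: int) -> float:
    """Hypothetical ascertainment scheme: probability that a site with derived
    count k among n chromosomes is polymorphic in a discovery panel of d
    chromosomes drawn without replacement from those same n chromosomes."""
    if not 2 <= d <= n:
        raise ValueError("panel size must satisfy 2 <= d <= n")
    if not 1 <= k <= n - 1:
        raise ValueError("site must be segregating in the full sample")
    # The panel misses the SNP if all d chromosomes carry the same allele;
    # math.comb returns 0 when d exceeds the number of available copies.
    monomorphic = math.comb(k, d) + math.comb(n - k, d)
    return 1 - monomorphic / math.comb(n, d)


if __name__ == "__main__":
    n = 10
    # Toy spectra for illustration only, not data from the article.
    print(standard_tajimas_d([1] * 10, n))  # excess of singletons: negative D
    print(standard_tajimas_d([5] * 10, n))  # excess of common alleles: positive D
    for k in range(1, n):
        print(k, round(discovery_probability(k, n, d=2), 3))
```

With n = 10 and a panel of d = 2 chromosomes, a singleton is discovered with probability 0.2, whereas a variant with derived count 5 is discovered with probability 25/45 ≈ 0.56.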