Originally published as Genetics Published Articles Ahead of Print on September 15, 2004.

Genetics, Vol. 168, 2373-2382, December 2004, Copyright © 2004
doi:10.1534/genetics.104.031039

Reconstituting the Frequency Spectrum of Ascertained Single-Nucleotide Polymorphism Data

* Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853
‡ Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York
† Center for Bioinformatics, University of Copenhagen, 2100 Copenhagen, Denmark

1 Corresponding author: Center for Bioinformatics, Universitetsparken 15, 2100 Kbh Ø, Denmark.
E-mail: rasmus@binf.ku.dk

Most of the available SNP data have eluded valid population genetic analysis because most population genetic methods do not correctly accommodate the special discovery process used to identify SNPs. As a result, the allele frequency distributions in these data are biased by the ascertainment protocol. Here we show how this bias can be corrected by obtaining maximum-likelihood (ML) estimates of the true allele frequency distribution. In simple cases, the ML estimate can be obtained analytically; in other cases, computational methods based on numerical optimization or the EM algorithm must be used. We illustrate the new correction method by analyzing previously published SNP data from the SNP Consortium. Appropriate treatment of SNP ascertainment is vital to our ability to make correct inferences from the data of the International HapMap Project.
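
To make the idea of an analytical correction concrete, the following is a minimal sketch of one simple case, not the article's own implementation. It assumes that each SNP was discovered because it was variable in a random subset of `d` of the `n` chromosomes that were later genotyped. Under that assumption, the likelihood is maximized by weighting each observed count by the inverse of its ascertainment probability and renormalizing. The function names, the panel design, and the example counts are all hypothetical and chosen only for illustration.

```python
from math import comb


def ascertainment_prob(i, n, d):
    """Probability that a SNP with i derived alleles among n chromosomes
    is variable in a random subset of d of those chromosomes.

    Assumed discovery scheme (not taken from the abstract): the discovery
    panel is a random subsample of the genotyped sample.
    """
    # The SNP is missed when the panel holds only derived or only ancestral alleles.
    return 1.0 - (comb(i, d) + comb(n - i, d)) / comb(n, d)


def corrected_spectrum(counts, n, d):
    """Estimate the frequency spectrum of segregating sites.

    counts[i - 1] is the number of ascertained SNPs carrying i derived
    alleles in the sample of n chromosomes, for i = 1 .. n - 1.
    Returns estimated proportions for the same classes.
    """
    if not 2 <= d <= n:
        raise ValueError("panel size d must satisfy 2 <= d <= n")
    if len(counts) != n - 1:
        raise ValueError("counts must have one entry per class i = 1 .. n - 1")
    # Inverse-probability weighting: rarely ascertained classes are up-weighted.
    weights = [c / ascertainment_prob(i, n, d)
               for i, c in enumerate(counts, start=1)]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one SNP is required")
    return [w / total for w in weights]


# Hypothetical example: n = 4 chromosomes, discovery panel of d = 2.
example = corrected_spectrum([25, 20, 10], n=4, d=2)
```

With `n = 4` and `d = 2`, singletons and tripletons are ascertained with probability 1/2 and doubletons with probability 2/3, so the observed counts overrepresent intermediate-frequency SNPs and the corrected spectrum shifts weight back toward the rarer classes. More complex discovery schemes have no such closed form, which is where numerical optimization or the EM algorithm comes in.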



