Originally published as Genetics Published Articles Ahead of Print on September 15, 2004.
Genetics, Vol. 168, 2373-2382, December 2004, Copyright © 2004
doi:10.1534/genetics.104.031039
Reconstituting the Frequency Spectrum of Ascertained Single-Nucleotide Polymorphism Data
Rasmus Nielsen*,1, Melissa J. Hubisz* and Andrew G. Clark
* Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853
Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York
Center for Bioinformatics, University of Copenhagen, 2100 Copenhagen, Denmark
1 Corresponding author: Center for Bioinformatics, Universitetsparken 15, 2100 Kbh Ø, Denmark.
E-mail: rasmus{at}binf.ku.dk
ABSTRACT
Most of the available SNP data have eluded valid population genetic analysis because most population genetic methods do not correctly accommodate the special discovery process used to identify SNPs. The allele frequency distributions in most of these data are biased by the ascertainment protocol. Here we show how this problem can be corrected by obtaining maximum-likelihood estimates of the true allele frequency distribution. In simple cases, the ML estimate of the true allele frequency distribution can be obtained analytically, but in other cases computational methods based on numerical optimization or the EM algorithm must be used. We illustrate the new correction method by analyzing previously published SNP data from The SNP Consortium. Appropriate treatment of SNP ascertainment is vital to our ability to make correct inferences from the data of the International HapMap Project.
THE large-scale single-nucleotide polymorphism (SNP) genotyping projects have generated much interest in population genetic analysis of human polymorphism. SNPs may be used for the estimation of demographic parameters, such as population growth rates, admixture proportions, migration rates, and population divergence times (e.g., WAKELEY et al. 2001; CAVALLI-SFORZA and FELDMAN 2003). In addition, SNPs may be used in studies of the effect of natural selection, for example, for mapping the genomic location of selective sweeps (e.g., SUNYAEV et al. 2000; AKEY et al. 2002; SABETI et al. 2002). With the availability of thousands of typed SNPs in multiple human ethnic groups, there is some hope that many questions regarding the human genetic ancestry might soon be resolved. However, the analysis of the SNP data is complicated by the SNP discovery protocols applied in the large SNP genotyping projects. Typically, SNPs are originally identified from the genetic material of a small group of individuals, often called the discovery panel. Thereafter, the SNPs found in this small panel are typed in a larger sample, typically with an ethnic composition similar to that of the discovery panel (e.g., TAILLON-MILLER et al. 1998; WANG et al. 1998; PICOULT-NEWBERG et al. 1999; ALTSHULER et al. 2000). Basing the SNP discovery protocol on initial identification in a small panel, in contrast to direct sequencing, will bias the composition of the sample to contain more high-frequency alleles (e.g., NIELSEN 2000). Most standard population genetic tools for data analysis are, therefore, not applicable to this type of SNP data. Fortunately, it is in many cases possible to correct for the ascertainment bias (e.g., WAKELEY et al. 2001; NIELSEN and SIGNOROVITCH 2003; POLANSKI and KIMMEL 2003). For example, NIELSEN and SIGNOROVITCH (2003) showed how the HUDSON (2001) composite-likelihood estimator of the population recombination rate can be modified to provide approximately unbiased estimates.
In this article we focus on methods for estimating the true frequency spectrum from a sample of SNP data. The frequency spectrum is a reduction of the data in which all SNPs are categorized according to the sample allele frequency of the SNP. Assuming no back mutations and assuming that the ancestral state of the SNP is known, there are n − 1 possible allele frequencies in a sample of n chromosomes: x = 1, x = 2, ..., x = n − 1. If the ancestral state is not known, the labeling of alleles is arbitrary, and allele frequencies of type x are indistinguishable from allele frequencies of type n − x. Consequently, there are only [n/2] possible folded configurations, where [n/2] is n/2 rounded down to the nearest integer. Under the assumption that SNPs are independent and identically distributed (iid), all the information in the data, for example, regarding demographic parameters, is contained in the frequency spectrum. The iid assumption is valid if the SNPs are located far apart and if the evolutionary processes are identical in all regions. If parameters of the evolutionary process vary among regions, the relevant information in the data is instead contained in the collection of frequency spectra in different regions.
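The following small Python sketch illustrates folding. It maps an unfolded spectrum onto the [n/2] folded classes min(x, n − x). The unfolded spectrum is stored as a dictionary from derived-allele count x to number of SNPs. That representation and the name `fold_spectrum` are our own illustrative choices.

```python
def fold_spectrum(unfolded, n):
    """Fold an unfolded spectrum {x: number of SNPs}, x = 1..n-1, into the classes
    min(x, n - x) = 1..n // 2 (used when the ancestral state is unknown)."""
    folded = {c: 0 for c in range(1, n // 2 + 1)}
    for x, count in unfolded.items():
        if not 1 <= x <= n - 1:
            raise ValueError(f"allele count {x} is outside 1..{n - 1}")
        folded[min(x, n - x)] += count
    return folded


# n = 5 chromosomes: x = 1 and x = 4 fold together, as do x = 2 and x = 3.
print(fold_spectrum({1: 10, 2: 6, 3: 4, 4: 3}, 5))  # {1: 13, 2: 10}
```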
The objective of this article is to show how the true frequency spectrum can be estimated from ascertained SNP data. We focus on the frequency spectrum for three reasons. First, the methods used to correct the frequency spectrum are conceptually identical to the methods used to correct estimators of any other parameters. We derive formulas for correcting the frequency spectrum that can be applied more or less directly in studies aimed at estimating other parameters. Second, in some cases the frequency spectrum in itself is of interest, for example, for identifying genomic regions with aberrant frequency spectra, possibly due to selection. Third, by correcting the frequency spectrum for the ascertainment bias, while taking into account the inflation of the variance due to the estimation procedure, other parameters, such as demographic parameters, can be estimated.
We show that in simple, but realistic cases, an analytical formula can be used to provide maximum-likelihood estimates of the true frequency spectrum. In the more general cases, fast numerical optimization algorithms can be used to estimate the true frequency spectrum. We use these new methods to analyze a previously published SNP data set from The SNP Consortium (TSC; e.g., MATISE et al. 2003).
THEORY AND METHODS
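Consider S independent SNPs typed in a sample of n chromosomes. Let Xi denote the number of copies of the derived allele at SNP i, let Asci denote the event that SNP i was ascertained, and let P = (p1, p2, ..., pn−1) be the true frequency spectrum. The likelihood function, conditional on ascertainment, is then

$$L(P) = \prod_{i=1}^{S} \Pr(X_i = x_i \mid \mathrm{Asc}_i, P) = \prod_{i=1}^{S} \frac{\Pr(X_i = x_i, \mathrm{Asc}_i \mid P)}{\Pr(\mathrm{Asc}_i \mid P)}. \qquad (1)$$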
Case 1 (basic model):
Let us first consider the case in which all SNPs have been ascertained in an alignment of different sequences of fixed depth (d) and where this ascertainment sample is a subset of the final sample of size n. The depth is the sample size of the ascertainment sample. The ascertainment condition is that the locus was variable in the ascertainment sample. Then the probability of ascertainment given an observed allele frequency of xi is one minus the probability of sampling all d ascertainment gene copies exclusively among either the xi alleles of one type or the n − xi alleles of the other type. Also Pr(Xi = xi|P) = pxi and Pr(Asci|Xi = xi) = Pr(Asci|Xi = xi, P), so
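$$L(P) = \prod_{i} \frac{p_{x_i}\,\Pr(\mathrm{Asc}_i \mid X_i = x_i)}{\sum_{j=1}^{n-1} p_j \Pr(\mathrm{Asc}_i \mid X_i = j)}, \qquad (2)$$

where

$$\Pr(\mathrm{Asc}_i \mid X_i = x_i) = 1 - \frac{\binom{x_i}{d} + \binom{n - x_i}{d}}{\binom{n}{d}}.$$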
Here and in the following, $\binom{i}{j} = 0$ if j > i or j < 0. We find the maximum-likelihood estimate by setting the partial derivatives of the log-likelihood function with respect to the parameters equal to zero and solving for the parameters. Because of the constraint $\sum_{j=1}^{n-1} p_j = 1$, we introduce a Lagrange multiplier. After verifying that a global maximum has been found, we find that the maximum-likelihood estimate of P is simply given by
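$$\hat{p}_j = \frac{S_j / \Pr(\mathrm{Asc} \mid X = j)}{\sum_{k=1}^{n-1} S_k / \Pr(\mathrm{Asc} \mid X = k)}, \qquad j = 1, \ldots, n-1, \qquad (3)$$

where Sj is the number of SNPs with observed derived-allele count j.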
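The following Python sketch covers case 1 under the stated assumptions. It computes the hypergeometric ascertainment probability and the closed-form estimate p̂j ∝ Sj / Pr(Asc | X = j), which is where the Lagrange argument leads for this likelihood. The function names are hypothetical, and counts are assumed to be stored in a dictionary keyed by j.

```python
from math import comb


def asc_prob(x, n, d):
    """Case 1: Pr(Asc | X = x), the chance that a SNP with x derived copies among
    n typed chromosomes is variable in a random subset of d of them."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # matching the convention binom(i, j) = 0 for j > i.
    return 1.0 - (comb(x, d) + comb(n - x, d)) / comb(n, d)


def mle_case1(counts, n, d):
    """Closed-form ML estimate: p_j proportional to S_j / Pr(Asc | X = j).

    counts: {j: S_j}, the number of ascertained SNPs with derived count j.
    """
    weighted = {j: counts.get(j, 0) / asc_prob(j, n, d) for j in range(1, n)}
    total = sum(weighted.values())
    return {j: v / total for j, v in weighted.items()}


# Check: feed in the expected ascertained counts for a neutral spectrum
# (p_j proportional to 1/j) and confirm that the estimate recovers it.
n, d = 20, 5
harmonic = sum(1 / k for k in range(1, n))
true_p = {j: (1 / j) / harmonic for j in range(1, n)}
norm = sum(true_p[j] * asc_prob(j, n, d) for j in true_p)
expected = {j: 10_000 * true_p[j] * asc_prob(j, n, d) / norm for j in true_p}
estimate = mle_case1(expected, n, d)
print(max(abs(estimate[j] - true_p[j]) for j in true_p))  # rounding error only
```

The check uses expected rather than simulated counts, so any deviation from the true spectrum is floating-point error. Normalizing the raw counts without the inverse weights would instead understate rare alleles.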
In some cases, the SNP selection criterion used in the SNP discovery process might include a cutoff in the SNP frequency. Such cases can easily be incorporated into the current scheme. If the cutoff is set at a value C, then the likelihood function is just modified using
[Equation 4]
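The exact form of the cutoff is not given in the text above. As one hypothetical reading, suppose a SNP is retained only if each allele is seen at least C times among the d ascertainment chromosomes. The Python sketch below, with the hypothetical name `asc_prob_cutoff`, sums the hypergeometric probabilities of the admissible ascertainment counts. With C = 1 it reduces to the basic case 1 probability.

```python
from math import comb


def asc_prob_cutoff(x, n, d, c):
    """Pr(Asc | X = x) under a hypothetical cutoff rule: a SNP is kept only if
    both alleles appear at least c times among the d ascertainment chromosomes.
    With c = 1 this equals the basic case 1 probability."""
    # k = derived copies in the ascertainment subset (hypergeometric draw).
    kept = sum(comb(x, k) * comb(n - x, d - k) for k in range(c, d - c + 1))
    return kept / comb(n, d)


# A singleton can never pass a cutoff of c = 2.
print(asc_prob_cutoff(1, 20, 5, 2))  # 0.0
```

The result plugs into the likelihood in place of the case 1 ascertainment probability. Because the likelihood keeps the same functional form, the closed-form estimate still applies.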
Case 2 (variation in d):
This case is similar to case 1, but we assume the discovery depth (d) varies among loci. Consider first the case where information regarding d in each locus has been lost, but information is available regarding the distribution of d among loci, f(d). Then the likelihood function must be modified by summing over all possible (unknown) alignment depths when calculating the ascertainment probability,
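$$\Pr(\mathrm{Asc}_i \mid X_i = x_i) = \sum_{d} f(d)\left[1 - \frac{\binom{x_i}{d} + \binom{n - x_i}{d}}{\binom{n}{d}}\right]. \qquad (5)$$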
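When the depth distribution f(d) is known, averaging the case 1 ascertainment probability over depths leaves the likelihood in the case 1 form. The closed-form estimate therefore still applies, with the averaged probability as the weight. The Python sketch below illustrates this. Its names are hypothetical, and f is assumed to be a dictionary {d: Pr(d)}.

```python
from math import comb


def asc_prob(x, n, d):
    """Case 1 ascertainment probability for a fixed depth d."""
    return 1.0 - (comb(x, d) + comb(n - x, d)) / comb(n, d)


def asc_prob_mixture(x, n, f):
    """Average the case 1 probability over a depth distribution f = {d: Pr(d)}."""
    return sum(fd * asc_prob(x, n, d) for d, fd in f.items())


def mle_mixture(counts, n, f):
    """With f known, the closed-form estimate applies with averaged weights."""
    weighted = {j: counts.get(j, 0) / asc_prob_mixture(j, n, f) for j in range(1, n)}
    total = sum(weighted.values())
    return {j: v / total for j, v in weighted.items()}


f = {2: 0.25, 3: 0.25, 5: 0.25, 10: 0.25}   # equal mixture of depths, as in Figure 2
print(round(asc_prob_mixture(1, 20, f), 4))  # 0.25: a singleton is found with prob. d/20
```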
In the case where information regarding d is available for each locus, but d varies among loci, the likelihood function is given by Equation 2, replacing d with di, where di is the value of d in SNP locus i. Because the denominator of Equation 2 now differs among loci, the closed-form solution no longer applies and the likelihood function must be maximized numerically; this can be done quickly and efficiently using standard algorithms.
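The Python sketch below handles known per-locus depths. It uses a simple fixed-point iteration in place of the generic optimizer mentioned above. This is a swapped-in technique, not necessarily the authors' choice. Setting the derivatives of the log-likelihood to zero under the constraint gives pj = Sj / Σd md Pr(Asc | X = j, d) / Wd(P), where md is the number of SNPs found at depth d and Wd(P) = Σj pj Pr(Asc | X = j, d). The sketch simply iterates this condition. The input format {(x, d): count} and the function names are hypothetical.

```python
from math import comb


def asc_prob(x, n, d):
    """Case 1 ascertainment probability for depth d."""
    return 1.0 - (comb(x, d) + comb(n - x, d)) / comb(n, d)


def mle_known_depths(counts, n, max_iter=100_000, tol=1e-13):
    """ML estimate of (p_1, ..., p_{n-1}) when every locus has a known depth.

    counts: {(x, d): number of SNPs with derived count x found at depth d}.
    Iterates the stationarity condition p_j = S_j / sum_d m_d w_d(j) / W_d(P).
    Returns a list indexed 1..n-1 (index 0 unused).
    """
    s = [0.0] * n                                   # s[j] = S_j
    m = {}                                          # m[d] = SNPs found at depth d
    for (x, d), c in counts.items():
        s[x] += c
        m[d] = m.get(d, 0.0) + c
    w = {d: [0.0] + [asc_prob(j, n, d) for j in range(1, n)] for d in m}
    p = [0.0] + [1.0 / (n - 1)] * (n - 1)           # start from a flat spectrum
    for _ in range(max_iter):
        big_w = {d: sum(p[j] * w[d][j] for j in range(1, n)) for d in m}
        new = [0.0] + [s[j] / sum(m[d] * w[d][j] / big_w[d] for d in m)
                       for j in range(1, n)]
        total = sum(new)
        new = [v / total for v in new]
        converged = max(abs(a - b) for a, b in zip(new, p)) < tol
        p = new
        if converged:
            break
    return p
```

The true spectrum is a fixed point of this map when the counts equal their expectations, which makes it easy to check. For production use, a constrained quasi-Newton optimizer would be an equally valid choice.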
An example is shown in Figure 2. Ten thousand independent SNPs were simulated assuming n = 20 and a mixture of ascertainment sample sizes of d = 2, 3, 5, and 10 with equal probability. Again, the simulated data have an excess of loci with alleles of intermediate frequency compared to the true distribution. Three different correction schemes are considered. First, the likelihood function based on Equation 2 is used to correct the distribution, assuming di is known for all loci. This procedure accurately recovers the true frequency spectrum. Two different correction schemes based on Equation 5 are then considered. In the first, the true distribution of ascertainment sample sizes is assumed to be known; using this distribution, the correct frequency spectrum is again recovered. In the second, the observed distribution of ascertainment sample sizes is used in combination with Equation 5. The observed distribution is obtained by simply counting the number of typed SNPs for which d = 2, 3, and so on. This procedure leads to a small bias and a deficiency of rare alleles. The reason is that the observed distribution of ascertainment sample sizes is itself biased: loci at which no SNP was found have been eliminated, and because a SNP is less likely to be found in a shallow alignment, large depths are overrepresented among the typed SNPs.
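The bias in the counted depths can be made concrete under a simplifying assumption: every screened locus is polymorphic in the typed sample. A locus screened at depth d then yields a SNP with probability Σj pj Pr(Asc | X = j, d), so Pr(d | SNP found) ∝ f(d) Σj pj Pr(Asc | X = j, d). The Python sketch below, with hypothetical names, evaluates this for the equal mixture of depths and a neutral spectrum.

```python
from math import comb


def asc_prob(x, n, d):
    """Case 1 ascertainment probability for depth d."""
    return 1.0 - (comb(x, d) + comb(n - x, d)) / comb(n, d)


def depth_distribution_among_snps(f, p, n):
    """Pr(depth = d | a SNP was found), assuming every screened locus is
    polymorphic in the typed sample: proportional to f(d) * sum_j p_j Pr(Asc | j, d)."""
    raw = {d: fd * sum(pj * asc_prob(j, n, d) for j, pj in p.items()) for d, fd in f.items()}
    total = sum(raw.values())
    return {d: v / total for d, v in raw.items()}


n = 20
harmonic = sum(1 / k for k in range(1, n))
p = {j: (1 / j) / harmonic for j in range(1, n)}          # neutral spectrum
f = {2: 0.25, 3: 0.25, 5: 0.25, 10: 0.25}               # depths of all screened loci
observed = depth_distribution_among_snps(f, p, n)
print({d: round(v, 3) for d, v in observed.items()})   # skewed toward d = 10
```

Plugging this skewed distribution into the mixture correction overstates the chance of detecting rare alleles. This is consistent with the deficiency of rare alleles described above.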
So far we have assumed that the ascertainment sample consists of an alignment of different sequences. However, in more realistic cases the ascertainment sample has been obtained by sampling with replacement from a panel of chromosomes of size m. For example, the National Human Genome Research Institute sponsored a SNP discovery effort in which a SNP discovery panel of m = 24 individuals was used by many groups to find SNPs. In the reduced representation shotgun scheme (ALTSHULER et al. 2000), multiple overlapping sequences were aligned for SNP discovery. In these overlaps, not all 24 individuals were represented, and some individuals were represented by more than one read. In this case, we need to distinguish between the observed depth of the alignment (Ai) in locus i and the true number of different sequences in the alignment in locus i (di). Consider, for example, the case in which the alignment depth was known for each locus. Then
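$$\Pr(\mathrm{Asc}_i \mid X_i = x_i, A_i = a) = \sum_{d=1}^{\min(a, m)} \Pr(d_i = d \mid A_i = a)\,\Pr(\mathrm{Asc}_i \mid X_i = x_i, d_i = d),$$

$$\Pr(d_i = d \mid A_i = a) = \frac{\binom{m}{d}\, S(a, d)\, d!}{m^{a}}, \qquad (6)$$

where $\binom{m}{d} S(a, d)\, d!$ is the number of ways we can sample, with replacement, a chromosomes among a panel of m chromosomes such that there are exactly d different chromosomes in the sample.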
There are $\binom{m}{d}$ different sets of d chromosomes to sample, and S(a, d) ways (a Stirling number of the second kind) to partition the a draws into d nonempty sets, which can in turn be assigned to the d chromosomes in d! different ways.
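The counting argument can be checked numerically. The Python sketch below, with hypothetical function names, computes Stirling numbers of the second kind by their standard recurrence and then the distribution of the number of distinct chromosomes d among a reads. For m = 20 and a = 4, the expected number of distinct chromosomes is about 3.71. For a = 5 and m = 7, the chance that all five reads come from different chromosomes is about 0.15.

```python
from math import comb, factorial


def stirling2(a, d):
    """Stirling number of the second kind S(a, d): ways to partition a labelled
    draws into d nonempty unlabelled groups (standard recurrence)."""
    table = [[0] * (d + 1) for _ in range(a + 1)]
    table[0][0] = 1
    for i in range(1, a + 1):
        for k in range(1, min(i, d) + 1):
            table[i][k] = k * table[i - 1][k] + table[i - 1][k - 1]
    return table[a][d]


def prob_distinct(d, a, m):
    """Pr(d_i = d | A_i = a): a reads drawn with replacement from m panel
    chromosomes cover exactly d distinct chromosomes."""
    return comb(m, d) * stirling2(a, d) * factorial(d) / m ** a


mean_d = sum(d * prob_distinct(d, 4, 20) for d in range(1, 5))
print(round(mean_d, 2))                    # 3.71
print(round(prob_distinct(5, 5, 7), 2))    # 0.15
```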
How important is it to model sampling with replacement, in contrast to assuming sampling without replacement (d = a) as previously assumed? In general, the effect is not large. For example, most of the available data from TSC (http://snp.cshl.org/) have values of a < 5, but m = 20 (see DATA ANALYSIS). In such cases correction with or without replacement gives almost identical results, because E(di|Ai = a) is close to a; e.g., if m = 20, then E(di|Ai = 4) = 3.71. We also explore cases where d is not much smaller than m and, for the purpose of illustration, show the case of a = 5 and m = 7 in Figure 3. In this case, Pr(di = 5|Ai = 5) ≈ 0.15, and we would expect relatively large differences between sampling with and without replacement. However, the difference between correcting with and without replacement is very minor compared to the effect of not correcting for the ascertainment bias. Corrections that ignore the possibility that the same sequence was sampled more than once from the panel may perform reasonably well as long as a < m. Most of the TSC data can probably be modeled reasonably well without taking sampling with replacement into account.
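The Python sketch below compares correcting with and without replacement by averaging the case 1 ascertainment probability over the unknown number of distinct chromosomes. For illustration, it assumes that the d distinct panel chromosomes behave like a random subset of the n typed chromosomes. The typed sample size n = 20 used in the example is likewise an arbitrary choice, and the function names are hypothetical. Here the count of surjections is computed by inclusion-exclusion, which equals d! S(a, d).

```python
from math import comb


def prob_distinct(d, a, m):
    """Pr(d distinct chromosomes among a reads drawn with replacement from m).
    The surjection count from inclusion-exclusion equals d! * S(a, d)."""
    onto = sum((-1) ** k * comb(d, k) * (d - k) ** a for k in range(d + 1))
    return comb(m, d) * onto / m ** a


def asc_prob(x, n, d):
    """Case 1 ascertainment probability for d distinct chromosomes."""
    return 1.0 - (comb(x, d) + comb(n - x, d)) / comb(n, d)


def asc_prob_with_replacement(x, n, a, m):
    """Average the case 1 probability over the number of distinct chromosomes,
    assuming those chromosomes behave like a random subset of the typed sample."""
    return sum(prob_distinct(d, a, m) * asc_prob(x, n, d) for d in range(1, min(a, m) + 1))


n, a, m = 20, 5, 7
for x in (1, 5, 10):
    print(x, round(asc_prob_with_replacement(x, n, a, m), 3), round(asc_prob(x, n, a), 3))
```

Because repeated reads add no new chromosomes, the with-replacement probability can never exceed the value obtained by treating all a reads as distinct.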
Case 3 (allele frequencies in the ascertainment sample unknown):
In many cases the ascertainment sample may not have been included in the final typed sample. In this case we redefine the ascertainment condition as variability in the ascertainment sample and variability in the typed sample, since invariant loci in the typed sample will in most cases be discarded (Figure 4). If the information regarding the allele frequency in the ascertainment sample has been preserved, the previous methods can easily be adapted to deal with this case. However, if the information regarding allele frequencies in the ascertainment sample is not available, the problem becomes considerably more complex. In the following, we illustrate how case 1 can be expanded to include this type of ascertainment scheme. The basic idea is to calculate the likelihood function by summing over all the possible values of the allele frequency in the unobserved ascertainment sample. First, redefine Pr(Xi = xi, Asci|P) in an alignment of depth d as
[Equations 7 and 8]
Similarly, redefine Pr(Asci|P) as
[Equations 9 and 10]
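One way to write out the sum over the unobserved ascertainment allele count is sketched below in Python. It assumes, as in the EM algorithm of the APPENDIX, that P is parameterized as the spectrum pj, j = 2, ..., n + d − 2, of the combined sample of n + d chromosomes. It also assumes that the d ascertainment chromosomes are a random subset of that combined sample. This is our own illustration rather than a transcription of Equations 7–10, and the function names are hypothetical.

```python
from math import comb, log


def split_prob(j, k, n, d):
    """Pr(k of the j derived copies fall in the d ascertainment chromosomes) when
    n + d chromosomes are split at random into d ascertainment and n typed ones."""
    return comb(j, k) * comb(n + d - j, d - k) / comb(n + d, d)


def joint_prob(x, p, n, d):
    """Pr(X = x, Asc | P): sum over the unobserved ascertainment count k = 1..d-1.
    p maps j = 2..n+d-2 (derived copies in the combined sample) to p_j."""
    return sum(p.get(x + k, 0.0) * split_prob(x + k, k, n, d) for k in range(1, d))


def asc_prob_total(p, n, d):
    """Pr(Asc | P): variable in the ascertainment sample and in the typed sample."""
    return sum(joint_prob(x, p, n, d) for x in range(1, n))


def log_likelihood(xs, p, n, d):
    """Log-likelihood of observed typed-sample counts xs, conditional on ascertainment."""
    denom = asc_prob_total(p, n, d)
    return sum(log(joint_prob(x, p, n, d) / denom) for x in xs)


n, d = 10, 4
flat = {j: 1 / (n + d - 3) for j in range(2, n + d - 1)}   # arbitrary test spectrum
print(log_likelihood([1, 2, 2, 5, 9], flat, n, d))
```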
The likelihood function now has an algebraic form for which the maximum-likelihood estimate cannot easily be obtained analytically. Instead, we can develop a fast EM algorithm for maximizing the likelihood function. When d varies among loci, the EM algorithm is no longer easily applicable and other numerical optimization methods must be used. The APPENDIX describes the EM algorithm and the necessary alterations of the likelihood function when d varies among loci.
Case 4 (the "double-hit" ascertainment scheme):
The International HapMap project is the largest SNP genotyping project ever conceived, currently planned to include a minimum of 600,000 SNPs genotyped in 270 individuals. Prior to this genotyping, SNPs are selected for the study on the basis of prior knowledge that they are variable sites and of their position in the genome. Recently, the criterion selected for ascertainment is the "double-hit" scheme, meaning that both allelic states were observed in two separate studies (www.hapmap.org).
Assume that we know the panel depth for both ascertainment experiments and that the discovery panels are part of the sample used to obtain the frequency spectrum (as in case 1). Further assume that the two discovery samples were drawn from the same population. Let Asc1i denote that SNP i satisfies ascertainment condition 1 [i.e., it was discovered in an alignment of sequences of depth d(1)], and let Asc2i denote that it was discovered in another alignment of sequences of depth d(2). As in case 1, assume that the ascertainment samples are subsets of the typed sample, and further assume that these two subsets do not overlap. Then,
$$\Pr(\mathrm{Asc1}_i, \mathrm{Asc2}_i \mid X_i = x_i) = \sum_{(j,k) \in \Omega} \frac{\binom{x_i}{j}\binom{n - x_i}{d^{(1)} - j}}{\binom{n}{d^{(1)}}} \cdot \frac{\binom{x_i - j}{k}\binom{n - x_i - d^{(1)} + j}{d^{(2)} - k}}{\binom{n - d^{(1)}}{d^{(2)}}}, \qquad (11)$$

where Ω = {(j, k) | 0 < j < d(1), 0 < k < d(2)}, and j and k are the numbers of derived alleles in the first and second ascertainment samples. The maximum-likelihood estimate of P = (p2, p3, ..., pn−2) is then simply given by Equation 3 using (Asc1, Asc2) as the ascertainment condition. Unknown allele frequencies in the ascertainment sample and varying ascertainment sample size can also be incorporated in this ascertainment scheme. An example of this ascertainment scheme is shown in Figure 5. Note the magnitude of the ascertainment bias under this selection scheme: in a sample of size n = 20, SNPs with allele frequencies in the range 4/20 to 10/20 are now the most common SNPs. Again, the maximum-likelihood correction accurately recovers the true allele frequencies; however, incorrectly assuming a single-hit correction with d = 2 does not fully recover the true frequency spectrum. Clearly, under the double-hit ascertainment scheme, ascertainment corrections based on a single-hit scheme are not appropriate.
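The double-hit ascertainment probability can be computed as in the Python sketch below. It assumes two non-overlapping ascertainment samples drawn without replacement from the typed sample. The first sample is drawn from all n chromosomes, and the second from the remaining n − d(1). The function name is hypothetical.

```python
from math import comb


def asc_prob_double_hit(x, n, d1, d2):
    """Pr(Asc1, Asc2 | X = x): the SNP is variable in two disjoint ascertainment
    samples of sizes d1 and d2 drawn without replacement from n typed chromosomes."""
    hits = 0
    for j in range(1, d1):                  # derived copies in the first sample
        if j > x or d1 - j > n - x:
            continue                        # impossible composition of sample 1
        first = comb(x, j) * comb(n - x, d1 - j)
        for k in range(1, d2):              # derived copies in the second sample
            hits += first * comb(x - j, k) * comb(n - x - d1 + j, d2 - k)
    return hits / (comb(n, d1) * comb(n - d1, d2))


for x in (2, 5, 10):
    print(x, round(asc_prob_double_hit(x, 20, 2, 2), 4))
```

A SNP must carry at least two copies of each allele to be variable in both samples. The probability is therefore zero for x = 1 and x = n − 1, which is why the estimable spectrum runs from p2 to pn−2.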
Hypothesis testing and confidence intervals:
The previous discussion has illustrated how the frequency spectrum can be corrected for a variety of different ascertainment schemes. However, it has not addressed the fundamental problem of how to apply estimates of the frequency spectrum in further population genetic analysis. It is important to stress that ascertainment-corrected frequency spectra cannot be directly applied in further data analysis without taking the uncertainty in the parameter estimates into account. Fortunately, it is relatively easy to obtain measures of statistical uncertainty in these models. For example, consider ascertainment schemes where the likelihood function has the same functional form as in case 1. Then the approximate variances of the estimates can be obtained using asymptotic likelihood theory. The observed Fisher information matrix IP = {Iij}, 0 < i, j < n − 1, is given by the negative of the matrix of second derivatives of the log-likelihood function,
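$$I_{ij} = -\left.\frac{\partial^2 \log L(P)}{\partial p_i\, \partial p_j}\right|_{P = \hat{P}}, \qquad (12)$$

and the approximate variance-covariance matrix of the estimates is its inverse,

$$\widehat{\mathrm{Var}}(\hat{P}) \approx I_P^{-1}. \qquad (13)$$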
After obtaining the variance-covariance matrix, confidence intervals for the parameters can be obtained using standard methods. Likewise, approximate confidence intervals for any function that has been calculated on the basis of the frequency spectrum (e.g., an estimator of growth rates or other demographic parameters) can be obtained, if this function is differentiable. The approximate variance of the function is obtained by applying the delta method (see, e.g., CASELLA and BERGER 1990, p. 326).
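The Python sketch below is our own derivation for the case 1 likelihood. It treats p1, ..., pK−1 (K = n − 1) as free parameters with pK = 1 − Σ pj. Differentiating twice gives Iij = δij Si/pi² + SK/pK² − S(wi − wK)(wj − wK)/W², where wj = Pr(Asc | X = j), S is the total number of SNPs and W = Σ pj wj. The sketch inverts this matrix and applies the delta method to pK, whose gradient is (−1, ..., −1). The function names and list-based inputs are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting, for small dense matrices."""
    size = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def case1_variances(counts, p, w):
    """Asymptotic variances of p_hat_1..p_hat_K for a case 1 likelihood.

    counts, p, w: lists indexed 1..K (index 0 unused) holding S_j, p_hat_j and
    Pr(Asc | X = j). Requires S_j > 0 for every j.
    """
    k_last = len(p) - 1
    total = sum(counts[1:])
    big_w = sum(p[j] * w[j] for j in range(1, k_last + 1))
    info = [[(counts[i] / p[i] ** 2 if i == j else 0.0)
             + counts[k_last] / p[k_last] ** 2
             - total * (w[i] - w[k_last]) * (w[j] - w[k_last]) / big_w ** 2
             for j in range(1, k_last)]
            for i in range(1, k_last)]
    cov = invert(info)
    variances = [0.0] + [cov[i][i] for i in range(k_last - 1)]
    variances.append(sum(sum(row) for row in cov))  # delta method for p_K
    return variances


# Sanity check: with equal weights (no ascertainment bias) the variances reduce
# to the multinomial values p_j (1 - p_j) / S.
print(case1_variances([0, 50, 30, 20], [0, 0.5, 0.3, 0.2], [0, 1.0, 1.0, 1.0]))
```

The call prints approximately [0.0, 0.0025, 0.0021, 0.0016]. Approximate 95% intervals then follow as p̂j ± 2 times the square root of each variance.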
An alternative would be to bootstrap the SNPs, and for each bootstrap sample estimate the reconstituted frequency spectrum. By taking into account the increased variance due to the estimation of the frequency spectrum, such a bootstrap would accurately represent the true variance in the estimates. However, it should be noted that when numerical optimization is necessary, such a bootstrap approach can be quite computationally intensive.
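A nonparametric bootstrap over SNPs can be sketched in Python as follows. It is shown for the case 1 closed-form estimator, with hypothetical names and a fixed seed for reproducibility. Like the asymptotic variances, it assumes that SNPs are independent.

```python
import random
from collections import Counter
from math import comb
from statistics import stdev


def asc_prob(x, n, d):
    """Case 1 ascertainment probability."""
    return 1.0 - (comb(x, d) + comb(n - x, d)) / comb(n, d)


def mle_case1(xs, n, d):
    """Closed-form case 1 estimate from a list of derived counts; entry j-1 holds p_hat_j."""
    counts = Counter(xs)
    weighted = [counts[j] / asc_prob(j, n, d) for j in range(1, n)]
    total = sum(weighted)
    return [v / total for v in weighted]


def bootstrap_sd(xs, n, d, replicates=200, seed=1):
    """Resample SNPs with replacement, re-estimate the spectrum each time,
    and return the bootstrap standard deviation of each p_hat_j."""
    rng = random.Random(seed)
    draws = [mle_case1(rng.choices(xs, k=len(xs)), n, d) for _ in range(replicates)]
    return [stdev(column) for column in zip(*draws)]


xs = [1] * 40 + [2] * 25 + [3] * 15 + [4] * 12 + [5] * 8   # toy data, n = 6
print([round(s, 4) for s in bootstrap_sd(xs, 6, 2)])
```

Each replicate needs only the closed form here. Under the schemes that require numerical optimization, every replicate would call the optimizer, which is where the computational cost arises.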
Hypothesis testing can be performed using similar methods. For example, we might be interested in testing if the frequency spectrum conforms to a specific model such as the standard neutral equilibrium model, i.e., KINGMAN's (1982) coalescent model. In this model
$$p_j = \frac{1/j}{\sum_{k=1}^{n-1} 1/k}, \qquad j = 1, \ldots, n-1. \qquad (14)$$
We may now calculate a likelihood-ratio test statistic as $\log\big(L(\hat{P})/L(P_c)\big)$, where Pc is the value of P under Kingman's coalescent. Two times this statistic is asymptotically $\chi^2_{n-2}$ distributed. However, for most data sets, the numbers of observations in the categories of high-frequency derived alleles will be so low that the asymptotic result may not apply. In such cases, the distribution of the test statistic must be evaluated by simulation.
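The Python sketch below covers the test and its simulated null distribution. For simplicity it assumes case 1 ascertainment with a single known depth d, which the real analysis need not satisfy. Ascertainment is imposed by keeping each simulated SNP with probability Pr(Asc | X = x). The returned statistic is twice the log likelihood ratio, the quantity compared with χ²n−2. All names, and the example values n = 20 and d = 5, are hypothetical.

```python
import random
from collections import Counter
from math import comb, log


def asc_prob(x, n, d):
    """Case 1 ascertainment probability."""
    return 1.0 - (comb(x, d) + comb(n - x, d)) / comb(n, d)


def kingman_spectrum(n):
    """Standard neutral spectrum p_j = (1/j) / sum_k (1/k); entry j - 1 holds p_j."""
    harmonic = sum(1 / k for k in range(1, n))
    return [(1 / j) / harmonic for j in range(1, n)]


def log_lik(counts, p, w):
    """Case 1 log-likelihood sum_j S_j log(p_j w_j / sum_k p_k w_k)."""
    norm = sum(pj * wj for pj, wj in zip(p, w))
    return sum(s * log(pj * wj / norm) for s, pj, wj in zip(counts, p, w) if s > 0)


def lrt_statistic(xs, n, d):
    """2 log(L(P_hat) / L(P_c)), with P_hat the closed-form case 1 estimate."""
    w = [asc_prob(j, n, d) for j in range(1, n)]
    tally = Counter(xs)
    counts = [tally[j] for j in range(1, n)]
    weighted = [s / wj for s, wj in zip(counts, w)]
    p_hat = [v / sum(weighted) for v in weighted]
    return 2 * (log_lik(counts, p_hat, w) - log_lik(counts, kingman_spectrum(n), w))


def simulate_null(n, d, num_snps, replicates=100, seed=1):
    """Draw SNPs from the neutral spectrum, keep each with probability
    Pr(Asc | X = x) until num_snps remain, and recompute the statistic."""
    rng = random.Random(seed)
    population = list(range(1, n))
    p_c = kingman_spectrum(n)
    w = [asc_prob(x, n, d) for x in population]
    stats = []
    for _ in range(replicates):
        xs = []
        while len(xs) < num_snps:
            for x in rng.choices(population, weights=p_c, k=num_snps):
                if rng.random() < w[x - 1]:
                    xs.append(x)
        stats.append(lrt_statistic(xs[:num_snps], n, d))
    return stats


null = simulate_null(n=20, d=5, num_snps=1308, replicates=100)
# An observed statistic t_obs has simulated p-value mean(t >= t_obs for t in null).
```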
DATA ANALYSIS
The estimated and observed frequency spectra for these data are shown in Figure 7. Approximate 95% confidence intervals were obtained as plus or minus two standard deviations, where the standard deviations were approximated as the square roots of the asymptotic variances obtained using Equation 13. The frequency of singletons cannot be estimated consistently using this approach, because of the assumption of variability in both the ascertainment and the typed sample. Note that the observed distribution is quite uniform compared to the distribution expected under neutrality: there is a deficiency of rare new mutants and an excess of common alleles. In contrast, the corrected frequency spectrum shows an excess of rare alleles, as is observed in much of the available human data obtained by direct sequencing (STEPHENS et al. 2001). This excess of rare alleles is most likely caused by population growth and/or by selection against slightly deleterious mutations.
We tested the fit of Kingman's coalescent model to these data using the previously described likelihood-ratio test. The observed value of the test statistic was 100.3. To evaluate the distribution of this test statistic, 100 data sets of 1308 independent SNPs were simulated under Kingman's coalescent (i.e., from Equation 14), while imposing the same ascertainment conditions as observed in the real data. The simulated distribution is shown in Figure 8. In this case the data do not fit Kingman's coalescent, due to an excess of rare derived alleles.
DISCUSSION
Given the simplicity of implementation of these methods, and the growing prevalence of SNPs ascertained through a small panel, we emphasize the importance of considering and correcting ascertainment in the analysis of human SNP data. Many of the primary inferences to be drawn from SNP data about demographic history, such as allele age, rely on an accurate assessment of the frequency spectrum. Some methods of inference of association between SNPs and risk of complex diseases also rely on inference of allele frequency spectrum, and for this application we need the most accurate statistical procedures available.
The methods described in this article assume that all SNPs are independent. While this may be true for some data sets, many data sets will contain SNPs that are correlated due to linkage. We expect the estimate of the frequency spectrum to remain approximately unbiased in such cases. For example, even in the presence of linkage, several of the maximum-likelihood estimators, such as the estimator derived in case 1, can also be shown to be method-of-moments estimators. However, the measures of statistical uncertainty obtained using asymptotic likelihood theory or the bootstrap are no longer valid in the presence of linkage.
Throughout this article we have also assumed that all sequences are exchangeable, i.e., that there is no population subdivision. This is a rather strong assumption given the moderate amount of population structure observed in most human SNP data. If the ascertainment sample and the typed sample have the same ethnic makeup the effects on the estimates will probably be minor. However, the methods discussed here should not be applied to data for which the ethnic makeup is radically different between ascertainment sample and typed sample. In such cases ascertainment correction methods that explicitly take population subdivision into account should be applied.
APPENDIX
[Equations A1 and A2]
At the rth iteration of the algorithm the E-step consists of finding
[Equations A3 and A4]
The M-step of the algorithm can be completed by noting that Equation A3 can be optimized with respect to P using Equation 3. The algorithm then proceeds as follows:
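1. Set r = 0 and choose initial values $p^{0}_k$, k = 2, 3, ..., n + d − 2.
2. Compute the E-step quantities of Equations A3 and A4 under $P^{r}$, k = 2, 3, ..., n + d − 2.
3. Set $P^{r+1}$ to the value that maximizes Equation A3, obtained using Equation 3, and set r = r + 1.
4. Repeat steps 2 and 3 until convergence.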
After convergence at the rth step of the algorithm, the reconstituted frequency spectrum in a sample of size n + d is given by $p^{r+1}_j$, j = 2, ..., n + d − 2. The reconstituted frequency spectrum in a sample of size n is then given by
[Equations A5 and A6]
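The Python sketch below shows an EM algorithm of the kind described above for case 3 with a single fixed depth d. It is our own reconstruction, not the authors' code. The spectrum is parameterized on the combined sample of n + d chromosomes (j = 2, ..., n + d − 2), and the unobserved ascertainment count k = 1, ..., d − 1 is treated as missing data. The E-step computes expected complete-data counts Nj. The M-step uses the case 1 closed form pj ∝ Nj / aj, where aj is the probability that a site with j derived copies in the combined sample is variable in both subsamples. The projection to sample size n (Equations A5 and A6) is omitted, and the names are hypothetical.

```python
from math import comb, log


def split_prob(j, k, n, d):
    """Pr(k of the j derived copies land in the d ascertainment chromosomes)."""
    return comb(j, k) * comb(n + d - j, d - k) / comb(n + d, d)


def em_case3(counts, n, d, iterations=500):
    """EM sketch for case 3 with one fixed depth d.

    counts: {x: number of SNPs with x derived copies in the typed sample}, x = 1..n-1.
    Returns the estimate {j: p_j} and the observed-data log-likelihood per iteration.
    """
    js = range(2, n + d - 1)
    # a_j = Pr(Asc | J = j): variable in both the d and the n chromosomes.
    a = {j: sum(split_prob(j, k, n, d) for k in range(1, d) if 1 <= j - k <= n - 1)
         for j in js}
    p = {j: 1.0 / len(js) for j in js}
    trace = []
    for _ in range(iterations):
        denom = sum(p[j] * a[j] for j in js)
        expected = dict.fromkeys(js, 0.0)           # E-step: expected counts N_j
        loglik = 0.0
        for x, s in counts.items():
            terms = {x + k: p[x + k] * split_prob(x + k, k, n, d) for k in range(1, d)}
            joint = sum(terms.values())
            loglik += s * log(joint / denom)
            for j, t in terms.items():
                expected[j] += s * t / joint
        trace.append(loglik)
        # M-step: closed form as in case 1, p_j proportional to N_j / a_j.
        weighted = {j: expected[j] / a[j] for j in js}
        norm = sum(weighted.values())
        p = {j: v / norm for j, v in weighted.items()}
    return p, trace


estimate, trace = em_case3({1: 30, 2: 20, 3: 15, 4: 10, 5: 8, 6: 6, 7: 5}, n=8, d=3)
print(trace[0], trace[-1])   # the log-likelihood should never decrease
```

The M-step reduces to the closed form because the normalizing term Pr(Asc | P) does not involve the missing data. As a generic EM property, the log-likelihood trace is non-decreasing.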
When the ascertainment sample is not contained in the observed sample and d varies among loci, as in case 2, the EM algorithm can no longer be applied, and standard numerical optimization algorithms must be used instead. This is, however, the case relevant to the analysis of much of the available SNP data, such as the data from TSC. First, redefine Pr(Xi = xi, Asci|P) in an alignment of depth di as
[Equations A7 to A12]
The likelihood function can then be optimized using standard algorithms. In this case we used a version of the BFGS algorithm (e.g., PRESS et al. 1992, pp. 425-430) modified to include constraints on the parameters.
LITERATURE CITED
AKEY, J. M., G. ZHANG, K. ZHANG, L. JIN and M. D. SHRIVER, 2002 Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 12: 1805-1814.
AKEY, J. M., K. ZHANG, M. XIONG and L. JIN, 2003 The effect of single nucleotide polymorphism identification strategies on estimates of linkage disequilibrium. Mol. Biol. Evol. 20: 232-242.
ALTSHULER, D., V. J. POLLARA, C. R. COWLES, W. J. VAN ETTEN, J. BALDWIN et al., 2000 A SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407: 513-516.
CASELLA, G., and R. L. BERGER, 1990 Statistical Inference. Duxbury Press, Belmont, CA.
CAVALLI-SFORZA, L. L., and M. W. FELDMAN, 2003 The application of molecular genetic approaches to the study of human evolution. Nat. Genet. 33: 266-275.
HUDSON, R. R., 2001 Two-locus sampling distributions and their application. Genetics 159: 1805-1817.
KINGMAN, J. F. C., 1982 The coalescent. Stoch. Proc. Appl. 13: 235-248.
KUHNER, M. K., P. BEERLI, J. YAMAMOTO and J. FELSENSTEIN, 2000 Usefulness of single nucleotide polymorphism data for estimating population parameters. Genetics 156: 439-447.
MATISE, T. C., R. SACHIDANANDAM, A. G. CLARK, L. KRUGLYAK, E. WIJSMAN et al., 2003 A 3.9-centimorgan-resolution human single-nucleotide polymorphism linkage map and screening set. Am. J. Hum. Genet. 73: 271-284.
NIELSEN, R., 2000 Estimation of population parameters and recombination rates using single nucleotide polymorphisms. Genetics 154: 931-942.
NIELSEN, R., and J. SIGNOROVITCH, 2003 Correcting for ascertainment biases when analyzing SNP data: applications to the estimation of linkage disequilibrium. Theor. Popul. Biol. 63: 245-255.
PICOULT-NEWBERG, L., T. E. IDEKER, M. G. POHL, S. L. TAYLOR, M. A. DONALDSON et al., 1999 Mining SNPs from EST databases. Genome Res. 9: 167-174.
POLANSKI, A., and M. KIMMEL, 2003 New explicit expressions for relative frequencies of single-nucleotide polymorphisms with application to statistical inference on population growth. Genetics 165: 427-436.
PRESS, W. H., S. A. TEUKOLSKY, W. T. VETTERLING and B. P. FLANNERY, 1992 Numerical Recipes in C. Cambridge University Press, Cambridge, UK.
SABETI, P. C., D. E. REICH, J. M. HIGGINS, H. Z. P. LEVINE, D. J. RICHTER et al., 2002 Detecting recent positive selection in the human genome from haplotype structure. Nature 419: 832-837.
STEPHENS, J. C., J. A. SCHNEIDER, D. A. TANGUAY, J. CHOI, T. ACHARYA et al., 2001 Haplotype variation and linkage disequilibrium in 313 human genes. Science 293: 489-493.
SUNYAEV, S. R., W. C. LATHE, III, V. E. RAMENSKY and P. BORK, 2000 SNP frequencies in human genes: an excess of rare alleles and differing modes of selection. Trends Genet. 16: 335-337.
TAILLON-MILLER, P., Z. GU, Q. LI, L. HILLIER and P. Y. KWOK, 1998 Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms. Genome Res. 8: 748-754.
THORISSON, G. A., and L. D. STEIN, 2003 The SNP Consortium website: past, present and future. Nucleic Acids Res. 31: 124-127.
WAKELEY, J., R. NIELSEN, S. N. LIU-CORDERO and K. ARDLIE, 2001 The discovery of single-nucleotide polymorphisms and inferences about human demographic history. Am. J. Hum. Genet. 69: 1332-1347.
WANG, D. G., J. B. FAN, C. J. SIAO, A. BERNO, P. YOUNG et al., 1998 Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280: 1077-1082.