The Allele Frequency Spectrum in Genome-Wide Human Variation Data Reveals Signals of Differential Demographic History in Three Large World Populations
Gabor T. Marth, Eva Czabarka, Janos Murvai, and Stephen T. Sherry
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
Corresponding author: Gabor T. Marth, Boston College, 140 Commonwealth Ave., Chestnut Hill, MA 02467. E-mail: marth@bc.edu
Communicating editor: L. EXCOFFIER
| ABSTRACT |
|---|
We have studied a genome-wide set of single-nucleotide polymorphism (SNP) allele frequency measures for African-American, East Asian, and European-American samples. For this analysis we derived a simple, closed mathematical formulation for the spectrum of expected allele frequencies when the sampled populations have experienced nonstationary demographic histories. The direct calculation generates the spectrum orders of magnitude faster than coalescent simulations do and allows us to generate spectra for a large number of alternative histories on a multidimensional parameter grid. Model-fitting experiments using this grid reveal significant population-specific differences among the demographic histories that best describe the observed allele frequency spectra. European and Asian spectra show a bottleneck-shaped history: a reduction of effective population size in the past followed by a recent phase of size recovery. In contrast, the African-American spectrum shows a history of moderate but uninterrupted population expansion. These differences are expected to have profound consequences for the design of medical association studies. The analytical methods developed for this study, i.e., a closed mathematical formulation for the allele frequency spectrum, correcting the ascertainment bias introduced by shallow SNP sampling, and dealing with variable sample sizes provide a general framework for the analysis of public variation data.
THE analysis of statistical distributions of genetic variations has a rich history in classical population genetic studies […]
| Modeling the distribution of allele frequency |
|---|
Prior study of the allele frequency spectrum (AFS) has been restricted to properties of summary statistics such as Tajima's D […]
| Demographic history |
|---|
The reconstruction of human demographic history is of direct biological and anthropological interest. Additionally, the history of effective population size has a profound effect on important quantities such as the extent of linkage disequilibrium and is therefore important for medical association studies. There have been many attempts at demographic inference from contemporary molecular data representing different molecular mutation systems such as mitochondrial DNA polymorphisms […]
| METHODS |
|---|
Allele frequency spectrum under stepwise constant effective population size:
We show that, for a population evolving under the Wright-Fisher model and under selective neutrality, the expected number of mutations of size i, $\xi_i$ (i.e., mutations carried by exactly i of the sampled chromosomes), within a sample of n chromosomes under a demographic history of multi-epoch, piecewise constant effective population size is
![]() |
(1) |
where µ is the (constant) per-locus mutation rate, Nm is the effective population size in epoch m, Tm is the corresponding epoch duration, and
the normalized epoch boundary time. A detailed derivation of this result is given in the Appendix. The normalized distribution of these expectations according to the frequency is the allele frequency spectrum:
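$$P_n(i) = \frac{E(\xi_i)}{\sum_{j=1}^{n-1} E(\xi_j)}, \qquad i = 1, \dots, n-1 \qquad (2)$$
where $E(\xi_i)$ denotes the expectation given by Equation 1.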
It is sometimes useful to consider the "full" allele frequency spectrum, Pfulln(i), considering sizes 0 and n, i.e., when all samples carry the ancestral or the derived allele, respectively. We have verified the accuracy of the complete allele frequency spectrum derived from this formulation by coalescent simulations (supplemental Figure S1 at http://www.genetics.org/supplemental/). Three important properties of the allele frequency spectrum are clear from Equation 1. First, the expectation for a given frequency is linear under simultaneous scaling of all effective population sizes and epoch durations (i.e., as long as Tm and Nm are multiplied by the same constant for each m), hence the relative frequency spectrum remains unchanged. This fact can be exploited to reduce the number of parameters that characterizes a given demographic model under consideration. Second, the expected number of mutations of a given size for more than one nucleotide site is simply the sum of the individual expectations, without regard to any possible correlation among the site genealogy of proximal sites. Therefore, our results for the expected number of segregating sites as well as the allele frequency spectrum are also valid for polymorphisms at a single locus of arbitrary sequence length, without regard to possible recombination within the locus, or for polymorphisms collected from throughout the genome. This latter consideration allows us to apply the theoretical expectations derived here for the data set examined, without regard to the amount and structure of linkage between the sites represented within the set. Third, the allele frequency spectrum is independent of the actual value of the per-nucleotide, per-generation mutation rate, as long as this rate is uniform for every site considered.
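The first and third properties can be illustrated with a minimal sketch. It does not implement Equation 1. Instead it assumes the standard neutral result for a stationary (one-epoch) history, in which the expected number of mutations of size i is 4Nµ/i. The function names are illustrative only.

```python
def stationary_expected_counts(n, N, mu):
    """Expected number of mutations of size i = 1..n-1 for constant size N.

    Assumption: the standard neutral expectation 4*N*mu / i for a
    stationary population; this is a stand-in for Equation 1.
    """
    return [4.0 * N * mu / i for i in range(1, n)]


def relative_spectrum(expected_counts):
    """Normalize expected counts so that they sum to 1."""
    total = sum(expected_counts)
    return [x / total for x in expected_counts]


# Same sample size, different mutation rates and population sizes.
spec_a = relative_spectrum(stationary_expected_counts(10, 10_000, 2e-8))
spec_b = relative_spectrum(stationary_expected_counts(10, 10_000, 5e-8))
spec_c = relative_spectrum(stationary_expected_counts(10, 30_000, 2e-8))
```

In this special case the three normalized spectra coincide, because the factor 4Nµ cancels in the normalization.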
Minor allele frequency spectrum (folded spectrum):
In situations where allele frequency is determined experimentally by counting the two alternative alleles within a sample of n chromosomes, it is uncertain which of the two alleles is the mutant allele. In such situations, instead of the true frequency, we work with the frequency of the less frequent (or minor) allele […]
$$\tilde{P}_n(i) = P_n(i) + P_n(n-i), \qquad i = 1, \dots, \lfloor n/2 \rfloor \qquad (3)$$
By this definition, if n is even, $\tilde{P}_n(n/2) = 2P_n(n/2)$, i.e., twice the value we would expect to measure, leading to a "doubling effect." This fact needs to be taken into account during the interpretation of measured data. Because in many data sets available for analysis the ancestral allelic state is currently unknown, the folded spectrum is important in practice.
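A short sketch of the folding operation described above, including the doubling effect at size n/2. The function name and the list layout are assumptions made for illustration.

```python
def fold_spectrum(p):
    """Fold an unfolded spectrum into a minor allele frequency spectrum.

    p[i - 1] holds P_n(i) for i = 1..n-1. The result holds
    P_n(i) + P_n(n - i) for i = 1..n//2, so for even n the last
    entry is 2 * P_n(n/2) (the "doubling effect").
    """
    n = len(p) + 1
    return [p[i - 1] + p[n - i - 1] for i in range(1, n // 2 + 1)]


# n = 4 chromosomes: sizes 1, 2, 3.
unfolded = [0.5, 0.3, 0.2]
folded = fold_spectrum(unfolded)
# folded[0] combines sizes 1 and 3; folded[1] is size 2 counted twice.
```

Because of the doubled middle class, the folded values for even n sum to more than 1 until that class is halved or the spectrum is renormalized.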
Numerical calculation of the allele frequency spectrum:
Frequency spectrum calculations were implemented in the C programming language. Some care must be taken when calculating the expected spectrum, because computing Equation 1 requires the evaluation of alternating sums, a source of numeric instability when the individual terms are close in value. Instability can be avoided by accurate calculation of each term. The higher the sample size, the more accurately each term has to be evaluated. We do not have a systematic way to predict the accuracy requirement as a function of sample size, hence we determined the accuracy requirement for a given sample size by trial and error. In our implementation, we have used high-accuracy numeric libraries with settable numeric precision. Our experience has been that, up to a sample size n = 100, a numeric precision of 100 decimal places was sufficient for our calculations. Evaluation of the allele frequency spectrum for a sample size of 1000 required a numerical precision of approximately 500 decimal places.
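The instability can be reproduced on a toy problem. The sketch below is not Equation 1: it evaluates an unrelated alternating binomial sum whose exact value is known, $\sum_{k=0}^{n} (-1)^k \binom{n}{k}/(k+1) = 1/(n+1)$, once in double precision and once with Python's `decimal` module at a settable precision. The original implementation was in C, so the choice of library here is an assumption made for illustration.

```python
from decimal import Decimal, localcontext
from math import comb


def alternating_sum_float(n):
    """Double-precision evaluation; individual terms reach ~1e27 for n = 100."""
    return sum((-1) ** k * comb(n, k) / (k + 1) for k in range(n + 1))


def alternating_sum_decimal(n, digits):
    """Same sum, with every term and partial sum kept to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        total = Decimal(0)
        for k in range(n + 1):
            term = Decimal(comb(n, k)) / Decimal(k + 1)
            total += term if k % 2 == 0 else -term
        return total


naive = alternating_sum_float(100)          # prone to catastrophic cancellation
precise = alternating_sum_decimal(100, 100)  # close to 1/101
```

The terms are more than 28 orders of magnitude larger than the result, so the 15 to 16 significant digits of a double cannot represent the cancellation, whereas 100 digits leave ample headroom.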
Correcting ascertainment bias:
To describe the situation where polymorphic sites discovered in a set of samples are genotyped in a second, independently drawn set of samples for frequency characterization we divide the two independent groups of samples into a "discovery" group consisting of k samples and a "genotyping" group consisting of n samples. The discovery process is modeled by considering only those sites within the n + k samples that are polymorphic (i.e., are of size between 1 and k - 1) within the discovery group of depth k and discarding those sites that are monomorphic in this group, as these sites would not be considered for subsequent genotyping. The conditional probability, Pn|k(i), that a site is of size i within the n genotyping samples given that it is polymorphic in the k discovery samples is:
![]() |
(4) |
It is possible that a site that appears polymorphic within the k discovery samples is monomorphic within the n genotyping samples. As a result, the conditional probabilities Pn|k(0) and Pn|k(n) are typically nonzero, and one has to renormalize after the transformation to get the AFS. It is easy to verify that Equation 4 is also valid for calculating the folded conditional spectrum $\tilde{P}_{n|k}(i)$, as defined in Equation 3, provided that both folded spectra $\tilde{P}_k(i)$ and $\tilde{P}_{n+k}(i)$ are available. This property makes it possible to account for the ascertainment bias when only the folded allele frequency distributions are available. For the sake of completeness, we include the conditional spectrum for the important special case, k = 2, i.e., ascertainment within a pair of chromosomes:
![]() |
(5) |
It is easy to show that under a stationary history the spectrum is a linear function of i, and the folded spectrum is constant (Fig 2A).
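The conditioning step can be sketched as follows. This is one way to compute the conditional spectrum from the verbal definition above (a random split of n + k chromosomes into a genotyping group and a discovery group, which gives hypergeometric weights); it is not necessarily the algebraic form of Equations 4 and 5. The stationary input spectrum, proportional to 1/j, is an assumption taken from standard neutral theory, and all names are illustrative.

```python
from fractions import Fraction
from math import comb


def conditional_spectrum(p_joint, n, k):
    """P(size i among n genotyping chromosomes | polymorphic among k discovery chromosomes).

    p_joint[j] is the probability that a site has j mutant alleles among
    all n + k chromosomes, j = 0..n+k. Returns values for i = 0..n.
    """
    total = n + k
    raw = []
    for i in range(n + 1):
        s = Fraction(0)
        for l in range(1, k):  # discovery group carries 1..k-1 mutant alleles
            j = i + l
            # Hypergeometric chance that l of the j mutant alleles fall in the discovery group.
            weight = Fraction(comb(j, l) * comb(total - j, k - l), comb(total, k))
            s += p_joint[j] * weight
        raw.append(s)
    norm = sum(raw)
    return [x / norm for x in raw]


# Stationary history (assumed spectrum proportional to 1/j), discovery in k = 2 chromosomes.
n, k = 10, 2
weights = [Fraction(0)] + [Fraction(1, j) for j in range(1, n + k)] + [Fraction(0)]
p_joint = [w / sum(weights) for w in weights]
cond = conditional_spectrum(p_joint, n, k)
```

With these inputs the conditional spectrum is proportional to n + 1 - i, a linear function of i, and adding the values at i and n - i gives the same total for every i, in line with the statement above.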
We point out that our method of ascertainment bias correction improves on an earlier method based on using the measured discrete allele frequency as an estimator for the overall allele frequency within the population […]
Reduction of allele frequency counts to equivalent counts at a lower sample size:
Often allele frequency data are the result of genotyping a target number, nt, of individuals at a collection of polymorphic sites. Because of genotyping failures, however, the actual number of genotypes available at different locations is smaller and often varies from site to site. At sites where an identical number, n, of successfully determined chromosomal allelic states are available we denote the distribution of allele counts by Cn(i) and the corresponding probability distribution obtained by normalizing these counts by Pn(i). Sites with different numbers of successful genotypes are not directly comparable. To enable joint analysis of allele counts observed at all sites genotyped in the experiment, we have devised a procedure that, given an observed distribution of allele frequencies among samples, produces an equivalent distribution at a lower sample size, m. This is achieved by, first, considering all possible choices of m subsamples selected from the total n available samples, in such a way that each choice is equally likely and, second, requiring that the total number of observations remains the same. Under these assumptions, the "equivalent" allele counts for m subsamples are
![]() |
(6) |
![]() |
(7) |
Note that this procedure does not allow one to generate a higher sample size distribution on the basis of a lower sample size distribution. Also note that, even if the higher sample size distribution was a relative allele frequency spectrum, the resulting lower sample size distribution will contain nonzero terms for size 0 and for size m. Clearly, the first case is the result of the possibility that the omission of n - m chromosomes left us with 0 mutant alleles, and the second is that only mutant alleles remained. This results in a slight reduction of the total number of relative counts as compared to the original observations. To obtain the AFS, one omits sizes 0 and m in Equation 7 and renormalizes. It is easy to verify that the equivalence reduction also works for the folded allele frequency distribution.
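The reduction can be sketched directly from the two assumptions stated above (every choice of m subsamples is equally likely, and the total number of observations is preserved), which lead to hypergeometric weights. This is an illustration under those assumptions, not a transcription of Equations 6 and 7, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def reduce_counts(counts_n, m):
    """Project allele counts observed in n chromosomes down to m <= n chromosomes.

    counts_n[j] is the number of sites with j mutant (or minor) alleles
    among n chromosomes, j = 0..n. Returns equivalent counts for i = 0..m.
    """
    n = len(counts_n) - 1
    if m > n:
        raise ValueError("the target sample size must not exceed the observed one")
    reduced = [Fraction(0)] * (m + 1)
    for j, c in enumerate(counts_n):
        for i in range(m + 1):
            # Chance that a random subsample of m chromosomes keeps i of the j alleles.
            reduced[i] += c * Fraction(comb(j, i) * comb(n - j, m - i), comb(n, m))
    return reduced


# Six polymorphic sites typed in n = 3 chromosomes, reduced to m = 2.
reduced = reduce_counts([0, 3, 3, 0], 2)
```

In this small example the six observations become one site of size 0, four of size 1, and one of size 2, showing how the size 0 and size m classes acquire nonzero counts.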
We point out that our reduction procedure is not equivalent to frequency binning, a procedure sometimes employed to compare allele counts available at different sample sizes. Aggregating discrete allele frequency data on the basis of a nominal allele frequency c/n, the ratio of allele counts and the sample size, results in data distortion stemming from two sources. First, for a given sample, the inherent base frequency is fn = 1/n. In general, only window sizes that are integer multiples of fn will preserve the uniform apportionment of allele sizes into frequency bins. This may be impossible if multiple sample sizes are present in the data. Second, sites with identical nominal allele frequencies but different sample sizes are not equivalent; e.g., a site with a minor allele count of 1 in 3 samples is clearly not equivalent to a site with a minor allele count of 10 in 30 samples. Distortions from both sources are most pronounced at lower sample sizes. Our equivalence reduction procedure is a technique of data aggregation that is free of such distortions. This point is further illustrated in supplemental Figure S3 at http://www.genetics.org/supplemental/, where we compared the AFS resulting from simple binning of all available data for the European samples to the AFS we obtain by the equivalence data reduction procedure presented here.
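The second source of distortion can be made concrete with the example in the text. The sketch below assumes the same equally-likely-subsample model as the reduction procedure and compares a site with 1 minor allele in 3 samples to a site with 10 in 30, both expressed at a common sample size of 3. The function name is illustrative.

```python
from fractions import Fraction
from math import comb


def subsample_distribution(j, n, m):
    """Distribution of the allele count in m chromosomes drawn at random
    from n chromosomes that carry j copies of the allele (i = 0..m)."""
    return [Fraction(comb(j, i) * comb(n - j, m - i), comb(n, m)) for i in range(m + 1)]


one_in_three = subsample_distribution(1, 3, 3)    # already at the target size
ten_in_thirty = subsample_distribution(10, 30, 3)
```

Binning would place both sites at frequency 1/3. After reduction the first site remains a certain count of 1, while the second contributes a count of 1 with probability 95/203 (about 0.47) and spreads the rest over counts 0, 2, and 3.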
Coalescent simulations and tabulation of linkage disequilibrium:
We used coalescent simulations to verify the accuracy of our allele frequency spectrum calculations (supplemental Figure S1), to tabulate measures of linkage disequilibrium, and to tabulate distributions of mutation age. To perform these simulations, we have implemented a widely used, direct coalescent algorithm […]
Expectations for the extent of linkage disequilibrium were generated according to a previously published method […]
$$r^2 = \frac{(p_{AB} - p_A p_B)^2}{p_A\, p_a\, p_B\, p_b} \qquad (8)$$
where A and a denote the mutant and the ancestral alleles at the first marker location, and B and b are the alternative alleles at the second marker location. The quantities pA, pa, pB, and pb are the corresponding allele frequency measurements, and pAB is the measured frequency of the haplotype defined by the combination of allele A at the first marker position and B at the second marker position. Finally, marker age was tabulated by registering the time of occurrence for each of the mutations during the simulations.
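As an illustration of how the quantities defined above combine, the sketch below computes the squared allelic correlation r² from two allele frequencies and one haplotype frequency. It assumes that r² is the measure intended, since it is the standard statistic built from exactly these quantities; the function name is hypothetical.

```python
def r_squared(p_A, p_B, p_AB):
    """Squared allelic correlation between two biallelic markers.

    p_A, p_B: frequencies of allele A at marker 1 and allele B at marker 2.
    p_AB: frequency of the haplotype carrying both A and B.
    """
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    denom = p_A * p_a * p_B * p_b
    if denom == 0:
        raise ValueError("both markers must be polymorphic in the sample")
    d = p_AB - p_A * p_B  # deviation from independence
    return d * d / denom


complete_ld = r_squared(0.5, 0.5, 0.5)    # A always occurs with B
no_ld = r_squared(0.5, 0.5, 0.25)         # haplotype frequency equals p_A * p_B
```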
Model fitting to observed allele frequency spectra:
The primary objective of the fitting experiments is to determine the distribution of the posterior probability of the model parameters given the observed data: P(model|data). With the help of our closed formula for the direct calculation of the AFS we were able to generate the expected AFS for a complete, high-resolution, multidimensional grid overlaid on the parameter space that we intended to explore. This direct approach yielded the likelihood distribution, P(data|model), computed at each grid point. Because there is no sensible way to assign an "informed" prior distribution to the model parameters, we treat the prior as uniform over the grid; the posterior distribution is then proportional to the likelihood function, which can be used directly to rank competing parameter values. We point out that an alternative method of achieving the same goal is to use a Markov-chain Monte Carlo (MCMC) technique to obtain the posterior distribution […]
Stepwise constant models of one, two, and three epochs were considered. For each model class defined by the number of epochs, a vector of parameters describing the model was considered, including the effective population size and the duration of the epoch (expressed in terms of generations). We have sampled each effective size parameter, Ni, between 1000 and 150,000 in steps of 1000 up to 30,000 and in steps of 5000 beyond 30,000, and each epoch duration parameter, Ti, between 100 and 50,000 in steps of 100 up to 10,000 and in steps of 500 beyond 10,000. Because of the scaling equivalence of the relative distribution discussed earlier, we fixed the ancestral size (the effective size of the epoch farthest in the past) parameter at 10,000, for each model class. We have generated the unbiased allele frequency spectra by direct calculation using Equation 1, for a sample size of m + 2, where m = 41 is the (common) sample size after data reduction, and k = 2 is the discovery size. We then computed the conditional spectrum using Equation 4. Finally, we folded the spectrum using the definition given in Equation 3. To quantify the degree of fit between a given model and the observations we have used the likelihood of the observed data conditioned on the model:
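$$P(\mathrm{data} \mid \mathrm{model}) = \frac{c!}{\prod_{i=1}^{\lfloor m/2 \rfloor} c_i!} \prod_{i=1}^{\lfloor m/2 \rfloor} p_i^{\,c_i} \qquad (9)$$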
For generating the likelihood surface for the European bottleneck size vs. duration we used the $\chi^2$ metric defined as
$$\chi^2 = \sum_{i=1}^{\lfloor m/2 \rfloor} \frac{(c_i - c\,p_i)^2}{c\,p_i} \qquad (10)$$
In the above notations, ci is the observed number of sites of size i, c is the number of total sites, pi is the predicted (relative) probability of size i, and m is the common sample size to which all observations were reduced using the equivalence data reduction procedure outlined earlier.
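The grid search and the two fit metrics can be sketched as follows. The grid follows the step sizes given above, with the boundaries 30,000 and 10,000 treated as belonging to the finer steps (an assumption). The likelihood is assumed to be multinomial in the observed counts ci with predicted probabilities pi, and all function names are illustrative.

```python
from math import lgamma, log


def parameter_grid():
    """Effective sizes and epoch durations sampled on the grid."""
    sizes = list(range(1000, 30001, 1000)) + list(range(35000, 150001, 5000))
    durations = list(range(100, 10001, 100)) + list(range(10500, 50001, 500))
    return sizes, durations


def multinomial_log_likelihood(counts, probs):
    """ln P(data | model) for observed counts c_i and predicted probabilities p_i."""
    c = sum(counts)
    ll = lgamma(c + 1)
    for ci, pi in zip(counts, probs):
        ll -= lgamma(ci + 1)
        if ci > 0:
            ll += ci * log(pi)
    return ll


def chi_square(counts, probs):
    """Pearson chi-square distance between observed and expected counts."""
    c = sum(counts)
    return sum((ci - c * pi) ** 2 / (c * pi) for ci, pi in zip(counts, probs))


def best_model(counts, candidates):
    """candidates maps a parameter tuple to its predicted spectrum."""
    return max(candidates, key=lambda params: multinomial_log_likelihood(counts, candidates[params]))


sizes, durations = parameter_grid()
```

Under these assumptions the grid holds 54 size values and 180 duration values, so a two-epoch model with a fixed ancestral size has 54 × 180 = 9720 grid points, and each further epoch multiplies that number by another 9720.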
Comparison between models with different epoch numbers:
Models within the same structure (same epoch number) could be directly compared on the basis of any of the three goodness-of-fit metrics discussed above. Models with different numbers of epochs were compared using methods of normal hypothesis testing for nested models […]. The log-likelihood-ratio statistic 2 ln(P(data|model1)/P(data|model2)) is asymptotically $\chi^2$ distributed, with degrees of freedom equal to the difference in the number of parameters characterizing the models (i.e., adding one extra epoch increases the number of parameters by two). The larger this quantity, the more significant the improvement that was achieved by the introduction of the extra epoch. If the quantity is small, the improvement in data fit does not warrant the introduction of the extra parameters.
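For the case of one extra epoch (two extra parameters), the test can be sketched without a statistics library, because the upper tail probability of a chi-square distribution with 2 degrees of freedom is exactly exp(-x/2). The function name and the example log-likelihood values are hypothetical.

```python
from math import exp


def extra_epoch_test(loglik_fewer, loglik_more):
    """Likelihood-ratio test for adding one epoch (2 extra parameters).

    Takes the maximized log-likelihoods of the nested and the richer model
    and returns (statistic, p_value) under the chi-square(2) approximation.
    """
    stat = 2.0 * (loglik_more - loglik_fewer)
    p_value = exp(-stat / 2.0) if stat > 0 else 1.0
    return stat, p_value


# Hypothetical values: the richer model improves ln-likelihood by 5 units.
stat, p_value = extra_epoch_test(-105.0, -100.0)
```

For other differences in parameter number the closed form no longer applies, and a general chi-square tail function would be needed.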
| RESULTS |
|---|
Modeling allele frequency:
We considered a diploid population whose demographic history was described by a series of epochs such that the effective population size was stepwise constant within each epoch (e.g., Fig 1) and showed that the expected number of samples carrying a mutant allele can be described by a closed, easily computable mathematical formulation (see METHODS). We derived a method for incorporating into AFS models the same frequency ascertainment bias that was introduced into real data by the sampling strategies used during SNP discovery, and for revealing the consequent effect of these strategies on SNP population frequency (METHODS). We illustrate the effect of this bias under different values of ascertainment sample size (Fig 2A). As expected, the bias toward sample enrichment for common polymorphisms is strongest when SNPs are discovered in a pair of chromosomes, and it gradually disappears as discovery sample size increases. Under a stationary population history, the folded spectrum under ascertainment in two chromosomes is a constant function of frequency (METHODS), and deviations from a horizontal line signal a nonstationary history that is easy to detect and interpret. In Fig 2B, we contrast the ascertainment bias-corrected, minor allele frequency spectra for notable, competing scenarios of demographic history. When a population expands, an increasing number of chromosomes simultaneously incur new mutations, which results in an overabundance of rare alleles in the spectrum. Conversely, a population collapse is a rapid loss of chromosomes, and the alleles present at high frequency are more likely to be carried by surviving chromosomes than are their rare counterparts. For that reason a collapse generates an overrepresentation of common alleles. Finally, AFS under a bottleneck history (a reduction of effective size followed by a phase of recovery) carries the signature of both the phase of collapse (a valley at intermediate frequencies) and that of growth (elevated signal at low frequencies).
We report a procedure to transform allele counts at a given sample size to a lower, target sample size (METHODS). Using this equivalence sample size reduction procedure, allele count observations at all sites can be reduced to the equivalent counts at a lower, "common denominator" sample size, as illustrated in Fig 3. This procedure is useful for analyzing allele counts at sites where the number of available genotypes is variable either because a fraction of attempted genotyping experiments failed or when merging data sets in which the attempted sample sizes are different. In such cases one selects a target sample size and applies the reduction procedure to transform allele counts observed at higher sample sizes to the equivalent counts at this lower target sample size. It is then possible to fit the resulting single AFS containing the contribution of all available data instead of fitting multiple, often sparse spectra, one for each sample size present in the data.
Minor allele frequency spectra observed in samples representing different world populations show differential demographic histories:
The SNP Consortium (http://snp.cshl.org), an organization formed primarily for the discovery of a large set of human SNPs, has made well over 1 million polymorphic sites available in the public domain […]
To assess the signals of population history within these observed distributions, we generated allele frequency spectra as predicted under competing scenarios of population history of varying complexity: stationary history (one epoch), expansion or collapse (two epoch), and all possible shapes of three-epoch histories (METHODS). For a given set of model parameters, we generated the corresponding theoretically predicted, ascertainment bias-corrected minor allele frequency spectrum and evaluated the degree of fit between the prediction and the observations (METHODS). For each population-specific data set and for each model structure (number of epochs), we determined the best-fitting model parameters and the corresponding measures of goodness of fit. By definition of the likelihood function used for data fitting, the best-fitting model parameters are the maximum-likelihood parameter estimates for that model class (Table 1).
The normalized observed allele frequency distributions for each population group and the corresponding best-performing distributions within each model class are shown in Fig 4. In all three population-specific spectra, stationary history is a poor descriptor of the data, both by visual inspection and by examination of the fit values in Table 1. The best-fitting two-epoch model for all three spectra is that of expansion (Table 1). In the European (Fig 4A) and in the Asian (Fig 4B) samples the best-fitting three-epoch model is one of a bottleneck-shaped history. In the European data, the curve fit produced by the bottleneck profile is a very significant improvement over that produced by histories of expansion. In the Asian data, the improvement is still significant but to a lesser degree. The best-fitting three-epoch models in African-American data (Fig 4C) represent a two-step population increase of moderate size.
In addition to the best-fitting models, a range of parameter values produced comparably good fit to the observations. We have examined parameter sets that produced likelihood values that were at least 90% of the value obtained for the best-fitting three-epoch parameter set. Analysis of these "close to optimal" parameter values in the European data shows that both the size (N, effective number of individuals) and duration (T, generations) of the recovery phase were within a narrow range (N1 = 19,000–21,000, T1 = 2700–3000). Parameters of the bottleneck phase were in a wider range (N2 = 1000–4000 and T2 = 200–1300), with several alternative pairs available: longer but less severe bottlenecks or shorter, more severe bottlenecks. Given the potential interest in a possible bottleneck in the history of European populations, we further investigated the strength of the bottleneck signal by fixing the recovery size and duration parameters (N1 = 20,000, T1 = 3000) and varying the bottleneck size N2 and duration T2 in fine increments of 20. For each parameter combination, we evaluated the goodness of fit to the European spectrum as measured by the $\chi^2$ statistic and reported the resulting probability surface in Fig 5. The best-fitting parameter combinations (ones not rejected by the $\chi^2$ test even at the 99.8% level) lie on a slightly curved line between the following pairs: effective size of 1040 during the bottleneck for 240 generations and effective size 2320 for 560 generations. The most likely model, at this resolution, is a bottleneck effective size of 1560 for 360 generations. These values and the ratio of effective population size and bottleneck duration being nearly constant in a large region are in good agreement with previous reports […]
| DISCUSSION |
|---|
Significance of the allele frequency analysis methods presented here:
Equation 1 (METHODS) provides a simple and rapid way to generate expected distributions of allele frequency under stepwise constant models of effective population size history. This procedure is orders of magnitude faster than tabulating simulation replicates, especially for large sample sizes, permitting fast generation of model spectra to explore large parameter spaces at high resolution. The method of ascertainment bias calculation we have presented permits the interpretation of allele frequency spectra measured at polymorphic sites selected from existing variation resources. Our procedure of equivalence sample size reduction enables the analysis of realistic data sets with genotyping failures. All three of the above procedures are firmly rooted within the coalescent framework. Model calculations directly correspond to experimentally observable quantities, without referencing directly unobservable quantities such as the overall population frequency of alleles. The data-fitting methodology is conceptually simple and allows direct comparison of the degree of fit between each of the three population samples examined, at each grid point (parameter combination).
Differential population histories in the three sample sets:
On the basis of the goodness of fit between models and observations (Table 1), a history of stationary population size can be confidently rejected for all three sets of samples. Introduction of even very simple dynamics into the history has dramatically improved data fit. There were large differences among the allele frequency spectra observed in the three populations (Fig 4 and Table 1). Clearly, the shapes of the European and the Asian spectra are closer to each other than either is to the shape of the African-American spectrum. On the basis of the three-epoch models, both the European and the Asian data are best explained by bottleneck-shaped histories, whereas the best-fitting third-order model for the African-American data is a continued expansion. The results of hierarchical model testing (METHODS) in Table 1 show that the inclusion of the third epoch did not significantly improve the fit to the African-American data. However, the bottleneck history is a dramatic improvement over the best-fitting two-epoch growth models in both the European and Asian data. Considering the range of models that produced close to optimal fit values, but using a fixed, 20-year generation time, the European bottleneck represented a 2.5- to 10-fold decline in population size, lasting 200–1300 generations [4–26 thousand years (KY)]. This was followed by a phase of 5- to 20-fold population expansion, starting 2700–4300 generations (54–86 KY) ago. The Asian bottleneck represented a 2- to 3-fold decline for 600–1000 generations (12–20 KY), followed by 5- to 8-fold growth starting 3000–4200 generations (60–84 KY) ago. The best-fitting models for the African-American data represent uninterrupted growth of effective population size, with the expansion clearly starting earlier than is evident in our European or the Asian data.
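The conversions behind the European figures can be checked with a few lines of arithmetic. The sketch assumes the 20-year generation time and the fixed ancestral size of 10,000 stated in the text, a bottleneck size between 1000 and 4000, and a recovery size of about 20,000; the helper names are illustrative.

```python
GENERATION_YEARS = 20     # fixed generation time used in the text
ANCESTRAL_SIZE = 10_000   # fixed ancestral effective size


def generations_to_ky(generations, years_per_generation=GENERATION_YEARS):
    """Convert a duration in generations to thousands of years (KY)."""
    return generations * years_per_generation / 1000


def fold_change(size_a, size_b):
    """Ratio of the larger to the smaller effective size."""
    return max(size_a, size_b) / min(size_a, size_b)


# European bottleneck: 200 to 1300 generations, effective size 1000 to 4000.
bottleneck_ky = (generations_to_ky(200), generations_to_ky(1300))
decline = (fold_change(ANCESTRAL_SIZE, 4000), fold_change(ANCESTRAL_SIZE, 1000))
# Recovery to about 20,000, starting 2700 to 4300 generations ago.
expansion = (fold_change(20_000, 4000), fold_change(20_000, 1000))
recovery_start_ky = (generations_to_ky(2700), generations_to_ky(4300))
```

These values reproduce the 4 to 26 KY duration, the 2.5- to 10-fold decline, the 5- to 20-fold expansion, and the 54 to 86 KY start of recovery quoted for the European samples.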
Earlier mitochondrial and microsatellite studies report data that are predominantly consistent with expansion-type histories of effective population size. The main evidence that points to expansion is negative values of Tajima's D and an excess of low-frequency alleles. The start of such expansion is estimated between 30 and 130 KYA […] 0.2 for both populations) are in general agreement with these values and signify bottlenecks on the less severe end of the spectrum. Our estimates for the start of the recovery phase (54–86 KYA for Europeans, 60–84 KYA for Asians) are well within the range of the mitochondrial and microsatellite estimates. The fact that our best-fitting two-epoch models indicate expansion-type histories for all three populations we examined is also consistent with conclusions from mitochondrial and microsatellite data. A valuable reality check of an inferred demographic model is its implied pairwise nucleotide diversity value, π. Although our data-fitting analysis of the relative spectrum does not provide absolute estimates for π, these values can be obtained on the basis of the best-fitting models by fixing the ancestral size N3 and mutation rate µ. For each of the three populations, we use a common ancestral effective size of 10,000 and common mutation rate of 2 × 10⁻⁸ [a value that lies between recent, prominent estimates for average per-nucleotide, per-generation human mutation rate […] π = 7.88 × 10⁻⁴ for the European model, in good agreement with previously reported values for other genome-wide data sets […] π predicted by the best-fitting model for the African-American data is 10.29 × 10⁻⁴, significantly higher than that observed within the European and Asian samples, and in agreement with the general consensus that nucleotide diversity is higher in sub-Saharan samples than in non-African data […]
A bottleneck-shaped history was also our best-fitting three-epoch model structure for MD distributions observed in overlap fragments of public genome clone data […]
To understand the consequences of the differential histories that best describe the three population-specific data sets, we have partitioned the corresponding frequency spectra according to the age of the mutations (METHODS) that gave rise to the polymorphisms (Fig 4, second column). According to these tabulations, 35.9% of the European polymorphisms originated <10,000 generations ago, as did a similar fraction, 34.9%, in the Asian model. In contrast, only 29.6% of the African mutations are younger than 10,000 generations. This indicates that the bottleneck events that explain the European and Asian data have eliminated a large fraction of the polymorphisms that predated these events, and a larger fraction of current polymorphisms are of a more recent origin as compared to the African data. This effect is most visible at the common end of the spectrum: only a negligible fraction of the common African SNPs are young, but an appreciable fraction of common European and Asian SNPs originated <10,000 generations ago and have drifted to high population frequency. Finally, the third column of Fig 4 shows the average age of SNPs at given frequencies, confirming that SNPs at a higher frequency are expected to be older than SNPs at lower frequencies. Also, in each frequency class, the expected age of African SNPs is substantially higher than that of European or Asian SNPs, corroborating earlier observations noting the more ancient origins of African SNPs.
The differential demographic histories of the three populations examined also have important consequences for the extent of allelic association in the human genome, when the different populations are considered. To illustrate this point, we have carried out coalescent simulations, taking into account the individual best-fitting histories, and tabulated the average extent of linkage disequilibrium (LD) between markers separated by different values of recombination fraction (for a fixed value of per-nucleotide, per generation recombination rate, the recombination fraction translates into physical distance), as shown in Fig 6. Similar demographic histories distilled from the Asian and European samples result in similar values of LD at a given marker distance. LD is predicted to decay more rapidly (roughly twice as fast) for the best-fitting demographic history for the African-American samples, in agreement with previous reports (![]()
Caveats and open problems:
Clearly, our multi-epoch, stepwise models of demographic history represent simplified versions of the "true" demographic past. Nevertheless, our three-epoch models go beyond the majority of previous studies that explore even simpler models of past population dynamics such as expansion vs. collapse or are restricted to the rejection of stationary effective size on the basis of summary statistics. Consideration of the third-order dynamics in this study allowed us to reveal a phase of bottleneck in the history characterizing the European and the Asian samples, permitting reconciliation of the signals of recent population growth apparent in mitochondrial and microsatellite data with realistic, observed values of nucleotide diversity.
Although the signal of differential history is undeniable in the data, the effect is confounded by the fact that the discovery and genotyping data sets were not drawn from a single population. SNP discovery was performed in shotgun sequences from ethnically diverse libraries (with ethnic association of individual reads unknown) aligned to the public genome reference sequence.
Additionally, internal population substructure can also distort the frequency spectrum.
We must also acknowledge that the current shape of human variation structure is the result of a combination of neutral and nonneutral (selective) forces. The current state of the art in recognizing the effects of selection in variation data has been reviewed recently.
Conclusion:
The allele frequency spectrum is an excellent data source for modeling demographic history because of its independence of the effects of recombination and local, or sequence composition-specific variations of mutation rates and because the experimental determination of the allele frequency spectrum requires measurement of allelic states only at single-nucleotide positions, instead of sequencing of long stretches of contiguous DNA. The emergence of population-specific genotype sets on the genome scale provides sufficient data for the direct comparison of model-predicted and observed spectra with great resolution. This permits us to improve on previous conclusions drawn on the strength of summary statistics, on the basis of data from a handful of loci. Recent advances in allele frequency modeling should provide us with exciting, new tools to explore our demographic past and explain human haplotype structure. Accurate reconstruction of the history of world populations should also help us to detect and interpret differences that must be taken into account during the development of general resources for medical use such as the recently initiated human Haplotype Map Project.
| ACKNOWLEDGMENTS |
|---|
The authors are indebted to Andrew Clark for useful comments on the manuscript. We also thank Ravi Sachidanandam for kindly providing earlier versions of the allele frequency data set analyzed in this study.
Manuscript received April 15, 2003; Accepted for publication September 4, 2003.
| APPENDIX |
|---|
THE EXPECTED NUMBER OF SEGREGATING SITES IN A SAMPLE DRAWN FROM A POPULATION CHARACTERIZED BY A PIECEWISE CONSTANT, MULTI-EPOCH HISTORY OF EFFECTIVE SIZE
Model:
We consider a population of a given organism evolving under the Wright-Fisher model and under selective neutrality. Let us select a specific site in the genome of the organism. Furthermore, let us randomly draw n DNA samples from this population. Without regard to recombination, the samples possess a unique tree-shaped genealogy at the selected site (the site genealogy). Such a genealogy can be described within the framework of the coalescent: starting with n samples in the present and, through a series of coalescent events (pairs of samples finding their common ancestors), this number reduces to 1, the most recent common ancestor (MRCA), or the root of the genealogy at that site (site root). At a given time, the process is said to be in state j, if at that time the current number of samples is j. This process is Markovian, in that the length of time until the next coalescent event depends only on the current state and is independent of the previous states. Due to molecular mutation processes, the nucleotide observed at the site under consideration might be different in different individuals. Let us assume that, at any given site, only two possible nucleotides are observed (diallelic variations). Accordingly, an individual carries either the allele that was present in the site root (also known as the ancestral allele) or a mutant or derived allele. Let us further assume that the mutant allele is the result of a single mutation event (infinite-sites assumption) within an ancestral sample of the site genealogy. Under this assumption, the number of samples that carry the derived allele is identical to the number of descendants of that ancestor within the site genealogy. Conversely, the derived allele is found in exactly i samples if and only if the ancestor in which the mutation occurred gave rise to i descendants. Under the further assumption of a constant-rate mutation process (![]()
Our final goal is to extend this result from constant to merely piecewise constant population size. To this end, we use a standard continuous approximation according to which the probability density function of the length of time t spent in state k within the genealogy is exponential under a constant population size N, and for a diploid population,
$$f(t) = \frac{\binom{k}{2}}{2N}\, e^{-\binom{k}{2}\, t / 2N}.$$
Using this approximation, we derive the expectation for the length of time spent in state k, under piecewise constant population history of an arbitrary number of epochs. Under the assumption of a constant-rate mutation process, this allows us to compute the expectation for the number of mutations of size i, denoted by $\xi_i$, observed at a single site, at sites having identical site genealogies (DNA without recombination), or at a collection of sites with completely independent site genealogies. Because the distributions are identical for every site, the result is also valid for a collection of sites.
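To illustrate how expected state durations translate into an expected spectrum, the sketch below uses the standard neutral-coalescent result that a lineage present while the genealogy has k lineages subtends exactly i of the n samples with probability $\binom{n-i-1}{k-2}/\binom{n-1}{k-1}$. Under a constant diploid size N, where the expected time in state k is $2N/\binom{k}{2}$ generations, this reproduces the familiar value $4N\mu/i$. The function names and the values of n, N, and the per-site mutation rate are illustrative assumptions, not quantities taken from this study.

```python
import math


def branch_size_prob(n: int, k: int, i: int) -> float:
    """P(a lineage present in state k has exactly i descendants among n samples)."""
    if k < 2 or i < 1 or i > n - k + 1:
        return 0.0
    return math.comb(n - i - 1, k - 2) / math.comb(n - 1, k - 1)


def expected_spectrum(n: int, expected_times: dict, mu: float) -> dict:
    """Expected number of mutations of size i (i = 1..n-1) at one site.

    expected_times[k] is the expected time (generations) spent in state k,
    for k = 2..n; mu is the per-site, per-generation mutation rate.
    In state k there are k lineages, each accumulating mutations at rate mu.
    """
    spectrum = {}
    for i in range(1, n):
        total = 0.0
        for k in range(2, n - i + 2):
            total += k * branch_size_prob(n, k, i) * expected_times[k]
        spectrum[i] = mu * total
    return spectrum


def constant_size_times(n: int, N: float) -> dict:
    """Expected time in state k under constant diploid size N: 2N / C(k, 2)."""
    return {k: 2.0 * N / math.comb(k, 2) for k in range(2, n + 1)}


if __name__ == "__main__":
    n, N, mu = 10, 10_000, 1e-8  # illustrative values only
    xi = expected_spectrum(n, constant_size_times(n, N), mu)
    for i, value in xi.items():
        print(i, value, 4 * N * mu / i)
```

Only `expected_times` depends on the demographic history, so the same weighting step can be reused once the expected durations for a multi-epoch history are available.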
Conventions and useful identities:
We use the convention that the value of an empty product is 1 and the value of an empty sum is 0. The probability density function of a random variable X is denoted by $f_X$ and its cumulative distribution function by $F_X$. The variable X conditioned on the event Y is denoted by X|Y. Next, we briefly state four lemmas to aid further derivations. In the following we assume that the $a_i$ are distinct.
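LEMMA 1. For every value of x, for each l: $1 \le l \le n$,
$$\sum_{j=l}^{n} \prod_{\substack{i=l \\ i \ne j}}^{n} \frac{a_i - x}{a_i - a_j} = 1. \tag{A1}$$
Proof. Let
$$f(x) = \sum_{j=l}^{n} \prod_{\substack{i=l \\ i \ne j}}^{n} \frac{a_i - x}{a_i - a_j} - 1;$$
we need to show that $f(x) \equiv 0$. For r: $l \le r \le n$ we have that
$$f(a_r) = \prod_{\substack{i=l \\ i \ne r}}^{n} \frac{a_i - a_r}{a_i - a_r} - 1 = 0,$$
because every term with $j \ne r$ contains the vanishing factor $a_r - a_r$. Since f(x) is of degree at most n - l and it has at least n - l + 1 different zeros, necessarily $f(x) \equiv 0$. Q.E.D.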
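The degree-and-zeros argument in the proof of Lemma 1 is the one used for the Lagrange interpolation identity $\sum_{j=l}^{n}\prod_{i \ne j} (a_i - x)/(a_i - a_j) = 1$. Assuming that this is the form of Equation A1 (the statement is inferred here from the structure of the proof), the identity can be checked exactly with rational arithmetic for $a_j = \binom{j}{2}$. The function names and the chosen values of n, l, and x are illustrative.

```python
from fractions import Fraction
from math import comb


def lagrange_sum(a, x):
    """Sum over j of prod_{i != j} (a_i - x) / (a_i - a_j), computed exactly.

    The entries of `a` must be pairwise distinct, as the lemma requires.
    """
    total = Fraction(0)
    for j, a_j in enumerate(a):
        term = Fraction(1)
        for i, a_i in enumerate(a):
            if i != j:
                term *= Fraction(a_i - x) / Fraction(a_i - a_j)
        total += term
    return total


def binomial_rates(l, n):
    """The values a_j = C(j, 2) for j = l..n, which are pairwise distinct."""
    return [comb(j, 2) for j in range(l, n + 1)]


if __name__ == "__main__":
    for x in (0, 5, Fraction(7, 3)):
        print(x, lagrange_sum(binomial_rates(3, 8), x))
```

Exact fractions are used rather than floating point because the individual products alternate in sign and can be large compared with their sum.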
LEMMA 2. For k, i: $1 \le k < i \le n$ we have
![]() |
(A2) |
Proof.

$\beta_{k,k+1} = 0$, and for i > k + 1

where
LEMMA 3. For $s < k < i \le n$:
![]() |
(A3) |
Proof. From Lemma 2,
LEMMA 4.

Proof. Using Lemma 1,
Constant effective population size:
First, we consider a demographic history characterized by a single, constant population size $N_1$. We introduce the notations $a_j = \binom{j}{2}$ and $a^{(1)}_j = a_j/2N_1$. The length of time spent in state j (after which the number of samples reduces from j to j - 1) is denoted by $T_{j,j-1}$. The random variables $T_{j,j-1}$ and $T_{i,i-1}$ are independent for $i \ne j$. The density function of $T_{j,j-1}$ is $f_{T_{j,j-1}}(t) = a^{(1)}_j e^{-a^{(1)}_j t}$, according to our model assumptions. The length of time from the present, when the number of samples is n, to the instant when the number of samples reduces to s, is denoted by $T^{\{1\}}_{n,s}$. Clearly $T^{\{1\}}_{n,s} = \sum_{j=s+1}^{n} T_{j,j-1}$. The probability that, at time t, the genealogy is in state s is $P(T^{\{1\}}_{n,s} \le t < T^{\{1\}}_{n,s-1})$. Since $T^{\{1\}}_{n,l} = T^{\{1\}}_{n,l+1} + T_{l+1,l}$, for l: $1 \le l < n$ we can use the following convolution: $f_{T^{\{1\}}_{n,l}}(t) = \int_0^t f_{T^{\{1\}}_{n,l+1}}(t - x)\, f_{T_{l+1,l}}(x)\, dx$. Using these notations, the following are true:
THEOREM 1. For s: $1 \le s < n$:
![]() |
(A4) |
![]() |
(A5) |
![]() |
(A6) |
For s: $2 \le s < n$:
![]() |
(A7) |
![]() |
(A8) |
For i: $1 \le i < n$:
![]() |
(A9) |
Proof. First we show Equation A4 and Equation A5 by downward induction on s. These equations are clearly valid for s = n - 1. Assume they are valid for s:s > k. Then

For Equation A4 we need to show that

This is equivalent to

which follows from Lemma 1. Using Lemma 1 with l = s + 1 and x = 0, we get

This completes the proof of Equation A4 and Equation A5. For (A7), note that
Then

For (A6), since $T^{\{1\}}_{n,s} \ge 0$,

Equation A8 can be easily obtained from fs,s-1(t). Finally, Equation A9 follows from Equation A8, by the argument presented by ![]()
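As a numerical illustration of the single-epoch case, the sketch below evaluates the probability that the genealogy is in state s at time t. It uses the textbook survival function of a sum of independent exponential variables with distinct rates (a hypoexponential distribution). It is offered as a cross-check under that standard result and is not claimed to reproduce Equations A4-A9 term by term; the function names are hypothetical.

```python
import math


def state_rates(n: int, s: int, N: float) -> list:
    """Rates a_j / (2N) with a_j = C(j, 2), for the states j = s+1..n."""
    return [math.comb(j, 2) / (2.0 * N) for j in range(s + 1, n + 1)]


def cdf_time_to_state(n: int, s: int, t: float, N: float) -> float:
    """P(time from state n down to state s <= t) under constant diploid size N."""
    if s >= n:
        return 1.0  # the sample starts in state n, so no waiting is needed
    rates = state_rates(n, s, N)
    survival = 0.0
    for j, r_j in enumerate(rates):
        weight = 1.0
        for i, r_i in enumerate(rates):
            if i != j:
                weight *= r_i / (r_i - r_j)
        survival += weight * math.exp(-r_j * t)
    return 1.0 - survival


def prob_state(n: int, s: int, t: float, N: float) -> float:
    """P(genealogy of n samples is in state s at time t), for 1 <= s <= n."""
    reached = cdf_time_to_state(n, s, t, N)
    left = cdf_time_to_state(n, s - 1, t, N) if s > 1 else 0.0
    return reached - left


if __name__ == "__main__":
    n, N, t = 6, 10_000.0, 5_000.0  # illustrative values only
    probs = {s: prob_state(n, s, t, N) for s in range(1, n + 1)}
    print(probs, sum(probs.values()))
```

The weights alternate in sign, so for large n this direct evaluation loses precision in floating point; higher-precision arithmetic would be a safer choice there.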
Piecewise constant effective population size:
Consider a demographic history of M distinct epochs indexed by 1, 2, ... , M, where the ancestral epoch is numbered M. For epoch i, the constant effective population size is $N_i$, and the duration of this epoch is $T_i$; in particular, $T_M = \infty$. We define $a^{(i)}_j = a_j/2N_i$. We introduce $\tau_i = \sum_{l=1}^{i} T_l$, the time from the present back until the end of the ith epoch (so $\tau_0 = 0$ and $\tau_M = \infty$). At a given time t, the index of the current epoch is denoted by m(t), in formula $m(t) = \min\{k: \tau_k \ge t\}$. In particular, $m(\tau_i) = i$, and $\tau_{m(t)-1} < t \le \tau_{m(t)}$. We also introduce a "normalized" time t*:

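The epoch bookkeeping can be made concrete with a short sketch. The cumulative epoch ends and the index m(t) follow the definitions in the text. The form used for the normalized time, elapsed time measured in units of 2N generations and accumulated epoch by epoch, is an assumption chosen to agree with $\sum_{l=1}^{m} T_l/2N_l$ at the epoch boundaries. The epoch sizes and durations are made-up values, and the function names are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from math import inf


def epoch_ends(durations):
    """Cumulative times tau_1..tau_M; the last duration is expected to be inf."""
    return list(accumulate(durations))


def epoch_index(t, taus):
    """m(t) = min{k : tau_k >= t}, returned as a 1-based epoch index."""
    return bisect_left(taus, t) + 1


def normalized_time(t, durations, sizes):
    """Assumed normalized time: generations rescaled by 2N within each epoch."""
    taus = epoch_ends(durations)
    m = epoch_index(t, taus)
    # Full contribution of every epoch that ended before time t.
    t_star = sum(durations[l] / (2.0 * sizes[l]) for l in range(m - 1))
    # Partial contribution of the epoch that contains t.
    start = taus[m - 2] if m > 1 else 0.0
    return t_star + (t - start) / (2.0 * sizes[m - 1])


if __name__ == "__main__":
    durations = [3000.0, 500.0, inf]     # hypothetical epoch lengths (generations)
    sizes = [20000.0, 2000.0, 10000.0]   # hypothetical effective sizes
    taus = epoch_ends(durations)
    for t in (0.0, 3000.0, 3200.0, 1e6):
        print(t, epoch_index(t, taus), normalized_time(t, durations, sizes))
```

Using `bisect_left` rather than `bisect_right` places a time that coincides with an epoch boundary in the more recent epoch, matching $m(\tau_i) = i$.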
The proof is based on induction on the number of epochs. To facilitate this, we consider two kinds of partial models with smaller numbers of epochs, as follows:
- The first model has a single epoch, with effective population size $N_i$. The random variable $T^{\{i\}}_{n,j}$ denotes the time from the present (state n) to the beginning of state j, under the parameters of the first model.
- The second model is a truncated version of the original M-epoch model: it consists of i epochs, with parameters that are identical to the parameters of the first i epochs of the original model, except $T_i = \infty$; i.e., the ith epoch of the original model becomes the ancestral epoch of the truncated model. The random variable $T^{[i]}_{n,j}$ denotes the time from the present (state n) to reach state j, under the parameters of the second model.
Note that the two types of models coincide when i = 1. The following are true:
THEOREM 2. For s: $1 \le s < n$:
![]() |
(A10) |
![]() |
(A11) |
![]() |
(A12) |
![]() |
(A13) |
For s: $2 \le s < n$:
![]() |
(A14) |
![]() |
(A15) |
For i: $1 \le i < n$:
![]() |
(A16) |
Proof: (A12) and (A14) are consequences of (A11):

We prove (A10) and (A11) by induction on the number of epochs M. The statements are true for M = 1 by Theorem 1. For M > 1 assume that the statements are true if the number of epochs is less than M. Clearly,

The right side is a union of disjoint events; therefore (using density functions of conditioned variables) we have

Clearly

and for each i > j

Therefore, for $t \le \tau_{M-1}$, we have
so using the induction hypothesis, for $t \le \tau_{M-1}$, Equation A10, and consequently A11, hold. In particular,

If $t > \tau_{M-1}$, i.e., m(t) = M, then (A10) and (A11) follow from Lemma 3:

We get Equation A13 in a way similar to the proof of Equation A8:

Using Lemma 4,

This gives Equation A15. Finally, using manipulations identical to those used by ![]()

where $\tau^*_m = \sum_{l=1}^{m} (T_l/2N_l)$. This completes the proof. Q.E.D.
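One way to evaluate the expected time spent in each state under a piecewise constant history numerically is sketched below. It relies on the standard time-rescaling property of the coalescent: when time within each epoch is measured in units of 2N generations, the genealogy behaves like the constant-size process, so the probability of being in state k can be integrated epoch by epoch in closed form. This is a sketch under those assumptions, not a transcription of Equations A10-A16; the function names and the three-epoch parameters are hypothetical.

```python
import math


def survival_terms(n: int, s: int) -> list:
    """Pairs (a_j, w_j), j = s+1..n, with P(not yet in state s at u) = sum w_j e^{-a_j u}."""
    a = [math.comb(j, 2) for j in range(s + 1, n + 1)]
    terms = []
    for j, a_j in enumerate(a):
        w = 1.0
        for i, a_i in enumerate(a):
            if i != j:
                w *= a_i / (a_i - a_j)
        terms.append((a_j, w))
    return terms


def integral_state_prob(n: int, k: int, u0: float, u1: float) -> float:
    """Integral over rescaled time [u0, u1] of P(genealogy is in state k)."""
    total = 0.0
    # P(state k at u) = P(state k-1 not yet reached) - P(state k not yet reached).
    for sign, s in ((1.0, k - 1), (-1.0, k)):
        for a_j, w in survival_terms(n, s):
            total += sign * w * (math.exp(-a_j * u0) - math.exp(-a_j * u1)) / a_j
    return total


def expected_state_times(n: int, sizes: list, durations: list) -> dict:
    """Expected generations spent in state k (k = 2..n); the last duration is inf."""
    times = {k: 0.0 for k in range(2, n + 1)}
    u0 = 0.0
    for N, T in zip(sizes, durations):
        u1 = u0 + T / (2.0 * N)  # end of this epoch in rescaled time
        for k in times:
            times[k] += 2.0 * N * integral_state_prob(n, k, u0, u1)
        u0 = u1
    return times


if __name__ == "__main__":
    sizes = [20000.0, 2000.0, 10000.0]     # hypothetical effective sizes
    durations = [3000.0, 500.0, math.inf]  # hypothetical epoch lengths (generations)
    print(expected_state_times(6, sizes, durations))
```

Weighting these expected durations by the probability that a lineage in state k has i descendants, and multiplying by the mutation rate, gives the expected number of mutations of each size. As with any alternating-sum evaluation, floating-point precision limits this direct approach to moderate sample sizes.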
| LITERATURE CITED |
|---|
AKEY, J. M., G. ZHANG, K. ZHANG, L. JIN, and M. D. SHRIVER, 2002 Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 12:1805-1814.
ALTSHULER, D., V. J. POLLARA, C. R. COWLES, W. J. VAN ETTEN, and J. BALDWIN et al., 2000 An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407:513-516.[CrossRef][Medline]
BAMSHAD, M. and S. P. WOODING, 2003 Signatures of natural selection in the human genome. Nat. Rev. Genet. 4:99-111.[CrossRef][Medline]
BRAVERMAN, J. M., R. R. HUDSON, N. L. KAPLAN, C. H. LANGLEY, and W. STEPHAN, 1995 The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics 140:783-796.[Abstract]
CARDON, L. R. and G. R. ABECASIS, 2003 Using haplotype blocks to map human complex trait loci. Trends Genet. 19:135-140.[CrossRef][Medline]
CARGILL, M., D. ALTSHULER, J. IRELAND, P. SKLAR, and K. ARDLIE et al., 1999 Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22:231-238.[CrossRef][Medline]
CLARK, A. G., 2003 Finding genes underlying risk of complex disease by linkage disequilibrium mapping. Curr. Opin. Genet. Dev. 13:296-302.[CrossRef][Medline]
CLARK, A. G., K. M. WEISS, D. A. NICKERSON, S. L. TAYLOR, and A. BUCHANAN et al., 1998 Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Hum. Genet. 63:595-612.[CrossRef][Medline]
CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetic Theory. Harper & Row, New York.
DI RIENZO, A. and A. C. WILSON, 1991 Branching pattern in the evolutionary tree for human mitochondrial DNA. Proc. Natl. Acad. Sci. USA 88:1597-1601.
DI RIENZO, A., P. DONNELLY, C. TOOMAJIAN, B. SISK, and A. HILL et al., 1998 Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories. Genetics 148:1269-1284.
EWENS, W. J., 1972 The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3:87-112.[CrossRef][Medline]
FAY, J. C. and C.-I WU, 1999 A human population bottleneck can account for the discordance between patterns of mitochondrial versus nuclear DNA variation. Mol. Biol. Evol. 16:1003-1005.[Medline]
FU, Y. X., 1995 Statistical properties of segregating sites. Theor. Popul. Biol. 48:172-197.[CrossRef][Medline]
FU, Y. X. and W. H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133:693-709.[Abstract]
GABRIEL, S. B., S. F. SCHAFFNER, H. NGUYEN, J. M. MOORE, and J. ROY et al., 2002 The structure of haplotype blocks in the human genome. Science 296:2225-2229.
GONSER, R., P. DONNELLY, G. NICHOLSON, and A. DI RIENZO, 2000 Microsatellite mutations and inferences about human demography. Genetics 154:1793-1807.
GRIFFITHS, R. C. and S. TAVARE, 1994a Simulating probability distributions in the coalescent. Theor. Popul. Biol. 46:131-159.[CrossRef]
GRIFFITHS, R. C. and S. TAVARE, 1994b Sampling theory for neutral alleles in a varying environment. Philos. Trans. R. Soc. Lond. B Biol. Sci. 344:403-410.[Medline]
HARDING, R. M., S. M. FULLERTON, R. C. GRIFFITHS, J. BOND, and M. J. COX et al., 1997 Archaic African and Asian lineages in the genetic ancestry of modern humans. Am. J. Hum. Genet. 60:772-789.[Medline]
HARPENDING, H. and A. ROGERS, 2000 Genetic perspectives on human origins and differentiation. Annu. Rev. Genomics Hum. Genet. 1:361-385.[CrossRef][Medline]
HEY, J., 1997 Mitochondrial and nuclear genes present conflicting portraits of human origins. Mol. Biol. Evol. 14:166-172.[Abstract]
HEY, J. and E. HARRIS, 1999 Population bottlenecks and patterns of human polymorphism. Mol. Biol. Evol. 16:1423-1426.[Medline]
HUDSON, R. R., 1991 Gene genealogies and the coalescent process, pp. 1-44 in Oxford Surveys in Evolutionary Biology, edited by D. FUTUYAMA and J. ANTONOVICS. Oxford University Press, London/New York/Oxford.
INGMAN, M., H. KAESSMANN, S. PAABO, and U. GYLLENSTEN, 2000 Mitochondrial genome variation and the origin of modern humans. Nature 408:708-713.[CrossRef][Medline]
JORDE, L. B., W. S. WATKINS, and M. J. BAMSHAD, 2001 Population genomics: a bridge from evolutionary history to genetic medicine. Hum. Mol. Genet. 10:2199-2207.
KAPLAN, N. L., R. R. HUDSON, and C. H. LANGLEY, 1989 The "hitchhiking effect" revisited. Genetics 123:887-899.
KIMMEL, M., R. CHAKRABORTY, J. P. KING, M. BAMSHAD, and W. S. WATKINS et al., 1998 Signatures of population expansion in microsatellite repeat data. Genetics 148:1921-1930.
KONDRASHOV, A. S., 2003 Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases. Hum. Mutat. 21:12-27.[CrossRef][Medline]
KRUGLYAK, L., 1999 Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22:139-144.[CrossRef][Medline]
KUHNER, M. K., J. YAMATO, and J. FELSENSTEIN, 1995 Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics 140:1421-1430.[Abstract]
LANDER, E. S., L. M. LINTON, B. BIRREN, C. NUSBAUM, and M. C. ZODY et al., 2001 Initial sequencing and analysis of the human genome. Nature 409:860-921.[CrossRef][Medline]
LI, W. H., 1977 Distribution of nucleotide differences between two randomly chosen cistrons in a finite population. Genetics 85:331-337.
MARTH, G., G. SCHULER, R. YEH, R. DAVENPORT, and R. AGARWALA et al., 2003 Sequence variations in the public human genome data reflect a bottlenecked population history. Proc. Natl. Acad. Sci. USA 100:376-381.
MULLIKIN, J. C., S. E. HUNT, C. G. COLE, B. J. MORTIMORE, and C. M. RICE et al., 2000 An SNP map of human chromosome 22. Nature 407:516-520.[CrossRef][Medline]
NACHMAN, M. W. and S. L. CROWELL, 2000 Estimate of the mutation rate per nucleotide in humans. Genetics 156:297-304.
OTT, J., 1991 Analysis of Human Genetic Linkage. Johns Hopkins University Press, Baltimore.
PAYSEUR, B. A., A. D. CUTTER, and M. W. NACHMAN, 2002 Searching for evidence of positive selection in the human genome using patterns of microsatellite variability. Mol. Biol. Evol. 19:1143-1153.
PLUZHNIKOV, A., A. DI RIENZO, and R. R. HUDSON, 2002 Inferences about human demography based on multilocus analyses of noncoding sequences. Genetics 161:1209-1218.
PRZEWORSKI, M., 2002 The signature of positive selection at randomly chosen loci. Genetics 160:1179-1189.
PRZEWORSKI, M., R. R. HUDSON, and A. DI RIENZO, 2000 Adjusting the focus on human variation. Trends Genet. 16:296-302.[CrossRef][Medline]
PTAK, S. E. and M. PRZEWORSKI, 2002 Evidence for population growth in humans is confounded by fine-scale population structure. Trends Genet. 18:559-563.[CrossRef][Medline]
REICH, D. E. and D. B. GOLDSTEIN, 1998 Genetic evidence for a Paleolithic human population expansion in Africa. Proc. Natl. Acad. Sci. USA 95:8119-8123.
REICH, D. E., M. CARGILL, S. BOLK, J. IRELAND, and P. C. SABETI et al., 2001 Linkage disequilibrium in the human genome. Nature 411:199-204.[CrossRef][Medline]
REICH, D. E., S. F. SCHAFFNER, M. J. DALY, G. MCVEAN, and J. C. MULLIKIN et al., 2002 Human genome sequence variation and the influence of gene history, mutation and recombination. Nat. Genet. 32:135-142.[CrossRef][Medline]
RELETHFORD, J. H. and L. B. JORDE, 1999 Genetic evidence for larger African population size during recent human evolution. Am. J. Phys. Anthropol. 108:251-260.[CrossRef][Medline]
ROGERS, A. R., 2001 Order emerging from chaos in human evolutionary genetics. Proc. Natl. Acad. Sci. USA 98:779-780.
ROGERS, A. R. and H. HARPENDING, 1992 Population growth makes waves in the distribution of pairwise genetic differences. Mol. Biol. Evol. 9:552-569.[Abstract]
RYBICKI, B. A., S. K. IYENGAR, T. HARRIS, R. LIPTAK, and R. C. ELSTON et al., 2002 The distribution of long range admixture linkage disequilibrium in an African-American population. Hum. Hered. 53:187-196.[CrossRef][Medline]
SACHIDANANDAM, R., D. WEISSMAN, S. C. SCHMIDT, J. M. KAKOL, and L. D. STEIN et al., 2001 A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928-933.[CrossRef][Medline]
SHERRY, S. T., A. R. ROGERS, H. HARPENDING, H. SOODYALL, and T. JENKINS et al., 1994 Mismatch distributions of mtDNA reveal recent human population expansions. Hum. Biol. 66:761-775.[Medline]
SHERRY, S. T., H. C. HARPENDING, M. A. BATZER, and M. STONEKING, 1997 Alu evolution in human populations: using the coalescent to estimate effective population size. Genetics 147:1977-1982.[Abstract]
SUNYAEV, S. R., W. C. LATHE, III, V. E. RAMENSKY, and P. BORK, 2000 SNP frequencies in human genes: an excess of rare alleles and differing modes of selection. Trends Genet. 16:335-337.[CrossRef][Medline]
TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.
TAVARE, S., D. J. BALDING, R. C. GRIFFITHS, and P. DONNELLY, 1997 Inferring coalescence times from DNA sequence data. Genetics 145:505-518.[Abstract]
TISHKOFF, S. A. and S. M. WILLIAMS, 2002 Genetic analysis of African populations: human evolution and complex disease. Nat. Rev. Genet. 3:611-621.[CrossRef][Medline]
VENTER, J. C., M. D. ADAMS, E. W. MYERS, P. W. LI, and R. J. MURAL et al., 2001 The sequence of the human genome. Science 291:1304-1351.
WALL, J. D. and J. K. PRITCHARD, 2003 Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet. 4:587-597.[CrossRef][Medline]
WALL, J. D. and M. PRZEWORSKI, 2000 When did the human population size start increasing? Genetics 155:1865-1874.
WEBER, J. L., D. DAVID, J. HEIL, Y. FAN, and C. ZHAO et al., 2002 Human diallelic insertion/deletion polymorphisms. Am. J. Hum. Genet. 71:854-862.[CrossRef][Medline]
WIEHE, T., 1998 The effect of selective sweeps on the variance of the allele distribution of a linked multiallele locus: hitchhiking of microsatellites. Theor. Popul. Biol. 53:272-283.[CrossRef][Medline]
WOODING, S. and A. ROGERS, 2002 The matrix coalescent and an application to human single-nucleotide polymorphisms. Genetics 161:1641-1650.
YU, N., Z. ZHAO, Y. X. FU, N. SAMBUUGHIN, and M. RAMSAY et al., 2001 Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1. Mol. Biol. Evol. 18:214-222.
ZHAO, Z., L. JIN, Y. X. FU, M. RAMSAY, and T. JENKINS et al., 2000 Worldwide DNA sequence variation in a 10-kilobase noncoding region on human chromosome 22. Proc. Natl. Acad. Sci. USA 97:11354-11358.
ZHIVOTOVSKY, L. A., L. BENNETT, A. M. BOWCOCK, and M. W. FELDMAN, 2000 Human population expansion and microsatellite variation. Mol. Biol. Evol. 17:757-767.