- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Przeworski, M.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Przeworski, M.
Estimating the Time Since the Fixation of a Beneficial Allele
Molly Przeworskiaa Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
Corresponding author: Molly Przeworski, Deutscher Platz 6, 04103 Leipzig, Germany., przewors{at}eva.mpg.de (E-mail)
Communicating editor: N. TAKAHATA
| ABSTRACT |
|---|
The fixation of a beneficial allele in a population leaves a well-characterized signature in patterns of nucleotide variation at linked sites. This signature can be used to estimate the time since fixation from patterns of polymorphism in extant individuals. I introduce a method to assess the support in polymorphism data for a recent episode of directional positive selection and to estimate the time since fixation. I summarize the polymorphism data by three statistics that carry information about levels of diversity, the allele frequency spectrum, and the extent of allelic associations. Simulations are then used to obtain a sample from the posterior distribution of the time since fixation, conditional on the observed summaries. I test the performance of the approach on simulated data and apply it to the gene tb1 in maize. The data support the recent fixation of a favored allele, consistent with what is known about the importance of tb1 in the domestication process of maize.
PATTERNS of nucleotide variation are shaped by the evolutionary history of the genomic region and, in particular, by adaptation. Polymorphism data are therefore informative about the nature and timing of positive selection, including what proportion of loci show evidence of selective changes or when the selective pressure was exercised. In many contexts, the time since a beneficial substitution is of interest. For example, one might want to date adaptations to temperate habitats in species of Drosophila whose range was originally restricted to the tropics (![]()
![]()
Similarly, one might hope to date the genetic changes that led to the emergence of anatomically modern humans, in particular those underlying speech and language (e.g., ![]()
![]()
![]()
![]()
![]()
40150 KYA.
If many favored substitutions occurred as "selective sweeps," in which a rare allele arises and rapidly increases in frequency until fixation in the population, their timing can be estimated from patterns of polymorphism in extant individuals. Indeed, under simplifying assumptions, a favorable substitution leaves a well-characterized signature on linked neutral variability (![]()
![]()
![]()
![]()
200,000 years (![]()
![]()
![]()
![]()
The most common approach to identifying adaptive genetic changes from polymorphism data is to assume a neutral null model and then assess whether the value of some summary of the data is unexpected under this model. A poor fit is interpreted as evidence for the action of natural selection. Among these "tests of neutrality" are allele frequency spectrum-based tests such as D (![]()
![]()
![]()
![]()
A number of alternative approaches to estimating parameters of a selective sweep model exist. In particular, ![]()
![]()
![]()
To estimate T for the case of no recombination, ![]()
![]()
![]()
I introduce a method to assess the support for a recent beneficial substitution in polymorphism data from linked neutral sites and to estimate T. The approach applies to the situation where researchers are interested in a fixed difference between species or populations that they consider a candidate for a recent selective sweep. The idea is to draw a sample from the posterior distribution of T, conditional on summaries of the polymorphism data. If a recent adaptation occurred at a closely linked site, most of the posterior probability should be on recent T values. Conversely, if most of the posterior probability is on more distant times, this suggests that the selective sweep did not occur recently, or did not occur at all, in the genealogical history of the sample. This method improves on existing ones by using more summaries of the data, including a measure of linkage disequilibrium (LD) as well as a summary of the frequency spectrum and of levels of diversity. An attractive feature of the approach is that the parameters not of interest are integrated out, so one need not assume that their values are known exactly (cf. ![]()
| METHODS |
|---|
The model:
I consider the following model of a selective sweep: a neutrally evolving region is linked to a site where a favorable allele reached fixation in the population at some time, T, measured into the past. The neutral locus is assumed to evolve according to the infinite-sites model. The population mutation rate for the neutral locus is
= 4NµL, where N is the diploid effective population size of the species, µ is the mutation rate per base pair per generation, and L is the length of the locus in base pairs. Similarly, the population recombination rate for the neutral locus is
= 4NrL, where r is the rate of recombination per base pair per generation. Recombination occurs at a constant rate per base pair throughout the region and all recombination events are crossovers without gene conversion. The population recombination rate between the neutral and selected loci is C = 4NrK, where K is the physical distance (in base pairs) between the closest edge of the neutral locus and the selected site. The population is random mating and of constant size.
Time is scaled in units of 4N generations, so T = 1 is equivalent to fixation of the favored allele 4N generations ago. Selection is additive. The increase in frequency of the favored allele is modeled deterministically, from introduction to fixation [i.e., frequency 1 - 1/(2N)]. The sojourn time of the favored allele in the population is then
2 ln(2N)/s, where s is the selection coefficient of the favored allele (![]()
Simulations of selective sweep:
The model is implemented in a coalescent framework. There are two phases: a neutral phase, in which the coalescent is the standard coalescent with recombination (cf. ![]()
![]()
![]()
Summaries of the polymorphism data:
At present, it is not computationally feasible to use all the polymorphism data to estimate parameters, even for models simpler than the one considered here (![]()
![]()
![]()
![]()
Simulating a sample from the posterior distribution:
In addition to the time since the fixation of the beneficial allele, T, there are four parameters in this selective sweep model whose values cannot be measured exactly: N, s, µ, and r. (While in many population genetic contexts, the parameter N does not appear on its own, here, it specifies the frequency at which the beneficial allele first appears [i.e., 1/(2N)].) I assume that researchers are interested in a particular site where a substitution may have been selected, so K is known. To model the uncertainty in the other parameters, I use a distribution of prior values, rather than assuming a fixed value (see below).
To obtain a sample from the posterior distributions of the parameters conditional on the summaries of the data, I follow the rejection-algorithm 2 outlined in ![]()
- Pick Ni, si, µi, ri, and Ti values independently from the prior distributions given below.
- To model the history of the sample at the neutral locus, simulate a recombination graph (including coalescent events and recombination events, but no mutation) under a selective sweep model with parameter values K, Ni, si, µi, ri, and Ti.
- Because of recombination, different segments of the neutral locus will have distinct genealogies. Suppose that there are n segments, where each segment is a set of consecutive base pairs with the same genealogy; let Lj be the length of segment j and
j be the length of the genealogy for segment j. Accept the graph simulated in step 2 with probability u = Po(S, 4Nµ
)/Po(S, S) where Po denotes the probabilities of the Poisson distribution and
. If the graph is rejected, return to step 1. - Place S segregating sites on the graph.
- Tabulate two summaries of the simulated data: Di and Hi.
- If |Di - D|
and Hi = H, record values Ni, si, µi, ri, and Ti.
Run this algorithm until there are M
sets of recorded values. The list of sets of recorded values is a sample of size M
from the joint posterior distribution of the parameters, conditional on Si = S, |Di - D|
, and Hi = H. A sample from the posterior distribution of one or a subset of the parameters can be obtained by considering only the values of the parameters of interest. A sample from the posterior distribution of Tgen = 4NT, the time in generations since the fixation of the beneficial allele, is given by the appropriate product of T and N in this list.
In step 1, the recombination rate r is chosen from a gamma distribution with parameters (z x 10, 109); the mean is then z x 10-8. Assuming little error in the physical map and homogeneity of local recombination rates, the sampling error associated with large-scale estimates of genetic distance provides some sense of the accuracy of r estimates. I assume that the estimate obtained from a comparison of genetic and physical maps would be used as the mean of the gamma distribution and choose the parameters to be in rough accordance with the estimates of sampling error provided for the latest genetic map in humans (![]()
![]()
The aim is to assess the support for a beneficial substitution that happened recently in the genealogical history of the sample. The time depth of that history depends on N: as an illustration, a substitution that occurred 40,000 generations ago is recent for Drosophila melanogaster, where N is on the order of 106, but not for humans, where N is estimated to be
104. I therefore place the prior on the scaled time since the fixation of the beneficial allele, T, rather than on the true time in generations, 4NT. In the absence of a selective sweep at a nearby site, the scaled time to the most recent common ancestor of a sample is on average
1 (for a sample size greater than two; cf. ![]()
![]()
![]()
I use two different priors on T to test the robustness of the approach to the choice of distribution. In one implementation, T is exponential with parameter 1.2 [referred to as Exp(1.2)]; the Pr(T < 1) is then
0.70. This distribution arises if beneficial mutations occur infrequently and independently of one another (![]()
0.2 to be a recent selective sweep. The Pr(T
0.2) is very similar for both distributions: if T is U(0, 1), it is obviously 0.2, while if T is Exp(1.2), it is
0.21.
Test of performance:
I generate 100 simulated data sets under the selective sweep model with fixed values of the parameters Nx = 104, sx = 0.01, µx = 10-8/bp/generation, and rx = 1 cM/Mb. These parameters are chosen to be plausible descriptions of a region of average recombination in humans. The neutral locus is 104 bp in length and the sample size is 50 chromosomes. The physical distance between selected and neutral loci, K, is 103 bp. I consider three cases: T = 0, T = 0.10, and T = 100 (which is equivalent to no selective sweep in the genealogical history of the sample). For each simulated data set, I obtain a sample from the posterior distribution of T, according to the method outlined above, with
= 0.1 and M
= 2 x 103. Parameters m, z, and Y are chosen such that the mean N = Nx, the mean r = rx, and the mean µ = µx. I assume that the physical distance to the selected site is known.
Application to tb1:
As an "empirical test," I apply the method to the gene tb1 in maize, a locus known to have experienced selective pressure during the domestication process that occurred over the past 510 KYA (cf. ![]()
![]()
1 kb upstream of the 5' nontranscribed region that they sequenced; I use a value of 500 bp in my simulation. Parameterization of the prior distributions is as follows: m, z, and Y are chosen such that the mean N = 5 x 105 (![]()
![]()
= 0.1 and M
= 4 x 104. For the results reported in Fig 3, the prior distribution of T is U(0, 1); results are very similar if instead the prior on T is Exp(1.2) (results not shown).
|
|
|
Code:
The C program used to simulate a sample from the posterior distribution of the parameters is available from http://email.eva.mpg.de/~przewors. The sample size from the posterior distribution, M
, is specified by the user, as is the tolerance
. For a data set reported in Table 1, for which M
= 2000, it took anywhere from 1 min to 3 days on a 1667-Mhz AMD Athlon single processor. A second version of the program is available upon request. In this implementation, step 3 of the algorithm is eliminated. Instead, parameter values are recorded whenever |Di - D|
and Hi = H and each set of recorded values is weighted by the probability u. The algorithm is repeated until an estimate of the effective sample size (based on the mean and sample variance of the importance weights, u) exceeds a value specified by the user. This "importance sampling" procedure is known to be more efficient than the acceptance-rejection algorithm described above (cf. Appendix of ![]()
|
| RESULTS |
|---|
To assess the support for a recent selective sweep, I follow the approach developed by ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Choice of summaries:
Short of being able to use all the information in the data, one would like to use summaries that are sensitive to the parameter of interest (here, the time since the fixation of the beneficial allele, T) and capture different facets of the data. I focus on three statistics: the number of segregating sites (S), a summary of the allele frequency spectrum (D), and a summary of linkage disequilibrium (H). Previous studies have shown two of the statistics, S and D, to be sensitive to T. Specifically, selective sweeps are expected to reduce diversity, thus reducing S, and skew the frequency spectrum toward rare alleles, leading to negative D values (![]()
![]()
![]()
![]()
![]()
, the mean pairwise difference in the sample (![]()
as well.
The number of haplotypes, H, also carries information about T. Its behavior depends on the strength of selection and the recombination rate. If recombination occurs between selected and neutral loci during the selective sweep, then at T = 0, H/(S + 1) will be lower on average than it would be in the absence of selection (![]()
![]()
![]()
In summary, while to a rough approximation, S and D are expected to increase monotonically with T, H/(S + 1) tends to have a maximum value at some intermediate T. This suggests that using H as an additional statistic may help to distinguish between recent T values and therefore to refine the estimates of T obtained using D and S alone. I illustrate this in Fig 1 by plotting the posterior distribution of T for two simulated data sets, conditional on S and D (first row); S and H (second row); and S, D, and H (last row). As can be seen, conditioning on all three summaries leads to a tighter distribution around the true value than does the use of only two; this finding is confirmed by more extensive simulations (results not shown).
The extent to which the three summaries are informative about T depends on prior knowledge about the parameters. In particular, detecting a reduction in diversity requires some knowledge of what levels of diversity are expected to be in the absence of selection. Thus, if one has accurate prior knowledge about the population mutation rate
(= 4Nµ), the decrease in the number of segregating sites and in the number of haplotypes can be highly informative about the time since the selective sweep (![]()
, most of the information about the time since the selective sweep will come from the observed value of D and the value of H given S.
An additional benefit of using distinct aspects of the data is that the approach may be less sensitive to misspecification of the prior distributions. For example, methods that estimate T on the basis of diversity levels alone are highly sensitive to the estimate of
. If
is estimated to be higher than it is in reality but no selection has occurred, levels of variation will appear reduced. On this basis, methods may spuriously suggest the recent fixation of a beneficial allele. However, if the data are generated under a neutral model with an elevated mutation rate, the values of D and H will tend to be less likely under a recent selective sweep than under neutrality. Thus, the use of all three summaries may result in less support for a recent T. To examine this, I ran 20 simulations with no selection in which the mean of the prior distribution of
was twofold larger than the value used to generate the data (parameters as in Table 1 for a uniform prior on T). For none of the simulated data sets was there strong support for a recent selective sweep (results not shown).
Performance of approach:
One concern is that the posterior probabilities may not be well estimated. In that respect, an advantage of this method over more efficient ones such as Monte Carlo Markov chain is that it provides independent samples from the posterior distribution, so one can easily assess the accuracy of estimates of the posterior probabilities. In particular, if the sample from the posterior is of size M
, the sampling error associated with Bj, the observed number of counts in interval j, is binomial (![]()
, M
). This indicates that probabilities on the order of 1/M
are poorly estimated, while those >>1/M
are fairly precisely estimated. In the simulations presented here, M
= 2000 and probabilities of interest are >>5 x 10-4.
A second question is whether data generated under a selection sweep model with realistic parameters carry much information about the parameter T. Let To be the true time since the fixation of the beneficial allele. If the data are informative, the support for recent times should be stronger in the posterior distribution than in the prior if To is 0 or 0.10, while there should be weaker support for recent times in the absence of selection. As can be seen in Table 1, this is true of almost all simulated runs, whether the prior distribution of T is Exp(1.2) or U(0, 1). To measure the proportion of data sets with strong support for a "recent" selective sweep, I tabulate the proportion of 100 simulated data sets where the posterior Pr(T
0.2) > 0.50. For To = 0, the proportion is very high. It decreases with To, but even when the beneficial substitution occurred some time ago (To = 0.10), over one-third of simulated data sets strongly support a selective sweep (Table 1). In contrast, one rarely (in <5% of the runs) finds strong support for a recent selective sweep when none has occurred. In summary, the simulated data sets appear to be informative about recent genetic adaptations. In humans, the parameters chosen for the simulations correspond loosely to 25 individuals sequenced for 10 kb in a region of average recombination, comparable to what is currently collected for studies of putatively selected loci (e.g., ![]()
![]()
Note further that these tests of performance were carried out for two quite different prior distributions for T. The results are very similar in both implementations, suggesting that the method is robust to the choice of a prior distribution. This is reassuring, as little or nothing is usually known about Tin contrast to other parameters, where there is often independent knowledge to guide the specification of the prior.
In some contexts, one is interested in an estimate of the unscaled time (in generations) since the fixation of the beneficial allele, Tgen = 4NT. As a point estimate, one might consider the mode of the sample from the posterior distribution of Tgen. In Fig 2I plot the distribution of modes (i.e., the bin with the largest number of counts) for 100 simulated data sets. When the data are generated under a no-selection model, few modes are at recent times (e.g., 6 are at Tgen
8000 generations). In contrast, when the data are generated under a recent selective sweep model, most modes are close to the true time. For example, if the true Tgen is 0, 94 of the modes are at Tgen
4000 generations. As the true Tgen increases, the precision of the estimate decreases: thus, if the true Tgen is 4000 generations, only 60 of the modes are within a factor of two of the true value.
Application to tb1:
The tb1 locus is responsible for the short branches that distinguish maize from its wild progenitor, teosinte. This trait is thought to have fixed during the domestication process, 510 KYA (cf. ![]()
![]()
![]()
0.2. Thus, consistent with what is known about the role of tb1 in the domestication of maize, polymorphism data strongly suggest the recent fixation of a beneficial allele.
I also present a sample from the joint posterior distribution of s, the selection coefficient of the favored allele, and Tgen, the time in generations since the fixation of the beneficial allele (Fig 3B). The results suggest a large selection coefficient, in accordance with evidence that the trait was under artificial selection. However, most of the support is on older than expected times from the archeological record (assuming approximately one generation per year for maize). This discrepancy may be due to chance, since few estimates of Tgen will be on the true value even under ideal conditions (see Fig 2). Alternatively, it may reflect an incorrect assumption about the location of the selected site or a salient aspect of the history of maize not captured by the demographic or selective model (see below).
| DISCUSSION |
|---|
Advantages:
This rejection-sampling method is computationally feasible yet uses enough summaries of the data to capture information about the time since the fixation of the beneficial allele. Indeed, the limited simulations in Table 1 and Fig 2 suggest that the summaries can provide strong support for a selective sweep when it occurred recently, as well as fairly accurate point estimates of the time since fixation. Further, the precision of the posterior density estimates depends on the sample size from the posterior distribution. This is under the investigator's control, at least to some extent, as the number of replicates can easily be increased.
As can be seen in Table 1, the proportion of data sets with strong support for a selective sweep decreases with the time since fixation. This is consistent with the finding that summary-statistic-based "tests of neutrality" and the composite-likelihood method of ![]()
![]()
![]()
![]()
This said, the power to detect a recent selective sweep varies between methods, depending on the approach and the choice of summaries of the data. In that light, it is worth noting that existing estimates of the power of tests of neutrality or the composite-likelihood approach tend to assume that the values of nuisance parameters are known (e.g., ![]()
![]()
![]()
![]()
![]()
![]()
Like the composite-likelihood approach, this method allows one to estimate parameters as well as to assess support for a recent selective sweep model. An advantage of this framework is that one can also quantify the uncertainty associated with these estimates. While for composite likelihood it is not obvious how to obtain confidence intervals for the point estimates (e.g., ![]()
Possible improvements:
Although this approach can provide strong support for a recent selective sweep, the limited number of summaries entails a considerable loss of information. It is therefore important to choose summary statistics judiciously. Simulations suggest that the statistics used here capture distinct aspects of the data (Fig 1; results not shown). However, it would be useful to have a more rigorous way to compare sets of statistics than visual inspection. It may also be possible to substantially increase the number of statistics by using more sophisticated implementations of rejection-sampling methods (![]()
This approach makes a number of assumptions that can fairly easily be relaxed, for example, that sequences are contiguous or that haplotypes are known. The latter is likely to be true for Drosophila or maize, but not for human autosomes. One possibility would be to consider a summary of linkage disequilibrium that can be calculated on genotypic data, such as an estimate of the population recombination rate (![]()
![]()
![]()
In addition, the method is designed for the situation where researchers have a candidate site for positive selection and wish to assess support for this hypothesis (e.g., ![]()
![]()
A further shortcoming of this method is that it makes a number of demographic assumptions that are unrealistic for humans (![]()
![]()
![]()
![]()
![]()
| ACKNOWLEDGMENTS |
|---|
Thanks go to P. Andolfatto, B. Gaut, P. Donnelly, Y. Gilad, R. Hudson, S. Ptak, M. Tenaillon, J. Wakeley, J. Wall, and L. Zhang for helpful discussions and/or comments on the manuscript.
Manuscript received January 15, 2003; Accepted for publication April 21, 2003.
| LITERATURE CITED |
|---|
ANDOLFATTO, P., 2001 Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans.. Mol. Biol. Evol. 18:279-290.
BACHTROG, D. and B. CHARLESWORTH, 2002 Reduced adaptation of a non-recombining neo-Y chromosome. Nature 416:323-326.[Medline]
BARTON, N. H., 1998 The effect of hitch-hiking on neutral genealogies. Genet. Res. 72:123-133.
BEAUMONT, M. A., W. ZHANG, and D. J. BALDING, 2002 Approximate Bayesian computation in population genetics. Genetics 162:2025-2035.
BRAVERMAN, J. M., R. R. HUDSON, N. L. KAPLAN, C. H. LANGLEY, and W. STEPHAN, 1995 The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics 140:783-796.[Abstract]
BUSTAMANTE, C. D., R. NIELSEN, S. A. SAWYER, K. M. OLSEN, and M. D. PURUGGANAN et al., 2002 The cost of inbreeding in Arabidopsis. Nature 416:531-534.[Medline]
CARLIN, B. P., and T. A. LOUIS, 1998 Bayes and Empirical Bayes Methods for Data Analysis. Chapman & Hall, Boca Raton, FL.
CAVALLI-SFORZA, L. L., P. MENOZZI and A. PIAZZA, 1994 The History and Geography of Human Genes. Princeton University Press, Princeton, NJ.
DAVID, J. R. and P. CAPY, 1988 Genetic variation of Drosophila melanogaster natural populations. Trends Genet. 4:106-111.[Medline]
DONNELLY, P., M. NORDBORG, and P. JOYCE, 2001 Likelihoods and simulation methods for a class of nonneutral population genetics models. Genetics 159:853-867.
ENARD, W., M. PRZEWORSKI, S. E. FISHER, C. S. LAI, and V. WIEBE et al., 2002 Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418:869-872.[Medline]
EYRE-WALKER, A., R. L. GAUT, H. HILTON, D. L. FELDMAN, and B. S. GAUT, 1998 Investigation of the bottleneck leading to the domestication of maize. Proc. Natl. Acad. Sci. USA 95:4441-4446.
FAY, J. C. and C.-I WU, 2000 Hitchhiking under positive Darwinian selection. Genetics 155:1405-1413.
FEARNHEAD, P. and P. DONNELLY, 2002 Approximate likelihood methods for estimating local recombination rates. J. R. Stat. Soc. Ser. B 64:657-680.
FRISSE, L., R. R. HUDSON, A. BARTOSZEWICZ, J. D. WALL, and J. DONFACK et al., 2001 Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. Hum. Genet. 69:831-843.[Medline]
HAMBLIN, M. T., E. E. THOMPSON, and A. DI RIENZO, 2002 Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet. 70:369-383.[Medline]
HUDSON, R. R., 1990 Gene genealogies and the coalescent process, pp. 114 in Oxford Surveys in Evolutionary Biology, edited by D. FUTUYMA and J. ANTONOVICS. Oxford University Press, Oxford.
HUDSON, R. R., 2001 Two-locus sampling distributions and their application. Genetics 159:1805-1817.
HUDSON, R. R., K. BAILEY, D. SKARECKY, J. KWIATOWSKI, and F. J. AYALA, 1994 Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster.. Genetics 136:1329-1340.[Abstract]
JEFFREYS, A. J., L. KAUPPI, and R. NEUMANN, 2001 Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet. 29:217-222.[Medline]
JENSEN, M. A., B. CHARLESWORTH, and M. KREITMAN, 2002 Patterns of genetic variation at a chromosome 4 locus of Drosophila melanogaster and D. simulans.. Genetics 160:493-507.
KAPLAN, N. L., R. R. HUDSON, and C. H. LANGLEY, 1989 The "hitchhiking effect" revisited. Genetics 123:887-899.
KIM, Y. and W. STEPHAN, 2002 Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160:765-777.
KLEIN, R. G., 1995 Anatomy, behavior and modern human origins. J. World Prehist. 9:167-198.
KONG, A., D. F. GUDBJARTSSON, J. SAINZ, G. M. JONSDOTTIR, and S. A. GUDJONSSON et al., 2002 A high-resolution recombination map of the human genome. Nat. Genet. 31:241-247.[Medline]
LACHAISE, D., M. L. CARIOU, J. R. DAVID, F. LEMEUNIER, and L. TSACAS et al., 1988 Historical biogeography of the Drosophila-melanogaster species subgroup. Evol. Biol. 22:159-225.
LICHTEN, M. and A. S. H. GOLDMAN, 1995 Meiotic recombination hotspots. Annu. Rev. Genet. 29:423-444.[Medline]
MATSUOKA, Y., Y. VIGOUROUX, M. M. GOODMAN, G. J. SANCHEZ, and E. BUCKLER et al., 2002 A single domestication for maize shown by multilocus microsatellite genotyping. Proc. Natl. Acad. Sci. USA 99:6080-6084.
MAYNARD SMITH, J. and J. HAIGH, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23:23-35.[Medline]
MELLARS, P., 1998 Neanderthals, modern humans and the archeological evidence for language, pp. 89115 in The Origin and Diversification of Language, edited by N. G. JABLONSKI and L. C. AIELLO. California Academy of Sciences, San Francisco.
PERLITZ, M. and W. STEPHAN, 1997 The mean and variance of the number of segregating sites since the last hitchhiking event. J. Math. Biol. 36:1-23.[Medline]
PRITCHARD, J. K., M. T. SEIELSTAD, A. PEREZ-LEZAUN, and M. W. FELDMAN, 1999 Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol. Biol. Evol. 16:1791-1798.[Abstract]
PRZEWORSKI, M., 2002 The signature of positive selection at randomly chosen loci. Genetics 160:1179-1189.
ROZAS, J. and R. ROZAS, 1999 DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.
SABETI, P. C., D. E. REICH, J. M. HIGGINS, H. Z. LEVINE, and D. J. RICHTER et al., 2002 Detecting recent positive selection in the human genome from haplotype structure. Nature 419:832-837.[Medline]
SIMONSEN, K. L., G. A. CHURCHILL, and C. F. AQUADRO, 1995 Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:413-429.[Abstract]
SLATKIN, M. and T. WIEHE, 1998 Genetic hitch-hiking in a subdivided population. Genet. Res. 71:155-160.[Medline]
STEPHAN, W., T. H. E. WIEHE, and M. LENZ, 1992 The effect of strongly selected substitutions on neutral polymorphism: analytic results based on diffusion theory. Theor. Popul. Biol. 41:237-254.
STRINGER, C., 2002 Modern human origins: progress and prospects. Philos. Trans. R. Soc. Lond. B Biol. Sci. 357:563-579.[Medline]
STROBECK, C., 1987 Average number of nucleotide differences in a sample from a single subpopulationa test for population subdivision. Genetics 117:149-153.
TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.
TAVARE, S., D. J. BALDING, R. C. GRIFFITHS, and P. DONNELLY, 1997 Inferring coalescence times from DNA sequence data. Genetics 145:505-518.[Abstract]
TENAILLON, M. I., M. C. SAWKINS, A. D. LONG, R. L. GAUT, and J. F. DOEBLEY et al., 2001 Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc. Natl. Acad. Sci. USA 98:9161-9166.
TISHKOFF, S. A., R. VARKONYI, N. CAHINHINAN, S. ABBES, and G. ARGYROPOULOS et al., 2001 Haplotype diversity and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial resistance. Science 293:455-462.
WALL, J. D., 1999 Recombination and the power of statistical tests of neutrality. Genet. Res. 73:65-79.
WALL, J. D., 2000 A comparison of estimators of the population recombination rate. Mol. Biol. Evol. 17:156-163.
WANG, R. L., A. STEC, J. HEY, L. LUKENS, and J. DOEBLEY, 1999 The limits of selection during maize domestication. Nature 398:236-239.[Medline]
WEBER, J. L., 2002 The Iceland map. Nat. Genet. 31:225-226.[Medline]
WEISS, G. and A. VON HAESELER, 1998 Inference of population history using a likelihood approach. Genetics 149:1539-1546.
WHITT, S. R., L. M. WILSON, M. I. TENAILLON, B. S. GAUT, and E. S. T. BUCKLER, 2002 Genetic diversity and selection in the maize starch pathway. Proc. Natl. Acad. Sci. USA 99:12959-12962.
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Przeworski, M.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Przeworski, M.




