Abstract
We analyze the frequency spectra of all available human nuclear sequence data sets by using a model of constant population size followed by exponential growth. Growth parameters comparable to, or more extreme than, those suggested from mtDNA data can be rejected for 6 out of the 10 largest data sets. When the data are separated into African and non-African samples, a constant size no-growth model can be rejected for 4 out of 8 non-African samples. Long-term growth (i.e., starting 50–100 kya) can be rejected for 2 out of 8 African samples and 5 out of 8 non-African ones. Under more complex demographic models, including a bottleneck or population subdivision, more of the data are compatible with long-term growth. One problem with the data used here is that a subset of loci may reflect the action of natural selection as well as of demography. It remains possible that the correct demographic model is one of constant population size followed by long-term growth but that at several loci the demographic signature has been obscured by balancing or diversifying selection. However, it is not clear that the data at these loci are consistent with a simple model of balancing selection; more complicated selective alternatives cannot be tested unless they are made explicit. An alternative explanation is that population size growth is more recent (e.g., upper Paleolithic) and that some of the loci have experienced recent directional selection. Given the available data, the latter hypothesis seems more likely.
With the world's population now in excess of 6 billion, it is clear that the human population size has not remained constant over time. What is still uncertain is when human populations started to expand in size. Did this happen 50–100 thousand years ago (kya) during the Upper Paleolithic (e.g., Rogers and Harpending 1992; Di Rienzo et al. 1998; Stiner et al. 1999) or only recently after the invention of agriculture roughly 12 kya?
The original arguments for earlier growth were based on mtDNA data (Di Rienzo and Wilson 1991; Rogers and Harpending 1992; Sherry et al. 1994). mtDNA data show an excess of rare mutations over equilibrium neutral expectations, which could be a signature of recent population growth (Slatkin and Hudson 1991). Two standard test statistics, D (Tajima 1989a) and D* (Fu and Li 1993), measure whether the observed frequencies of segregating mutations are compatible with the frequencies expected under the standard null model. Some departures from the null model (e.g., recent increases in population size, linkage to a locus under directional selection) lead to an excess of low frequency variants and negative D and D* values, while others (e.g., population subdivision, linkage to a site under balancing selection) tend to cause an excess of intermediate frequency variants and positive D and D* values. The D and D* values for human mtDNA are sharply negative, as are the values for the Y chromosome (Underhill et al. 1997; R. Thompson, J. K. Pritchard, P. Shen, P. J. Oefner and M. W. Feldman, unpublished results).
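To make the sign convention concrete, the following is a minimal sketch of Tajima's D computed from the sample size n and the mutant-allele count at each segregating site, using the standard constants from Tajima (1989a). The function name `tajimas_d` and its input layout are illustrative assumptions and are not taken from the programs used in this study; Fu and Li's D* is not implemented here.

```python
import math

def tajimas_d(n, allele_counts):
    """Tajima's D for a sample of n sequences.

    allele_counts[i] is the number of sequences carrying one of the two
    alleles at segregating site i (1 <= count <= n - 1). Because x * (n - x)
    is symmetric, either allele may be counted.
    """
    S = len(allele_counts)
    if n < 4 or S < 1:
        raise ValueError("need n >= 4 and at least one segregating site")
    # Mean number of pairwise differences (total over sites, not per site).
    k = sum(2.0 * x * (n - x) / (n * (n - 1)) for x in allele_counts)
    # Constants from Tajima (1989a).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

# Rare variants only (all singletons) give a negative D ...
d_rare = tajimas_d(10, [1, 1, 1, 1, 1])
# ... while intermediate-frequency variants give a positive D.
d_mid = tajimas_d(10, [5, 5, 5, 5, 5])
```

In this sketch an excess of singletons pulls the mean pairwise difference below S/a1 and so makes D negative, whereas sites at intermediate frequency push it above S/a1 and make D positive, matching the departures described above.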
Nonetheless, mtDNA and the nonrecombining portion of the Y chromosome are but two loci experiencing little (Awadalla et al. 1999) or no recombination, so the effects of demography are confounded with those of natural selection and genetic drift. Recent positive selection at any one site could produce the observed excess of rare mutations at linked neutral sites (Braverman et al. 1995), as could selection against weakly deleterious mutations (for evidence of purifying selection in the mtDNA, see, e.g., Nachman et al. 1996; Wise et al. 1998). If these loci are indeed affected by selection then it is unclear whether the observed D and D* tell us anything about past human population sizes. The effect of natural selection and other confounding factors can be controlled for if multiple independent nuclear loci are considered: selection affects only a small region while the effects of demography are visible throughout the whole genome. There are now data available from a multitude of microsatellite loci (e.g., Di Rienzo et al. 1998; Kimmel et al. 1998; Reich and Goldstein 1998), as well as a handful of nuclear sequence studies (e.g., Harding et al. 1997; Clark et al. 1998; Harris and Hey 1999).
Microsatellite studies generally find evidence for a more ancient start to population growth, but they differ on the estimates of the time of expansion and the groups involved. For example, Reich and Goldstein (1998) find evidence for population growth in African populations but not in non-African populations, while Kimmel et al. (1998) conclude just the opposite. In addition, the studies consider different scenarios of population growth. Reich and Goldstein (1998) consider a model of sudden population expansion (i.e., a small constant size before some fixed time and a large constant size after this time) and suggest that the lack of a signal of population expansion in non-African populations may have been caused by a bottleneck associated with their initial migration from Africa. In contrast, Kimmel et al. (1998) find support for a model of a bottleneck followed by population growth (but not a model of constant size followed by population growth), while Di Rienzo et al. (1998) examine the extreme scenario of rapid growth from an initially monomorphic population (i.e., a star phylogeny). In addition to these differences, there is some uncertainty in the estimates of mutation rates and in the underlying mutation process, so the estimates of the time since the start of growth might not be reliable. For example, two recent studies conclude that the onset of growth could have been more recent (i.e., 10–20 kya; Pritchard et al. 1999; Gonser et al. 2000).
An additional source of information regarding past population sizes comes from nuclear sequence studies. More than half of nuclear loci have positive Tajima's D values (Przeworski et al. 2000), so they do not provide evidence for recent population growth (e.g., Harding et al. 1997; Hey 1997; Zietkiewicz et al. 1998). On the basis of three small data sets Hey (1997) has shown that the difference in D values between mitochondrial and nuclear loci is larger than expected under either a model of constant population size or a model of recent exponential growth. It remains unknown whether the positive Tajima's D values at a larger set of nuclear loci are compatible with the models of population growth proposed from mtDNA and microsatellite data. In this article, we address this question explicitly. We analyze all available studies of human nuclear sequence variation. We focus on whether data are consistent with the predicted effects of population growth and examine whether there are noticeable differences between African and non-African samples. We do not use single nucleotide polymorphism (SNP) data (e.g., Cargill et al. 1999; Halushka et al. 1999) both because of the high error rate involved in SNP detection when SNPs are not confirmed by sequencing (e.g., Halushka et al. 1999) and because of possible biases in the frequency spectrum recovered by variant detection arrays (cf. Przeworski et al. 2000).
We consider a model of constant population size followed by exponential growth (cf. Marjoram and Donnelly 1994) and determine whether the observed D and D* values at nuclear loci are compatible with the distributions of simulated values. We use D because of its use in previous studies (e.g., Hey 1997; Fay and Wu 1999) and D* because simulations suggest that D* is more effective than D in detecting recent population growth (see results). We focus on weak models of growth: a 10-fold or 100-fold increase in size, unlike the more than 150-fold growth of Rogers and Harpending (1992) or the more than 5000-fold estimated increase in census population size (cf. Weiss 1984). We do so to be conservative and to maximize the chance that the data are compatible with population growth. We also discuss simulations of more complicated models that include a bottleneck or population subdivision. This discussion is merely qualitative, as distinguishing between more parameter-rich models requires more than the simple analyses performed here (Hey and Harris 1999) and more consistently sampled data.
While clearly of interest, the relation of simple demographic models to debates about human origins is unclear. Indeed, theories of human evolution are often too complex or not specific enough to be testable. The single origin model (e.g., Stringer and Andrews 1988) often assumes explicitly that modern human populations expanded in size as they replaced archaic populations (e.g., Rogers and Harpending 1992). For this reason, evidence for long-term exponential growth from mtDNA and microsatellites has been interpreted as support for the single origin model (Harpending et al. 1998). However, the multiregional hypothesis (e.g., Wolpoff et al. 1984) is compatible with either an early or a late start for population growth. As this article and others demonstrate, nucleotide polymorphism data can fruitfully be used to test specific demographic models. However, no conclusions can be drawn about more general models of human evolution until these are better specified.
MATERIALS AND METHODS
We examine all human nuclear sequence data sets for which frequency data were available, the sample size (n) was at least 10, and the number of segregating sites (S) was at least four. Some studies were excluded (e.g., Hammer et al. 1997; Jin et al. 1999; Peterson et al. 1999) because of biases in the process of data collection (e.g., polymorphisms not uniformly assayed in all individuals). We also exclude all human major histocompatibility complex (MHC) loci, which are likely to be affected by strong selection (either directly or at linked loci). Only biallelic mutations (both point mutations and indels) were included. The findings are essentially the same when overlapping mutations (e.g., in Pdha1, Harris and Hey 1999, or in Dys44, Zietkiewicz et al. 1998) are included (results not shown). Heterozygosity was calculated as π, the (per-site) average frequency of pairwise differences (Tajima 1983). For Dys44, the frequencies of alleles were determined from a larger sample (cf. Zietkiewicz et al. 1998), since data from the original ascertainment sample were not available.
We assume a neutral infinite-sites model for our simulations. The P values for D and D* were determined directly from simulations that first generate genealogies and then place exactly the number of observed segregating sites on the tree (cf. Hudson 1993). A total of 10^{5} replicates were run for each parameter combination described below. The simulations require some assumption about the population recombination rate C = 4Nr (N is the effective population size and r is the recombination rate per locus per generation). Assuming no recombination (C = 0) is generally conservative for assessing the significance levels of D and D* (Wall 1999), but this assumption may not be appropriate; most nuclear data sets show evidence of recombination (see discussion). Since on the intragenic scale genetic maps based on pedigree data are not very precise, we estimate recombination rates directly from the patterns in the sequence data. We assume a constant rate of crossing over per base pair and no gene conversion. The estimator C_{HRM} summarizes the data using the estimated minimum number of recombination events (R_{M}; cf. Hudson and Kaplan 1985) and the observed number of distinct haplotypes (H) and returns the value of C that maximizes the likelihood lik(C | H, R_{M}) (see Wall 2000 for more details). It is roughly unbiased under a constant population size model, has relatively low mean squared error, and can be calculated for large polymorphism data sets (Wall 2000; J. D. Wall, unpublished results). We incorporate population growth (see below) directly into the null simulations used to estimate C. C_{HRM} could not be calculated for Dys44, since haplotype data were not available for that locus, or for Duffy and Dmd7, because the sequence studied was not contiguous. We assume C = 0 for all simulations involving these three loci.
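The fixed-S procedure can be illustrated with a simplified sketch. It covers only the simplest case (constant population size, C = 0) and is not the modified Hudson programs used for the analyses: it generates a coalescent genealogy, places exactly S mutations on branches with probability proportional to branch length, and reports the fraction of simulated D values at least as large as an observed value. All function names, the replicate count, and the seed are illustrative assumptions.

```python
import math
import random

def tajimas_d(n, counts):
    """Tajima's D from per-site allele counts (constants from Tajima 1989a)."""
    S = len(counts)
    k = sum(2.0 * x * (n - x) / (n * (n - 1)) for x in counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    c1 = (n + 1) / (3.0 * (n - 1)) - 1.0 / a1
    c2 = (2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
          - (n + 2) / (a1 * n) + a2 / a1 ** 2)
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

def simulate_branches(n, rng):
    """Return (length, number_of_descendant_samples) for every branch of a
    constant-size coalescent genealogy without recombination."""
    lineages = [1] * n            # descendants subtended by each lineage
    branches = []
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2.0)   # time while k lineages remain
        branches.extend((t, d) for d in lineages)
        i, j = rng.sample(range(k), 2)           # pick the pair that coalesces
        merged = lineages[i] + lineages[j]
        lineages = [d for idx, d in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return branches

def simulate_d(n, S, rng):
    """Place exactly S mutations on one genealogy and compute D."""
    branches = simulate_branches(n, rng)
    lengths = [length for length, _ in branches]
    descendants = [d for _, d in branches]
    counts = rng.choices(descendants, weights=lengths, k=S)
    return tajimas_d(n, counts)

def upper_tail_p(n, S, d_obs, reps=2000, seed=1):
    """Fraction of simulated D values that are >= the observed D."""
    rng = random.Random(seed)
    hits = sum(simulate_d(n, S, rng) >= d_obs for _ in range(reps))
    return hits / reps
```

Because S is fixed, the mutation rate never has to be specified; only the shape of the genealogy matters. Extending the sketch to growth would require rescaling the coalescence times, and extending it to C > 0 would require an ancestral recombination graph, neither of which is attempted here.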
We model either a constant population size or a constant population size followed by exponential growth (cf. Marjoram and Donnelly 1994). For the latter, an equilibrium population of size N = 10^{4} starts at time T to grow at a constant rate to a current population size of 10^{5} or 10^{6}. T, the date of the onset of growth, varies from 0 kya to 100 kya, assuming an average generation time of 20 yr.
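As a worked illustration of these parameters, the sketch below converts T into generations, derives the implied per-generation growth rate, and evaluates the population size at a given time in the past. The function names are hypothetical, and treating T = 0 as "no growth, size stays at the ancestral value" is an assumption made here for convenience.

```python
import math

GENERATION_YEARS = 20  # average generation time assumed in the text

def growth_rate_per_generation(n_anc, n_now, t_years):
    """Exponential rate r such that n_anc * exp(r * generations) = n_now."""
    generations = t_years / GENERATION_YEARS
    return math.log(n_now / n_anc) / generations

def size_at(years_ago, n_anc, n_now, t_years):
    """Population size `years_ago` years before the present."""
    if years_ago >= t_years:      # before the onset of growth (or T = 0)
        return float(n_anc)
    r = growth_rate_per_generation(n_anc, n_now, t_years)
    return n_now * math.exp(-r * years_ago / GENERATION_YEARS)

# 100-fold growth starting 50 kya: 2500 generations, r = ln(100) / 2500.
r = growth_rate_per_generation(1e4, 1e6, 50_000)
midpoint = size_at(25_000, 1e4, 1e6, 50_000)   # geometric midpoint, 1e5
```

For 100-fold growth starting 50 kya this gives a rate of about 0.0018 per generation, which shows how gradual even the stronger growth scenario is on a per-generation scale.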
The recombination rate for each locus is estimated from exponential growth simulations for the whole sample. We run simulations to estimate C_{HRM} for values of T that are multiples of 5 kya and then use linear interpolation to estimate C_{HRM} for other values of T. For Lpl and certain values of T, C_{HRM} could not be calculated because the estimated likelihood of the data was 0. This might be because incomplete phase information was available, leading to an underestimate of the number of distinct haplotypes. When this happens, we estimate C solely from the observed R_{M} [i.e., we take the value of C that maximizes lik(C | R_{M})]. All simulations use the same growth rates (for a given value of T), except for a simple correction for X-linked loci (which have 3/4 the population size of autosomal loci under the standard neutral model). We consider worldwide samples as well as exclusively African and non-African samples. Most simulations were run using modifications of programs kindly provided by R. R. Hudson.
In addition to simulations of a constant population size followed by exponential growth, for Lpl we run simulations of a symmetric island model of geographic subdivision. The model has four islands (meant to correspond loosely to African, European, Asian, and Melanesian populations), and migration rates are taken to correspond roughly to an F_{ST} of 0.15 (cf. Takahata 1983; 4Nm = 3.188, when N = 10^{4}). Actual F_{ST} values between continental populations are often less than this value (Cavalli-Sforza et al. 1994), but we have opted to be conservative by maximizing the effect population structure has on the distributions of D and D*. Each individual in the sample is assigned to one of the four islands on the basis of their ethnicity. The numbers of sampled individuals from each island are 48 (Africa), 94 (Europe), 0 (Asia), and 0 (Melanesia). T is the same for all demes in simulations that include exponential growth.
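The quoted migration parameter can be checked numerically. The sketch below assumes the finite island model approximation F_{ST} ≈ 1/[1 + 4Nm (d/(d − 1))^2] for d demes, which we take to be the relationship intended by the citation of Takahata (1983); this is an assumption on our part, although it does reproduce the quoted value of 4Nm. The function names are hypothetical.

```python
def fst_island(four_nm, demes):
    """Approximate equilibrium F_ST in a symmetric finite island model
    (assumed form: 1 / (1 + 4Nm * (d / (d - 1))**2))."""
    scale = (demes / (demes - 1)) ** 2
    return 1.0 / (1.0 + four_nm * scale)

def four_nm_for_fst(fst, demes):
    """Invert the approximation to get 4Nm for a target F_ST."""
    scale = (demes / (demes - 1)) ** 2
    return (1.0 / fst - 1.0) / scale

four_nm = four_nm_for_fst(0.15, 4)     # 3.1875, quoted as 3.188 in the text
m = four_nm / (4 * 1e4)                # per-generation migration rate, N = 10^4
```

Under this approximation, F_{ST} = 0.15 with four islands gives 4Nm = 3.1875, and with N = 10^{4} the implied migration rate is roughly 8 × 10^{-5} per generation.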
Additional simulations consider population size reductions (“bottlenecks”) followed by exponential growth at time T. Stepwise changes in population sizes are straightforward to implement in a coalescent setting (Tajima 1989b). We consider a model of a constant ancestral population size of N = 10^{4}, followed by a reduction in population size to N = 10^{3} lasting 10 thousand years (kyr), followed by exponential growth to a current population size of N = 10^{6}. The time since the start of growth varies from T = 0–100 kyr, and a generation time of 20 yr is assumed. We present only these limited simulations because our interest is in broad qualitative trends.
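The bottleneck scenario amounts to a piecewise population size history, sketched below as a function of time before the present. The sketch assumes that exponential growth starts from the bottleneck size of 10^{3}, and it leaves the T = 0 limit undefined because the text does not say how that case is handled; the function name and default arguments are illustrative.

```python
import math

GENERATION_YEARS = 20

def bottleneck_size_at(years_ago, t_years, n_anc=1e4, n_bottle=1e3,
                       n_now=1e6, bottle_years=10_000):
    """Population size `years_ago` years before the present under:
    ancestral size -> bottleneck of fixed duration -> exponential growth."""
    if t_years <= 0:
        raise ValueError("this sketch requires T > 0")
    if years_ago < t_years:
        # Growth phase: assumed to start from the bottleneck size.
        r = math.log(n_now / n_bottle) / (t_years / GENERATION_YEARS)
        return n_now * math.exp(-r * years_ago / GENERATION_YEARS)
    if years_ago < t_years + bottle_years:
        return float(n_bottle)    # bottleneck lasting 10 kyr
    return float(n_anc)           # constant ancestral population

# With T = 40 kyr: growth over the last 40 kyr, bottleneck from 40 to 50 kya,
# and the ancestral size of 10^4 before 50 kya.
sizes = [bottleneck_size_at(t, 40_000) for t in (0, 40_000, 45_000, 50_000)]
```

In a coalescent simulation these sizes would be used to rescale waiting times between coalescent events; that step is not shown here.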
RESULTS
Table 1 summarizes some general information about the loci considered. Levels of heterozygosity at the loci studied here are comparable with those reported from previous studies (e.g., Li and Sadler 1991). There is no clear trend in the frequency spectra: 7 out of 12 loci have positive D values, while 4 out of 11 loci have positive D* values. When the D values of the largest data sets are compared with each other (cf. Hey 1997), it is found that Xq13.3's value differs significantly from those of Dys44, Pdha1, and β-globin (results not shown).
To illustrate the effect of recent exponential growth on the distribution of D and D*, we choose the largest locus with positive D and D* values (Lpl) and the largest locus with negative D and D* values (Xq13.3). For these two loci, we run simulations where the population size is constant at N = 10^{4}, then at time T starts growing exponentially until it reaches N = 10^{6} at the present. Figures 1 and 2 show the middle 95% of simulated D and D* values, as a function of T. Figure 1, A and B, shows simulations of D and D*, respectively, for Lpl (assuming C = 0), while Figure 2, A and B, illustrates D and D* for Xq13.3 (with C = 0). The actual values of D and D* are highlighted for comparison. As T increases, the expected values of D and D* decrease. Note that the expected value of D* decreases more rapidly than that of D; this suggests that D* is more effective for detecting recent increases in population size. Further simulations confirm this (results not shown). For T ≈ 50–100 kya, as suggested by Rogers and Harpending (1992), Sherry et al. (1994), and others, the observed values of D and D* for Lpl fall outside the 95% confidence interval of simulated values; when realistic recombination rates are assumed, the discrepancy between actual and simulated values is much greater (see below). The values for Xq13.3 are inside the 95% confidence interval for any T (Figure 2A) or any T > 10 kya (Figure 2B) in the interval shown. However, if T ≫ 100 kya, as in models discussed by Hawks et al. (2000), even the D and D* values for Xq13.3 are significantly too high (e.g., for T = 2 mya and 600-fold growth, P < 0.01 for both).
We quantify the effect of T on D and D* for other loci by determining for which values of T the actual values of D and D* lie within the middle 95% of simulated values. Unlike above, these simulations use a recombination rate that is estimated from the data (see materials and methods). The results are shown in Table 2, for an ancestral population size of N = 10^{4} and current population sizes of N = 10^{5} or N = 10^{6}. For 10-fold growth in population size, four loci are inconsistent with exponential growth starting 50 kya. For 100-fold growth, six are inconsistent with T = 50 kya and nine with T = 100 kya. In contrast, two loci are inconsistent with T = 0. (Note that we are not correcting for the use of two test statistics. If we do, the qualitative conclusions are unchanged.)
One of the main conclusions to emerge from studies of human variation is a greater variability in Africa (e.g., Cann et al. 1987; Bowcock et al. 1994; Halushka et al. 1999). To determine whether there is a geographical component to the patterns observed, we partition our data sets into African and non-African samples. The sampling scheme for non-Africans varies greatly, from 34 chromosomes from one population (Duffy) to one or few individuals from dozens of populations (Xq13.3). All non-African samples include some Europeans; Dys44, Xq13.3, Dmd44, β-globin, and Dmd7 include Asian populations as well. For each, we run the same exponential growth simulations as before. Table 3 summarizes the data sets and shows the results of these simulations for the eight loci that provided geographic information and satisfied minimal size requirements (n ≥ 10 and S ≥ 4 in both samples). While our findings confirm the observation of higher levels of polymorphism in African vs. non-African populations, other systematic differences seem more difficult to identify. The D values for non-African samples are generally higher than the corresponding D values for African samples (true for 6 out of 8 loci), but perhaps more interesting is that four of the non-African samples (but none of the African samples) have significantly high D values even when there is no growth (i.e., T = 0). The P values for these four data sets become vanishingly small under long-term exponential growth (i.e., T = 50 kya). In contrast, there seems to be no systematic difference in D* values between African and non-African populations, and one African and one non-African sample show the opposite pattern of significantly negative D values when T = 0. Overall, five out of eight non-African and two out of eight African samples are inconsistent with a model of 100-fold growth starting 50 kya.
A model of constant population size followed by exponential growth is probably too simplistic. With the inclusion of additional features, such as a population bottleneck or population subdivision, more data are compatible with an older onset of growth. We highlight this by examining how alternative demographic assumptions affect the distribution of D and D* values for the total Lpl data set. Figure 3 shows the middle 95% of simulated D and D* values for a model of a population bottleneck followed by exponential growth, as a function of the time since the end of the bottleneck (see materials and methods). Figure 3A shows simulated Tajima's D values, while Figure 3B shows simulated Fu and Li's D* values. The actual values are highlighted for comparison. The specific parameters used were chosen to maximize the chance that the observed D and D* would be compatible with long-term exponential growth. As can be seen by comparing Figure 1 with Figure 3, the Lpl data set is now compatible with an older onset of growth (roughly 46 kya instead of 25 kya for D and 7 kya instead of 3 kya for D*). If recent population growth is assumed, then the effect of a bottleneck before the start of growth decreases as the sample size increases (results not shown). Anatomically modern humans are thought to have reached Australia roughly 50–60 kya (Roberts et al. 1994). If T > 50 kya, stronger bottlenecks will result in lower values of D and D* (results not shown).
The presence of population structure often leads to higher expected D and D* values. To test the magnitude of this effect, we consider an island model of geographic subdivision (see materials and methods). Figure 4 shows the middle 95% of simulated D (Figure 4A) and D* (Figure 4B) values for Lpl. As implemented, geographic subdivision has a relatively minor effect, despite the low migration rate used (see materials and methods); the range of compatibility increases from 0–25 kya in Figure 1A to 0–37 kya in Figure 4A and from 0–3 kya in Figure 1B to 0–6 kya in Figure 4B. If an equilibrium island model with unequal island sizes is used (cf. Relethford and Harpending 1995; Relethford and Jorde 1999), the effect on D and D* is almost the same (results not shown).
DISCUSSION
Nuclear sequence data conflict with other genetic loci: In this article we examined the frequency spectrum of segregating mutations at nuclear loci in humans. mtDNA and Y chromosome data show a substantial excess of rare mutations (i.e., D and D* are strongly negative) over equilibrium neutral expectations (Cann et al. 1987; Di Rienzo and Wilson 1991; Underhill et al. 1997). Researchers have argued that the sharply negative D value reflects an expansion in population size that occurred ~50–100 kya (e.g., Rogers and Harpending 1992; Sherry et al. 1994; Rogers 1995). If so, we would expect to observe the effects of this expansion throughout the genome (Hey 1997). However, the nuclear data are not consistent with this scenario, even though our most extreme model of growth (100-fold growth over the past 100 kyr) is still less extreme than has been commonly proposed (e.g., Weiss 1984; Rogers and Harpending 1992). While expansion should lead to predominantly negative D and D* values, Tables 1 and 3 show a roughly equal number of positive and negative values. Since D and D* measure different (but related) aspects of the data, we expect the ranges of compatible T values for D and D* to be correlated but not identical in Tables 2 and 3. This is what we observe: For some loci, the D and D* values are large enough that they are inconsistent with a model of exponential growth starting >50 kya. This is the case for 6 out of 12 loci in Table 2, two out of eight African samples, and five out of eight non-African samples (cf. Table 3). Contrary to the claim of Jorde et al. (2000), not all available data support the long-term population growth model first suggested by mtDNA data. The discordance between the data and a long-term growth model is even larger when previously proposed parameter values (i.e., growth rates and time since the start of growth) are used (e.g., Rogers and Harpending 1992; Kruglyak 1999).
Estimating the rate of recombination: One criticism of our conclusions is that significance levels are not necessarily conservative when recombination rates are estimated from the data. However, there is no reason to expect that C_{HRM} consistently overestimates the true recombination rate, and our null simulations explicitly incorporate the model of population growth that is tested. Constant-size coalescent simulations with fixed values of C suggest that the median of the distribution of C_{HRM} values is usually less than or equal to the actual value of C (results not shown). In addition, some aspects of the data at many loci (in particular, the nonzero and sometimes large values of R_{M}) are not consistent with low recombination rates (results not shown). We know that recombination is operating throughout the autosomes and X chromosome, and ignoring this fact might be problematic. In particular, it might not be appropriate for researchers with nuclear sequence data to assume that C = 0 and reconstruct a tree, since a post hoc pruning of the data (i.e., removing sites and individuals that show evidence of recombination) might bias the results. More important, even if we make the unrealistic assumption that C = 0, the qualitative results are the same: all of the loci in Tables 2 and 3 that are inconsistent with 100-fold growth and T = 50 kya are still inconsistent if we assume no recombination (results not shown).
Possible explanations: Even though nuclear sequence data do not support a simple model of recent population growth, we nonetheless know that a drastic population expansion occurred at least 12 kya with the advent and spread of agriculture. Furthermore, archaeological evidence suggests that human population sizes have expanded over the last 40–50 kyr or more (Klein 1989). So why the discrepancy between expectations and observations? Three nonexclusive possibilities are that our mutational model is incorrect, that our demographic model is incorrect, or that the patterns of variation at some of the loci have been affected (either directly or indirectly) by natural selection.
All of our simulations have assumed an infinite-sites model, and some researchers have recently suggested that multiple mutations at the same site might be frequent for human polymorphism data (Templeton et al. 2000). However, this is likely to be a minor concern. For example, for Lpl (the data set with the most segregating sites), the expected number of multiple hits at CpG sites is less than two (taking the CpG mutation rate estimated for Lpl in Templeton et al. 2000). The effect that two multiple hits would have on D and D* is negligible (results not shown). The expected number of multiple hits in smaller data sets is less than what it is for Lpl (results not shown).
Although our simplistic demographic model is likely to be incorrect, the relevant question is whether actual human demography differs from our assumptions in ways that would lead to systematically higher D and D* values. We tested two possible alternative models in Figure 3 (population bottleneck) and Figure 4 (population structure). Although both models tend to produce higher D and D* values (and thus greater concordance between our data and a model of recent population growth), neither is a sufficient explanation for all of the loci examined. Some loci (e.g., β-globin) still have D values that are too high. More generally, the low D and D* values at some loci (e.g., Xq13.3 and Dmd7) and the high values at other loci (e.g., Lpl and β-globin) are not both consistent with any simple model of human demography (see also Hey 1997). So it seems likely that selection has influenced the patterns of variation at several of the loci studied. One locus in particular (Duffy) is known to be influenced by natural selection (which may explain why the African sample has less diversity than the non-African sample; Hamblin and Di Rienzo 2000). Any model of human history should also include claims of how and where natural selection has affected observed genetic variation. Below we examine two main hypotheses for which loci and what types of selection have been operating.
One possibility is that there has indeed been long-term population growth (e.g., T > 50 kya). In this case, the excess of rare variants in mtDNA, the Y chromosome, Xq13.3, and Dmd7 reflects demography while the high D and D* values at Lpl, Dys44, Pdha1, and β-globin reflect the action of balancing or diversifying selection. The intermediate D and D* values at other loci such as Ace or Dmd44 could then be due to chance or to demographic factors such as population structure or a bottleneck. (But note that these factors are still insufficient to account for the extremely high D values at some loci.) Although a simple model of balancing selection (e.g., Hudson and Kaplan 1988) leads to higher D and D* values (Fu 1996), it also predicts a well-defined peak of polymorphism surrounding the selected site. There is neither a putative selected site nor an observable peak of polymorphism at Lpl, Dys44, Pdha1, or β-globin. Theory predicts that it takes a substantial amount of time for balancing selection to affect levels of polymorphism or Tajima's D values, so young balanced polymorphisms (e.g., malaria resistance at β-globin) should have little effect on levels of polymorphism. Note also that there are few if any examples of balanced polymorphisms in any species aside from MHC loci in mammals and S-allele systems in plants. Even the canonical case of Adh in Drosophila melanogaster may not be a simple balanced polymorphism (Begun et al. 1999). Thus, it seems unlikely that balancing selection has led to higher D and D* values in multiple unlinked human nuclear loci. Other selective models, such as local adaptation, might produce higher D and D* values; however, they have not been well characterized theoretically.
An alternative hypothesis is that the positive (and slightly negative) D and D* values reflect demography, while the significantly negative D and D* values for mtDNA, the Y chromosome, Xq13.3, and Dmd7 reflect the recent effects of directional selection. It is an interesting coincidence that three out of four of these are in areas of little or no recombination. The fourth locus, Dmd7, shows an excess of rare variants only outside of Africa, so it cannot be taken as support for the simplest model of growth. Kaessmann et al. (1999) deliberately chose their region (Xq13.3) to be in an area of reduced recombination because data are easier to analyze when recombination can be ignored. They suggest that loci like Xq13.3 are “ideally suited for unravelling the evolutionary history of the nuclear genome” (Kaessmann et al. 1999, p. 79). However, one consequence of their choice of location is that the patterns of variation at Xq13.3 (as with mtDNA and the nonrecombining region of the Y chromosome) are especially vulnerable to the effects of selection at linked loci. Positive selection at a linked locus (Smith and Haigh 1974), possibly outside the region examined, would lead to a reduction in heterozygosity and a shift toward negative D values at Xq13.3 (Braverman et al. 1995). There are many nearby candidates for selection. A Grail search (http://compbio.ornl.gov) revealed a putative gene (a purinergic receptor) <5 kb away (0.0007 cM) from the region sequenced by Kaessmann and colleagues. Given the available data and the greater prevalence of directional selection compared with balancing selection in Drosophila (Hey 1999), this hypothesis seems more plausible.
Implications for models of human evolution: The single origin model implicitly assumes that modern human populations expanded as they replaced more archaic hominids throughout the Old World (Harpending et al. 1998). This expansion presumably happened before the colonization of Australia 50–60 kya (Roberts et al. 1994). The challenge for proponents of the single origin model is to formulate a model that includes long-term population growth and that can explain why observed F_{ST} values are low and why there is no trend toward negative D and D* values in nuclear loci. Any claim about the action of selection at certain loci should be made explicit, considering the discussion of balancing selection above. As mentioned before, our results on the timing of recent population expansions do not directly impact the feasibility of the multiregional model. However, it is still unclear whether an ancestral effective population size of 10^{4} is consistent with a continuous occupation of most of the Old World (see, e.g., Harpending et al. 1998). Perhaps a less polemic goal would be to construct a model that can account for the differences between African and non-African samples. Such a model would need to explain why diversity levels within Africa are consistently higher than outside of Africa, and why the D values for non-African samples at some loci are significantly positive. Further work focuses on whether nonequilibrium demographic models are more consistent with human nuclear sequence data.
Acknowledgments
B. Payseur, C. Sing, and E. Zietkiewicz generously provided unpublished data, and A. Di Rienzo, M. Hamblin, R. Harding, M. Nachman, and J. Pritchard provided preprints of their work. Also, we thank P. Andolfatto, A. Di Rienzo, R. Hudson, M. Nordborg, N. Takahata, and two anonymous reviewers for helpful discussions and comments on earlier versions of this work. Part of this paper was completed when J.D.W. was at the Graduate University for Advanced Studies (Hayama, Japan), supported by the Monbusho Summer Program in Japan. J.D.W. was partially supported by National Institutes of Health grant 5R01H610847.
Footnotes

Communicating editor: N. Takahata
 Received November 12, 1999.
 Accepted April 21, 2000.
 Copyright © 2000 by the Genetics Society of America