Abstract
A debate of longstanding interest in human evolution centers around whether archaic human populations (such as the Neanderthals) have contributed to the modern gene pool. A model of ancient population structure with recent mixing is introduced, and it is determined how much information (i.e., sequence data from how many unlinked nuclear loci) would be necessary to distinguish between different demographic scenarios. It is found that ~50–100 loci are necessary if plausible parameter estimates are used. There are not enough data available at the present to support either the “single origin” or the “multiregional” model of modern human evolution. However, this information should be available in a few years.
THE question of modern human origins has fascinated physical anthropologists for decades. Recently, two main hypotheses have been proposed. One view, often called the single origin model, claims that modern humans evolved in a single location (probably in Africa) roughly 150,000 years ago and from there expanded and replaced the existing hominid populations around the world (e.g., Howells 1976; Stringer and Andrews 1988). In contrast, the multiregional model claims no single location for the origin of modern humans; instead, it posits that over the last 1–2 million years humans have evolved gradually from their Homo erectus ancestors throughout the Old World (e.g., Weidenreich 1943; Wolpoffet al. 1984). Both theories claim support from the fossil record (e.g., Wolpoffet al. 1984; Stringer and Andrews 1988), and it is hard to sort out their conflicting claims. However, recent fossil studies tend to support the single origin hypothesis (e.g., Lahr 1994, 1996; Hublinet al. 1995; but see Duarteet al. 1999), as do recent genetic studies (e.g., Vigilantet al. 1991; Armouret al. 1996; Tishkoffet al. 1996). Despite this trend, there are still vocal supporters of the multiregional model (e.g., Wolpoff 1996).
One way of rephrasing the debate is to ask how much contribution regional “archaic” populations have had to current human morphological and genetic diversity. Under the single origin model, this contribution is thought to be very small or nonexistent, whereas the multiregional model predicts that it is quite large. Other intermediate hypotheses predict a range of contributions from relatively small (e.g., Bräuer 1984) to relatively large (e.g., Smith 1992).
One recent genetical breakthrough involved the recovery and sequencing of a fragment of Neanderthal mtDNA. The inferred sequence was found to be quite different from the homologous sequences of all extant humans (Kringset al. 1997). This observation has been taken as evidence for the claim that modern humans and Neanderthals did not interbreed, and thus that Neanderthals left no descendants among extant humans (e.g., Kahn and Gibbons 1997; Lindahl 1997; Ward and Stringer 1997). However, it is impossible to make any strong inferences from single locus data because of the randomness inherent in the underlying evolutionary process. In fact, the mitochondrial data are consistent with a high level of interbreeding between humans and Neanderthals (Nordborg 1998). Nordborg calculates that even if Neanderthal mtDNA comprised 25% of the ancestry of extant human mtDNA roughly 68,000 years ago, there would still be a >50% chance that by the present time all Neanderthal mtDNA would have been lost because of genetic drift. To have any power, we must have data from multiple unlinked loci.
In this article, I explore how much power we have to exclude the possibility that Neanderthals (or other archaic humans) left descendants among extant humans given sequence data from multiple unlinked neutral nuclear loci. I assume that no further loci from Neanderthals will be sequenced; given the difficulty encountered with ancient mtDNA, which is present in hundreds of copies in each cell, it is highly unlikely that any single copy nuclear DNA can be recovered from fossils as old as Neanderthals. Instead, I focus on the effect of past demographic events on the patterns of current observed variation in human populations. This can be done by examining the change in the shape of the genealogy caused by alternative demographic scenarios. Both changes in population size and the presence of population subdivision produce characteristic changes in the shape of genealogies when they are compared with genealogies generated from a constant size panmictic model (e.g., Takahata 1988; Tajima 1989a; Slatkin and Hudson 1991; Marjoram and Donnelly 1994). For example, Nordborg (2000) proposes a model that could characterize Neanderthal admixture. Consider two populations that diverged at time T_{1} in the past and remain completely isolated until time T_{0}, at which point they merge into a single panmictic population in proportions c and 1 − c (see Figure 1). The genealogy in Figure 1 differs from a panmictic one in that the amount of time when there are two ancestors is much larger than its expectation under panmixia. Nordborg (2000) shows that under some extreme admixture scenarios, panmixia can be rejected because this time is too long.
Nordborg notes that his results are not conservative (i.e., the power to reject panmixia with actual data is less than in his simulations). This is because he assumes that the genealogy and branch lengths are known. In practice, they must be inferred from observed sequence data. This inference is especially difficult when there is intragenic recombination. I deal with the problem of intragenic recombination by examining patterns that are directly observable in sequence data. The model assumes that there are data from many unlinked neutral loci. Unlinked loci essentially provide independent runs of the evolutionary process, because demographic history is expected to equally affect the whole genome. Another advantage of considering multiple unlinked loci is that we can minimize the confounding effect of local selection.
I introduce ad hoc test statistics based on measures of inferred linkage disequilibrium or inferred genealogical tree shape, and I examine their powers to reject the null model (of panmixia) when the actual demography is as in Figure 1. More effective measures likely exist. However, my goal was to construct reasonable upper bounds for how much information (i.e., how many loci) would be necessary to distinguish between various demographic hypotheses. Fulllikelihood methods were not considered. Algorithms that include intragenic recombination have only been implemented for panmictic models (e.g., Griffiths and Marjoram 1996), and even these are not computationally feasible for the recombination rates considered here (results not shown).
An outline of the remainder of the article is as follows. First, the model and test statistics are introduced. Then, the test statistics are compared with each other, and the best one is chosen for further simulations. These simulations are parameterized to model Neanderthalmodern human admixture in Europe ~30,000–45,000 years ago (e.g., Mercieret al. 1991; Hublin et al. 1995, 1996) and more speculative admixture between modern humans and H. erectus remnants in Southeast Asia (Robertset al. 1994; Swisheret al. 1996). Finally, shortcomings of the model are discussed, as are general issues regarding demographic inference from current sequence data.
MODEL AND TEST STATISTICS
To model archaic admixture, I use an infinitesites coalescent model with recombination (cf. Hudson 1983, 1990). I assume there is no selection and a constant rate of recombination per base pair per generation. Consider a panmictic population with diploid effective population size N. Going backward in time, suppose that at time T_{0} the remaining ancestors are placed randomly into one of two subpopulations with probabilities c and 1 − c. From time T_{0} to time T_{1}, these subpopulations are assumed to be completely isolated from each other and panmictic with diploid effective population sizes of cN and N, respectively. [Replacing cN by N or N by (1 − c)N did not change any of the results.] Then, at time T_{1}, all remaining lineages are placed into a single panmictic population with diploid effective population size N. All further coalescences occur in this panmictic population. The null model is c = 0, which yields constantsized panmixia throughout time (i.e., no contribution of archaic populations to the modern gene pool). T_{0} and T_{1} are scaled in units of 2N generations. This model is similar to that proposed by Nordborg (2000), except that it includes intragenic recombination. One possible problem with the model is that the mixing event at time T_{0} is assumed to be instantaneous, whereas possible interbreeding between modern and archaic human populations might have been spread over thousands of years. However, the instantaneous approximation is reasonable because few coalescent events are expected to occur during the time of contact and because only small values of c are considered.
The data simulated using the above scheme are in the standard population genetic format of an ordered list of segregating sites. Call a pair of segregating sites congruent if the subset of the data consisting of the two sites contains only two different haplotypes. Congruence is a pairwise measure of linkage disequilibrium; it has been shown to be effective in detecting certain types of population structure (Wall 1999). If T_{1} ≫ T_{0}, then at time T_{0} there will be many sites that are fixed for alternative alleles in the two subpopulations. This will show up in data as a set of sites that is dispersed throughout a locus and is pairwise congruent. (A set of sites is pairwise congruent if all possible pairs in the set are congruent.) Over time these associations will decay, but they serve as the motivation for the following test statistics.
Label the segregating sites 1, 2, …, S in order, and suppose 1 ≤ i ≤ j ≤ S. Define the indicator variable
Nordborg (2000) uses the total time when there are exactly two ancestors as a test statistic. This is similar to using the longest branch length in the unrooted tree as a test statistic, because the longest branch is often the oldest one as well. (This correspondence is less meaningful when recombination causes different sites to have different genealogies.) As mentioned before, tree topologies and branch lengths must be estimated from the data. One measure with similar motivation is
One final test statistic considered is Tajima's (1989b) D (denoted here by d_{t}), a commonly used measure of the skew in the observed frequency spectrum. When c is small, mutations on the oldest branch will tend to be at low frequency (assuming there is no outgroup), leading to a shift toward negative d_{t} values. Because the multilocus likelihood calculations described below require discrete variables, each d_{t} value was rounded down to the nearest tenth.
Suppose an alternative model Alt = (c, T_{0}, T_{1}, θ, C) is specified. (Here θ = 4Nμ is the population mutation parameter and C = 4Nr is the population recombination parameter.) It is compared with a null model Null = (c = 0, T_{0}, T_{1}, θ, C). The θ's for the two models are chosen so that the average pairwise divergences (π; cf. Tajima 1983) for the simulated data sets have roughly equal expectation. Furthermore, we assume that θ = C for each model. (This assumption is relaxed in the discussion.) Recent highresolution linkage maps (e.g., Bouffardet al. 1997; Nagarajaet al. 1997) and studies of human genetic variation (e.g., Hardinget al. 1997; Zietkiewiczet al. 1997; Nickersonet al. 1998; Harris and Hey 1999) suggest that this is a reasonable approximation. Note that the actual values of T_{0} and T_{1} are irrelevant for the null model.
We can estimate the distribution of the seven test statistics by simulation. A statistic will be effective when its distribution under the alternative model of archaic admixture is noticeably different from its distribution under the null model of equilibrium panmixia. For any value of k, set Pr_{Null} (g = k) as the probability that g = k if the null model is simulated. (Pr_{Alt} is defined similarly.) This probability can be estimated from the proportion of simulated trials where g = k. In general, 10^{6} trials were considered sufficient for each combination of parameter values. Now, suppose that data is available from x independent loci, each simulated according to either the null model or the alternative model. Define g(j) = the value of g from the jth locus, and
RESULTS
Preliminary simulations were run to estimate the optimal sample size, locus size, and test statistic. It was assumed that there is a constant per base pair cost for sequencing; under the constraint of a fixed total cost, it was determined what combination of n, θ, number of loci, and test statistic led to maximal power to reject the null model. These simulations were meant to be exploratory, not exhaustive. They suggest that the optimal configuration has a relatively small sample size (n ~ 20) and a relatively large locus size (θ ~ 10), and that power is maximized using G_{r} (results not shown). All further simulations fixed θ = 10 and n = 50. This corresponds to roughly 8–10 kb of sequence. The larger than optimal sample size was taken to conform more closely to the large sample sizes considered in most recent studies of human genetic variation (e.g., Hardinget al. 1997; Zietkiewiczet al. 1997; Nickersonet al. 1998). As an illustration, Figure 2 shows the power (in percentage) vs. the number of loci for all six test statistics. The other parameters are c = 0.1, T_{0} = 0.1, and T_{1} = 1.0. If we make the standard assumptions of N = 10^{4} and a generation time of 20 years, then T_{0} is 40,000 years ago and T_{1} is 400,000 years ago. G_{r}, L_{b}, and G are all reasonably powerful with ~40 or more loci, while the others remain weak even when a large number of loci are considered. G_{r} is slightly more powerful than G because it weights those congruent pairs that are “far” (as measured by evidence of recombination) over those that are “near.” The former are much more unusual under the panmictic equilibrium null model.
The most powerful test statistic (G_{r}) was chosen for more extensive simulations that sought to determine how much information (i.e., how many loci) would be required to distinguish between plausible human demographic scenarios. The results are described first for possible Neanderthal admixture in Europe and second for possible H. erectus admixture in Southeast Asia.
Recent archaeological evidence suggests that modern humans first entered Europe roughly 45,000 years ago, and that remnant Neanderthal populations remained until ~25,000–30,000 years ago (e.g., Mercieret al. 1991; Hublin et al. 1995, 1996; Lahr 1996; Duarteet al. 1999). Making the same approximations of N = 10^{4} and a generation time of 20 years yields the estimate T_{0} ~ 0.07–0.11. It is more difficult to make a reasonable guess for T_{1}. This is due to both the multiple interpretations of the fossil record and the fact that genetic isolation might be only loosely correlated with morphological divergence. Foley and Lahr (1997) suggest that the divergence may be relatively recent (i.e., roughly 250,000 years ago), while others have postulated that it is far older (e.g., Carbonellet al. 1995; Hublin 1998). I simulated T_{1} values from 0.6 to 2.0 (increment 0.2), which cover the whole range of possibilities. Also, T_{0} values from 0.06 to 0.2 (increment 0.02) were considered. Results are shown in Figures 3 and 4. Figure 3 has c = 0.1 while Figure 4 has c = 0.02. For both, the first graph shows the power to reject the null model when there are 60 loci, and the second shows the power when 250 loci are considered. Not surprisingly, more extreme subdivision (i.e., smaller T_{0} values or larger T_{1} values) leads to greater power. For the genealogy of a given site to look like Figure 1, at least one lineage must be present in both subpopulations. For the parameters considered here (i.e., c ≪ 1 and many expected ancestors at time T_{0}), representation in the nonarchaic subpopulation is certain. So, a site's genealogy will appear as in Figure 1 if and only if it contains archaic ancestry. Higher values of c lead to a greater proportion of sites having archaic ancestry and hence greater power. These same qualitative results were noted by Nordborg (2000). The probability that a given site has archaic ancestry depends on both c and the number of ancestors present at time T_{0}. As T_{0} increases, there are fewer expected ancestors at time T_{0} and thus a smaller probability of a particular site having archaic ancestry and less power to reject the null model. Alternatively, as T_{1} increases, those sites with archaic ancestry will have a longer oldest branch; this will lead to more segregating sites in strong linkage disequilibrium with each other and greater power to reject the null model. Figure 3a shows that 60 loci are sufficient to distinguish between high levels of Neanderthal admixture and panmixia in most situations, whereas Figure 4a suggests that the power to detect lower (possibly more realistic) levels of admixture depends much more on specific parameter assumptions. In particular, if Neanderthals were genetically isolated only recently (cf. Foley and Lahr 1997), then high power to reject the null model might require on the order of hundreds of loci.
With the recent claim that H. erectus survived on Java until just 25,000 years ago (Swisheret al. 1996), there has arisen the intriguing possibility that H. erectus and modern humans might have been contemporaneous in Southeast Asia. Modern humans first colonized Australia ~60,000 years ago (Robertset al. 1994), and due to geographical constraints they must have at least temporarily occupied Southeast Asia. This time of possible contact corresponds to T_{0} values of 0.06–0.15 or more. H. erectus first colonized East Asia almost 2 mya (e.g., Swisheret al. 1994; Huanget al. 1995), and the marked morphological conservatism in the Southeast Asian H. erectus fossil record is suggestive of a high degree of genetic isolation. The 1.6–1.8 mya dates of Swisher et al. (1994) suggest an upper bound for T_{1} of ~4.0–4.5. Figure 5 shows the results of power simulations where T_{0} varies from 0.06 to 0.20 (increment 0.02) and T_{1} varies from 1.0 to 5.0 (increment 0.5). To reflect the knowledge that any contribution of H. erectus to the modern gene pool is likely to be minor, a low value of c (c = 0.005) was assumed. Figure 5a shows power with 60 loci, while Figure 5b shows power with 250 loci. If H. erectus were isolated for most of its existence in Southeast Asia (e.g., T_{1} ~ 3 or more), then there would be reasonable power with 60 loci and very high power with 250 loci even when T_{0} is large; those loci with archaic ancestry would appear extremely unusual, and this would more than counteract the low value of c. However, if there had been intermittent gene flow then lower values of T_{1} would be appropriate; if so, there would be reduced power, because the vast majority of sites with no archaic ancestry would swamp out the pattern from other sites.
DISCUSSION
Figures 3, 4 and 5 show that the number of loci needed to have reasonable power to reject the null model is quite sensitive to assumptions about when and how long the archaic population was isolated and how much of a contribution it made to the modern gene pool. So, while as few as a dozen loci might be sufficient for extreme demographic scenarios, it seems that 60 or more loci would be necessary if parameter values are close to our best guesses for them. Furthermore, some demographic scenarios will be impossible to distinguish from equilibrium panmixia even if the entire genome is sequenced in every living human. If the contact between archaic and modern populations happened too far in the past (i.e., if T_{0} is too large) and/or the archaic contribution was too small (i.e., if c is too small), then genetic drift in the intervening years could easily obscure whatever pattern there once was. One conclusion is that genetic sequence data can never exclusively support a single origin model with complete replacement, because very low levels of hybridization with the local archaic populations (thus very few sites with archaic ancestry) can never be ruled out. Another is that there is not enough human sequence information available at the present for us to make any real demographic inferences. As of July 1999, there are only four published nuclear data sets of reasonable size (i.e., n > 10 and θ > 5) for which haplotype information is available (Lpl, Nickersonet al. 1998; Pdha1, Harris and Hey 1999; Xq13.3, Kaessmannet al. 1999; ACE, Riederet al. 1999; I exclude MHC loci, which are known to be under strong diversifying selection). Sufficient data will probably be collected in the next 5–10 years, and until then it is probably most appropriate to take an agnostic attitude toward the debate on whether archaic populations have contributed to the modern human gene pool.
Even though very low values of c (e.g., c < 0.001) might be impossible to distinguish from c = 0 by the methods described here, they could still produce patterns in the data that would be highly suggestive of population structure. Lower c values imply a higher proportion of sites that conform to an equilibrium panmictic model because they have no archaic ancestors. However, conditional on a given site containing archaic ancestry, the expected length of sequence sharing this ancestry is quite large. For example, when T_{0} = 0.1 and n = 50, the expected length is roughly the distance over which C ~ 30 (i.e., ~20–30 kb; results not shown). Also, the expected frequency of archaic variants (conditional on their existence) is roughly one over the expected number of ancestors at time T_{0} (~7% when T_{0} = 0.1), so is not vanishingly small (results not shown). If T_{1} is large (e.g., T_{1} > 3 in the case of Southeast Asian H. erectus ancestry), then areas of the genome with archaic ancestry would contain an unusually large number of reasonably rare segregating sites in high linkage disequilibrium with each other. These sites would not be tightly bunched, but would be dispersed over several kilobases of sequence.
I have focused on the power to reject c = 0 when the actual situation is c > 0. It would be just as easy to use multilocus data to come up with a point estimate of c (i.e., an estimate of what proportion of our genome came from archaic human populations). Suppose M(c) = (c, T_{0}, T_{1}, θ, C) is a class of models where all but the first parameter are fixed, and g_{r}(j) is the value of g_{r} from the j^{th} locus. Then one could estimate c by taking the value of c that maximizes
The model considered in this article is somewhat simplistic; it assumes that modern humans (and most of their direct ancestors) form a constantsized panmictic population, that the archaic population is formed and dissolved instantaneously, and that the archaic population is completely isolated for the duration of its existence. Some of these assumptions are known to be false. Below I discuss some possible shortcomings of the model assumptions and how more realistic models would change the general conclusions reached here.
Complete isolation of the archaic population: It has been suggested that multiple dispersal events might be a common pattern in human evolution (Lahr and Foley 1994; Lahr 1996). Also, the multiregional model posits low levels of gene flow between all hominid populations (e.g., Wolpoffet al. 1984). One way of incorporating this into the model is to include a (presumably low) migration rate between the two subpopulations. Because migration has a homogenizing effect, increasing the migration rate leads to a less extreme subdivision model and less power to reject the null model. Scenarios that include a reasonable amount of migration are difficult to distinguish from the null model because they are in actuality very similar to panmictic models.
Constant population size: The recent population explosion highlights the fact that human population sizes have changed over time. Some researchers have postulated that human population sizes were small in the distant past and then started to grow rapidly during the upper Paleolithic (e.g., Rogers 1995; Stineret al. 1999). This has been modeled by constant size followed by exponential growth (e.g., Marjoram and Donnelly 1994). One aspect of exponentially growing populations is that coalescent events tend to be bunched around the time of the onset of growth. If the onset of growth precedes T_{0}, then a sample will tend to have more ancestors present at time T_{0} than it would in the constantsized case, which leads to a greater proportion of sites with archaic ancestry and greater power to reject the null model. If, on the other hand, T_{0} is further back in time than the exponential growth phase, then power depends on the scaled value of T_{0}, which in turn depends on assumptions about N. The power in this situation is very close to the power for the comparable constantsized population case. Thus, the assumption of a constant population size is conservative with respect to recent population growth.
Panmixia: The human population is not panmictic. Genetic studies have consistently found differences between regional populations (e.g., Vigilantet al. 1991; CavalliSforzaet al. 1994; Hardinget al. 1997; Harris and Hey 1999). In general, any form of geographic subdivision tends to produce an excess of linkage disequilibrium between alleles and could lead to spurious rejection of the null model (i.e., the null assumption of panmixia in the modern human population is not conservative with respect to underlying population structure). This problem might be avoided if a suitable model of geographic subdivision could be used as the null model. For example, an island model could be considered with migration rates determined from observed F_{st} values between populations. Under an island model, the archaic contribution to the modern gene pool would likely be geographically localized. The actual population structure in humans is probably quite complex, and it remains to be shown whether simple models (e.g., island models) are appropriate substitutes.
Uncertainty in C: This uncertainty can take two forms. First, the ratio C/θ may be greater than one. As this ratio increases, there will be less linkage disequilibrium and less correlation in the genealogies of nearby segregating sites and thus less useful information for making demographic inferences. However, as long as C is known (so we can condition on how much linkage disequilibrium is expected), there is no loss of power. For example, if Figure 3a is rerun with C = 20 (instead of C = 10) everywhere, the power to reject the null model is essentially unchanged (results not shown).
A more serious concern is that regional variation in recombination rates might cause C to be different for each locus. If so, the test statistics described here could be used only if C could be measured accurately for each locus. This would require linkage maps with ~10fold finer resolution than Bouffard et al. (1997) and Nagaraja et al. (1997), because current methods of estimating C from extant sequence data (e.g., Hudson 1987; Griffiths and Marjoram 1996; Hey and Wakeley 1997) are all biased and have large variances (results not shown). One possible solution would be to choose the most conservative recombination rate for each locus [i.e., for locus j the one that maximizes Pr_{Null} (g_{r} = g_{r}(j))/Pr_{Alt} (g_{r} = g_{r}(j))]. Another option would be to construct a test statistic whose definition does not depend on an assumed recombination rate. The loss in power by either method is substantial. An example of the latter type of test statistic is
Nature of recombination: The model used here includes crossing over but not gene conversion. The incorporation of gene conversion might lead to higher values of g_{r}; if a conversion tract is surrounded on both sides by regions with the same genealogy, then it is more likely that one observes two congruent sites with evidence of recombination between them. To avoid this problem, either (1) the null model could be modified to include intragenic recombination or (2) test statistics of similar power (e.g., l_{b} or g; see Figure 2) could be used that are likely to be less sensitive to assumptions regarding recombination.
Even though there are not yet enough genetic sequence data to answer conclusively questions surrounding modern human origins, it is important to start thinking about how multilocus data can be used to infer demographic history. Unlike selection, which tends to be localized, demography is expected to affect the whole genome. As methods become more sophisticated, we will have the power to consider more complicated (and hopefully more realistic) models of demographic history. The model of ancient structure with recent mixing presented here might be a reasonable approximation of human demographic history. If so, estimating c will give us some idea of what proportion of our genome came from Neanderthals or other archaic human populations. The results obtained here suggest that at least in some cases, tens of thousands of years of random mating are not sufficient to obliterate the patterns of population structure from the distant past.
Acknowledgments
I thank R. Hudson and M. Nordborg for helpful discussions and R. Hudson, M. S. McPeek, M. Przeworski, and two anonymous reviewers for comments on an earlier version of this manuscript. This work was motivated by discussions at the North Atlantic Treaty Organization Advanced Study Institute on Human Evolution at the Newton Institute in Cambridge.
Footnotes

Communicating editor: S. Tavaré
 Received April 18, 1999.
 Accepted October 14, 1999.
 Copyright © 2000 by the Genetics Society of America