

* Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138,
ARL Division of Biotechnology, 

Department of Ecology and Evolutionary Biology and
Department of Anthropology, University of Arizona, Tucson, Arizona 85721,
Department of Biology, Williams College, Williamstown, Massachusetts 01267, ** Santa Fe Institute, Santa Fe, New Mexico 87501, 
Human Genomic Diversity and Disease Research Unit, University of Witwatersrand, Johannesburg 2000, South Africa, 
Department of Anthropology, University of Michigan, Ann Arbor, Michigan 48109, 
Department of Animal and Human Biology, University of Rome "La Sapienza," 00185 Rome, Italy, *** Forensic Laboratory for DNA Research, Leiden University, 2300 RC Leiden, The Netherlands, 

Department of Biology, University of Rome "Tor Vergata," 00173 Rome, Italy and 

Department of Anthropology, Temple University, Philadelphia, Pennsylvania 19122
1 Corresponding author: Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721.
E-mail: mfh{at}u.arizona.edu
| ABSTRACT |
|---|
40 thousand years ago (KYA). Furthermore, these small Eurasian founding populations appear to have grown much more dramatically than either African or Oceanian populations. Analyses of sub-Saharan African populations provide little evidence for a history of population bottlenecks and suggest that they began diverging from one another upward of 50 KYA. We surmise that ancestral African populations had already been geographically structured prior to the founding of ancestral Eurasian populations. African populations are shown to experience low levels of mitochondrial DNA gene flow, but high levels of Y chromosome gene flow. In particular, Y chromosome gene flow appears to be asymmetric, i.e., from the Bantu-speaking population into other African populations. Conversely, mitochondrial gene flow is more extensive between non-African populations, but appears to be absent between European and Asian populations.
A great deal has already been learned about the demographic history of human populations from genomic sequence data (GARRIGAN and HAMMER 2006). For example, a number of studies of neutral DNA polymorphism uncover evidence for a recent, severe population bottleneck in the history of non-African populations (REICH et al. 2001; MARTH et al. 2004; VOIGHT et al. 2005). Population bottlenecks are known to reduce genetic variability (MARUYAMA and FUERST 1985a,b), while inflating the genetic differences between populations (HEDRICK 1999). Yet the existence of a non-African bottleneck has been inferred only through the analysis of separate population samples, an approach that neglects any effects of human population structure. Neglecting population structure and gene flow between populations may be problematic because recent gene flow can mimic the effects of other demographic events, such as changes in effective population size (WAKELEY and ALIACAR 2001). Conversely, a number of other studies draw inferences about population structure without consideration of the effects of population bottlenecks, or other nonequilibrium demographic processes (JORDE et al. 2000; ROMUALDI et al. 2002; CHARLESWORTH et al. 2003).
There are a handful of studies that simultaneously account for both population structure and nonequilibrium demography (WAKELEY et al. 2001; RAY et al. 2003; RAMACHANDRAN et al. 2005). However, the biological inferences gleaned from such analyses are dependent upon the population genetic models assumed by the investigators. Some studies assumed an island model of population structure, in which individual populations are allowed to vary in their sizes and rates of migration (WAKELEY et al. 2001; RAY et al. 2003). Under the island model, populations share common ancestry only via gene flow due to migration. Other studies assume that populations diverge, undergo bottlenecks, and subsequently remain isolated, experiencing no gene flow from neighboring populations (PRUGNOLLE et al. 2005; RAMACHANDRAN et al. 2005). These differing assumptions can lead to alternative conclusions concerning demographic history. For example, the commonly employed statistical summary of population differentiation, $F_{ST}$, has a genomic average value of 0.12 for autosomal single nucleotide polymorphisms between three populations representing Africa, Asia, and Europe (INTERNATIONAL HAPMAP CONSORTIUM 2005). Under the island model, the expected $F_{ST} \approx 1/(1 + 4Nm)$ (WRIGHT 1940), so that the inference is that continental human populations exchange an average of $Nm = 1.83$ migrants per generation. Alternatively, under the pure divergence model, the expected $F_{ST} = 1 - e^{-t/2N}$ (NEI 1987). Under this model, assuming the human effective population size is N = 10,000, the inference is that populations diverged approximately t = 51 thousand years ago (KYA) and exchanged no migrants after that time (assuming a 20-year generation interval). To account more thoroughly for biological reality, it is desirable to estimate simultaneously both when populations diverged and how much gene flow occurred thereafter.
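The two contrasting inferences drawn from the same genomic average of 0.12 can be checked with a few lines of arithmetic. The sketch below assumes the standard approximations $F_{ST} \approx 1/(1 + 4Nm)$ for the island model and $F_{ST} = 1 - e^{-t/2N}$ (with $t$ in generations) for pure divergence; the function names are illustrative only.

```python
import math

def island_model_migrants(fst):
    """Solve F_ST = 1 / (1 + 4Nm) for Nm, the number of migrants per generation."""
    return (1.0 / fst - 1.0) / 4.0

def divergence_time_generations(fst, n_effective):
    """Solve F_ST = 1 - exp(-t / (2N)) for t, measured in generations."""
    return -2.0 * n_effective * math.log(1.0 - fst)

fst = 0.12                 # genomic average quoted in the text
generation_years = 20      # generation interval assumed in the text

nm = island_model_migrants(fst)
t_years = divergence_time_generations(fst, 10_000) * generation_years

print(f"Island model: Nm = {nm:.2f} migrants per generation")   # Nm = 1.83
print(f"Pure divergence: t = {t_years / 1000:.0f} KYA")          # t = 51 KYA
```

The same observed value is thus explained either by continuous migration with no divergence time or by a clean split with no migration, which is the motivation for estimating both jointly.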
The estimation of historical human demographic parameters is also influenced by how DNA sequence polymorphism is measured and analyzed. One example is the choice of loci for inclusion in a study. Often contrasting demographic histories have been inferred for the two sex-specific loci: the mitochondrial DNA (mtDNA) and the nonrecombining Y chromosome (NRY). In many non-African populations, the mtDNA shows the signal of rapid population growth and low levels of differentiation between populations, while the NRY shows a diminished signal of population growth and much higher levels of differentiation (SEIELSTAD et al. 1998; ROGERS et al. 2000; HAMMER et al. 2003; WILDER et al. 2004). Likewise, the signals of population growth also depend upon how populations are sampled. If a small number of individuals from multiple populations are pooled together in an analysis, one may easily confound population growth with population structure (PTAK and PRZEWORSKI 2002; HAMMER et al. 2003). Finally, measuring DNA polymorphism by genotyping single nucleotide polymorphisms (SNPs) results in a known ascertainment bias that is minimized by resequencing entire homologous regions from multiple individuals (NIELSEN 2004).
The isolation-with-migration (IM) model provides a more general framework for making inferences regarding human demographic history (NIELSEN and WAKELEY 2001; HEY and NIELSEN 2004). This two-population model assumes that populations diverge and subsequently experience gene flow. Additionally, the IM model does not require that rates of gene flow be symmetrical between populations and each population is allowed to change size independently (HEY 2005). In this study, we analyze DNA resequencing data from the mtDNA, Y chromosome, and two X-linked introns from large samples of individuals taken from each of 10 anthropologically defined human populations. Although inference under this general model is still not without its caveats, several consistent results emerge, including severe bottlenecks in the history of non-African populations, widely varying local population sizes and rates of growth, and older divergence between African populations than between non-African populations.
| MATERIALS AND METHODS |
|---|
Summary statistics and mutation rates:
Features of the sequence alignments for each locus and each population can be described by a battery of summary statistics. Levels of nucleotide polymorphism are summarized by two quantities that are moment estimators of the population mutation rate ($\theta = 4N\mu$ for an autosomal locus, where $\mu$ is the neutral rate of mutation) under the assumptions of the standard neutral model. The unbiased estimator of WATTERSON (1975), $\theta_W$, is calculated from the number of segregating sites ($S$) and summarizes the total length of ancestral coalescent genealogies. The estimator of TAJIMA (1983), $\theta_\pi$, is the average number of pairwise nucleotide differences and summarizes the average coalescence time. The population recombination rate ($\rho = 2Nc$ for an X-linked locus, where $c$ is the rate of crossing over) was also estimated from the X-linked intron polymorphism data by the method of MCVEAN et al. (2002). Differences in the mean values of $\theta_W$ and $\theta_\pi$, across all four loci, between African and non-African populations were tested for statistical significance with Hotelling's $T^2$ statistic, which is a multivariate generalization of Student's $t$ statistic. Finally, the polymorphism frequency distribution was summarized with three complementary statistics. The statistic $D$ is based on the normalized difference $\theta_\pi - \theta_W$ (TAJIMA 1989), $D^*$ summarizes the number of singletons (FU and LI 1993), and $H$ is the difference between $\theta_\pi$ and an estimator of $\theta$ weighted by the frequency of derived polymorphisms (FAY and WU 2000). The probabilities of the observed summaries of the polymorphism frequency spectra under the standard neutral model were obtained via coalescent simulation using the program ms (http://home.uchicago.edu/~rhudson1/source/mksamples.html). In each case, the simulated samples had the same number of segregating sites as the actual sample and the rate of crossing over was taken from the estimates of $\rho$ described above.
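As a rough illustration of how these polymorphism summaries relate to one another, the sketch below computes the number of segregating sites, Watterson's estimator, the average pairwise difference, and Tajima's $D$ from a toy alignment. The alignment is invented for the example, the function names are hypothetical, and the code is not the software used in the study; it simply follows the textbook formulas of WATTERSON (1975) and TAJIMA (1983, 1989).

```python
from itertools import combinations
from math import sqrt

def segregating_sites(seqs):
    """Count alignment columns that carry more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def pairwise_diversity(seqs):
    """Average number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)

def watterson_theta(seqs):
    """S divided by the harmonic number a1 = sum_{i=1}^{n-1} 1/i."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1

def tajimas_d(seqs):
    """Normalized difference between pairwise diversity and Watterson's estimator."""
    n, s = len(seqs), segregating_sites(seqs)
    if s == 0:
        return float("nan")  # D is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pairwise_diversity(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Toy alignment (made up): four sequences, two segregating sites.
toy = ["ACGT", "ACGA", "ATGA", "ATGA"]
print(segregating_sites(toy), pairwise_diversity(toy), watterson_theta(toy), tajimas_d(toy))
```

A negative $D$ arises when pairwise diversity falls below Watterson's estimator, that is, when low-frequency variants are in excess; the toy alignment has one intermediate-frequency site and therefore gives a positive value.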
Mutation rates for each locus were estimated for the purposes of converting scaled parameter estimates into demographic estimates. For the two haploid loci, the COIII and NRY mutation rates were taken from WILDER et al. (2004). For the two X-linked introns, the net pairwise sequence difference with the chimpanzee outgroup (NEI 1987) was calculated and then divided by twice the assumed human–chimpanzee divergence of six million years to obtain the per year estimate of the neutral nucleotide substitution rate. For all calculations involving quantities measured in units of generations, the human generation time is assumed to be 20 years.
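The outgroup-based rate calculation for the X-linked introns amounts to a single division. The sketch below assumes a hypothetical net divergence value purely for illustration (it is not a value from the study) and uses the six-million-year divergence and 20-year generation time stated above.

```python
def substitution_rate_per_year(net_divergence, divergence_time_years=6.0e6):
    """Divide net human-chimpanzee divergence by twice the split time,
    because substitutions accumulate along both lineages."""
    return net_divergence / (2.0 * divergence_time_years)

def rate_per_generation(rate_per_year, generation_time_years=20):
    """Convert a per-year rate to a per-generation rate."""
    return rate_per_year * generation_time_years

hypothetical_divergence = 60.0   # net substitutions per locus; invented value
rate_year = substitution_rate_per_year(hypothetical_divergence)
rate_gen = rate_per_generation(rate_year)
print(rate_year, rate_gen)
```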
Isolation-with-migration model:
There are seven basic parameters in the two-population IM model: the current effective sizes of the two populations ($N_{C1}$ and $N_{C2}$), the ancestral effective population size ($N_A$), the number of generations since the populations split ($T$), the proportion of $N_A$ that founded population 1 ($s$), the rate of gene flow into population 1 ($M_1$), and the rate of gene flow into population 2 ($M_2$) (Figure 2). All parameters, besides $s$, can be scaled by the neutral mutation rate $\mu$ by setting (as an autosomal example) $\theta_1 = 4N_{C1}\mu$, $\theta_2 = 4N_{C2}\mu$, $\theta_A = 4N_A\mu$, $t = T\mu$, $m_1 = M_1/\mu$, and $m_2 = M_2/\mu$. A vector of all parameters that are free to vary is collectively denoted by $\Theta = \{\theta_1, \theta_2, \theta_A, t, s, m_1, m_2\}$. The IM model allows the two descendant populations to change size exponentially over the course of $T$ generations, such that
$N_{C1} = sN_A e^{r_1 T}$ and $N_{C2} = (1 - s)N_A e^{r_2 T}$, where $r_i$ is the intrinsic rate of exponential growth per generation for population $i$. Finally, it is important to note that the IM model does not take into account the effects of either natural selection or intragenic recombination.
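To make the scaling concrete, the sketch below converts scaled IM parameters back into demographic units for the autosomal case. All input values are hypothetical, the function names are illustrative, and the growth relation used (a population founded by a fraction $s$ of $N_A$ growing exponentially to its current size over $T$ generations) is an assumption drawn from the description of the model rather than from the IM program itself. Haploid and X-linked loci would need a different inheritance scalar in place of 4.

```python
import math

def effective_size(theta, mu_per_generation, scalar=4.0):
    """Invert theta = scalar * N * mu (scalar = 4 for an autosomal locus)."""
    return theta / (scalar * mu_per_generation)

def split_time_generations(t_scaled, mu_per_generation):
    """Invert t = T * mu to recover T in generations."""
    return t_scaled / mu_per_generation

def growth_rate(n_current, n_ancestral, founding_fraction, generations):
    """Per-generation exponential rate taking a founding population of size
    founding_fraction * n_ancestral to n_current (assumed relation)."""
    return math.log(n_current / (founding_fraction * n_ancestral)) / generations

mu = 1e-4                       # hypothetical mutation rate per locus per generation
n_c1 = effective_size(40.0, mu) # hypothetical theta_1 = 40
n_a = effective_size(4.0, mu)   # hypothetical theta_A = 4
T = split_time_generations(0.25, mu)
r1 = growth_rate(n_c1, n_a, founding_fraction=0.1, generations=T)
print(n_c1, n_a, T, T * 20, r1)  # T * 20 converts to years at 20 years per generation
```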
Estimates of the parameter vector ($\Theta$) can be made from multilocus DNA polymorphism data using a Markov chain Monte Carlo (MCMC) technique (NIELSEN and WAKELEY 2001; HEY and NIELSEN 2004). By specifying a prior probability distribution for $\Theta$, a Bayesian approach can be taken to approximating the posterior probability distribution of parameters, given a nonrecombining DNA polymorphism data set at the $i$th of $l$ sampled loci ($X_i$):

$$f(\Theta \mid X) = c\, f(\Theta) \prod_{i=1}^{l} \int f(X_i \mid \Theta, G_i)\, f(G_i \mid \Theta)\, dG_i,$$

where $G_i$ is the genealogy at locus $i$, $c$ is a constant that ensures the posterior probability sums to unity, and $f(\Theta)$ is the prior probability distribution of parameters $\Theta$, which is assumed to be uniform along some specified interval. This integral can be evaluated using Monte Carlo simulation of the coalescent process, where $f(G_i \mid \Theta)$ is the joint density function for both coalescence and migration events (BEERLI and FELSENSTEIN 1999). Under the infinite sites model of mutation, $f(X_i \mid \Theta, G_i)$ can be calculated by mapping mutations in $X_i$ onto the simulated genealogy, $G_i$. However, most genealogies sampled from $f(G_i \mid \Theta)$ are expected to contribute little to the overall likelihood; therefore, genealogical sampling efficiency is improved with a proposal algorithm for updates in $G_i$ that is similar to the "conditional coalescent" proposal algorithm of BEERLI and FELSENSTEIN (1999).
A Markov chain describing ($\Theta$, $G_i$) can be constructed with stationary distribution $f(\Theta, G_i \mid X_i)$. The posterior distribution of $\Theta$ is estimated by sampling from the chain at stationarity (i.e., after an initial "burn-in" period of $10^5$ steps). Updates in the chain are accepted according to a Metropolis–Hastings criterion given by NIELSEN and WAKELEY (2001). Different parameters in $\Theta$ may be updated at different rates; the $t$ parameter updates especially slowly, which may cause the chain to converge to an incorrect stationary distribution. Multiple Metropolis-coupled Markov chains were run simultaneously to improve the mixing of parameters. The swapping of parameters between Metropolis-coupled chains was governed by a two-step scheme that sets the heating term ($\beta$) for the $i$th chain, for which we elected to have $g_1 = 0.05$ and $g_2 = 2$. Overall mixing of the unheated chain was assessed both through the observed autocorrelation of parameters in $\Theta$ and through its update acceptance rates.
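The logic of sampling at stationarity after a burn-in period, with acceptance governed by a Metropolis-type rule, can be seen in a one-parameter toy example. The sketch below is not the IM sampler: it has no genealogies, no coupled heated chains, and an invented Gaussian-shaped likelihood, and every name and setting in it is an assumption made for illustration. It only shows a bounded uniform prior, a symmetric proposal, burn-in, and the update acceptance rate used as a mixing diagnostic.

```python
import math
import random

def log_posterior(theta, lower=0.0, upper=10.0):
    """Toy target: uniform prior on [lower, upper] times an invented
    Gaussian-shaped likelihood centered at 3.0 with spread 0.5."""
    if theta < lower or theta > upper:
        return -math.inf            # zero prior density outside the bounds
    return -0.5 * ((theta - 3.0) / 0.5) ** 2

def metropolis(n_steps, burn_in, step=1.0, start=9.0, seed=1):
    rng = random.Random(seed)
    theta, current = start, log_posterior(start)
    samples, accepted = [], 0
    for i in range(n_steps):
        proposal = theta + rng.uniform(-step, step)   # symmetric proposal
        candidate = log_posterior(proposal)
        # Metropolis rule: accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
            accepted += 1
        if i >= burn_in:                              # discard the burn-in period
            samples.append(theta)
    return samples, accepted / n_steps

samples, acceptance_rate = metropolis(n_steps=50_000, burn_in=5_000)
posterior_mean = sum(samples) / len(samples)
print(posterior_mean, acceptance_rate)
```

A parameter whose proposals are rarely accepted moves slowly through its range, which is why a low acceptance rate for $t$ is treated as a warning sign for convergence.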
The IM program (http://lifesci.rutgers.edu/~heylab/HeylabSoftware.htm#IM) was run on all data sets for 10 million steps of a single chain with the following bounded uniform priors:
;
for i = 1, 2;
;
;
. If these ranges did not contain the full marginal posterior probability density, the upper bound was increased incrementally. Once plausible ranges were found, up to eight Metropolis-coupled chains of 10 million steps were run. The number of chains was determined by how well the t parameter mixed in the initial runs. If >15% of proposed updates to t were accepted, a minimum of three chains was always run. Additional runs were performed in which each of the four loci was allowed to experience independent rates of gene flow.
In some cases, the data had to be modified to fit the assumptions of the IM model, specifically that mutations occur according to an infinite-sites model and that there is no intragenic recombination. The four-gamete test was applied to all two-population polymorphism data sets to test compatibility with these two assumptions (HUDSON and KAPLAN 1985). If polymorphic sites in the nonrecombining haploid data sets were found to have pairs with all four gametic types, back mutation was assumed to be responsible and a finite-sites model of mutation was used (HASEGAWA et al. 1985). X-linked sites with all four gametic types were assumed to be the result of recombination events and were subsequently eliminated from the data set in one of two ways. If only a small number of haplotypes were recombinants (i.e., one or two), those haplotypes were excluded from the analysis. If more than two haplotypes were recombinants, the minimum number of incongruent sites was eliminated such that the data fit the infinite-sites model of mutation. The most sites eliminated by this latter criterion were 5 of 17 polymorphic DMD44 sites in the SE Bantu–Dogon comparison. The IM input files are available from the Hammer lab website (http://hammerlab.biosci.arizona.edu/publications/supplementary_data/XPOP_DATA.zip).
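The four-gamete test itself is simple to state in code. The sketch below assumes biallelic sites coded as characters in equal-length haplotype strings and reports every pair of sites at which all four two-site haplotypes are present; the function name and the toy data are hypothetical, and the procedure for choosing which sites or haplotypes to remove is not reproduced here.

```python
from itertools import combinations

def four_gamete_violations(haplotypes):
    """Return pairs of site indices at which all four gametic types occur.
    Assumes each site is biallelic, so at most four types are possible."""
    n_sites = len(haplotypes[0])
    violations = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations

# Toy data (invented): '0' is the ancestral allele, '1' the derived allele.
compatible = ["000", "001", "011", "111"]   # fits a tree without recombination
incompatible = ["00", "01", "10", "11"]     # all four gametes at sites 0 and 1

print(four_gamete_violations(compatible))    # []
print(four_gamete_violations(incompatible))  # [(0, 1)]
```

Under the infinite-sites model without recombination, no pair of sites should be flagged, so any reported pair points to either recurrent mutation or recombination.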
Due to the heavy computational burden of analyzing all 45 pairwise combinations of the 10 populations, analysis was carried out on only a subset of possible comparisons. Thirteen pairs of populations were chosen on the basis of geography. Finally, to check the convergence of all chains to the correct stationary distribution, a minimum of three independent replicates was performed and all reported maximum-likelihood parameter estimates represent the mean of these replicate runs of the IM program. The command lines used for the IM program are provided in supplemental Table S1 at http://www.genetics.org/supplemental/.
| RESULTS |
|---|
Summary statistics describing the polymorphism frequency spectra can provide preliminary insights into which loci are most impacted by deviations from mutation-drift equilibrium (Tables 1 and 2). For example, negative values of Tajima's D statistic at the COIII locus indicate an excess of low frequency polymorphisms over that expected under mutation-drift equilibrium (the standard neutral model). Coalescent simulations of the standard neutral model indicate that this excess is significant in several non-African populations: Mongolians (P = 0.013), Sri Lankans (P = 0.011), Dutch (P = 0.001), and Italians (P < 0.001). The NRY locus shows a significant excess of low frequency polymorphisms only for the Mongolians (P = 0.032), Sri Lankans (P = 0.031), and Papuans (P = 0.019), while the Baining show a significant deficit of low frequency polymorphisms (P = 0.033). The X-linked introns also show a significant deficit of low frequency polymorphisms at APXL in the San (D* = 1.263; P = 0.032), the Papuans (P = 0.008), and in the Baining at DMD44 (P = 0.036). The Papuans also have a significant excess of high frequency derived polymorphism at DMD44 (H = –6.377; P = 0.004). Finally, the neutral mutation rates per locus per year were estimated as $1.23 \times 10^{-5}$ for COIII, $8.88 \times 10^{-6}$ for NRY, $6.02 \times 10^{-6}$ for APXL, and $2.27 \times 10^{-6}$ for DMD44.
MCMC convergence and diagnostics:
For each two-population data set from which IM parameters were estimated, four independent replicates of Metropolis-coupled Markov chains were run. Convergence of the unheated Markov chains to their true stationary distributions was verified by examining whether each replicate independently converged to similar parameter values and whether the model parameters within each of the four replicates mixed well. The resulting marginal posterior probability distributions for the thirteen pairwise population comparisons are included as supplemental Figures S1–S13 at http://www.genetics.org/supplemental/. In almost all cases, each of the four replicates yields a posterior distribution with identical modes. The most troubling instances of failure to converge involve $\theta_A$, the ancestral effective population size parameter. For the SE Bantu–Bakola, SE Bantu–Dogon, and Dutch–Italian comparisons, independent replicates did not result in identical modes of the posterior distributions (supplemental Figures S3, S4, and S13 at http://www.genetics.org/supplemental/).
There are also several instances of diffuse marginal posterior probability distributions. A diffuse posterior distribution is relatively flat and lacks a well-defined mode. In 5 of 13 comparisons, the current effective population size parameter (NC) for non-African populations has diffuse marginal posteriors. Likewise, the splitting time parameters (t) for the Dogon–San and Bakola–San comparisons also have diffuse posterior distributions (Figure 3). Finally, the migration parameters in the Dutch–Italian comparison are also characterized by diffuse posteriors (supplemental Figure S13 at http://www.genetics.org/supplemental/). In each of the above mentioned cases, the parameter estimates were very large relative to those of other data sets. This undesirable property of the posterior distributions is likely to be the result of a lack of information in the data.
The $\theta_A$, $t$, and $s$ parameters consistently show both R < 10% and ESS < 50. However, in each instance of poor mixing behavior, the four independent replicates converged to identical modes in their posterior probability distributions.
Effective population sizes:
The resulting marginal posterior distributions for both $\theta_1$ and $\theta_2$ indicate a large variance in the current effective population sizes among regional human populations (Table 3). As mentioned in the previous section, the current effective sizes of Eurasian populations and one African population (the SE Bantu) are estimated to be very large and all have relatively diffuse marginal posterior distributions. Many of these data sets also have significantly negative values of Tajima's D and Fu and Li's D* statistics (Tables 1 and 2), which indicate rapid population growth. In these cases, modes upward of 100,000 individuals are unlikely to be reliable. In contrast, multilocus data from African and Oceanian populations yield unimodal posterior distributions with modes greater than 10,000 individuals. Table 3 shows that some individual populations have different estimates of the current effective population size in different comparisons. One example is the Dogon population in the Dogon–SE Bantu vs. Dogon–San comparisons. In the highly recombining SE Bantu data set, the estimate of NC for the Dogon is much smaller than in the San data set, which has less recombination. One putative explanation for this discrepancy stems from different numbers of segregating sites having to be eliminated due to their failure to pass the four-gamete test in different two-population data sets.
The founding effective sizes of many non-African populations are estimated to be quite small, compared with those of African populations. The smallest founding effective size is for the Papua New Guineans and is estimated to be only 35 individuals, with a 95% confidence interval of 30–250 individuals (Table 3). In two comparisons with the Asian populations, the Dutch population shows very small founding sizes, between 172 and 275 individuals, while in comparison with the more recently diverged Italian population the founding size of the Dutch is larger. It is interesting that from the Dogon–Mongolian and the SE Bantu–Mongolian comparisons, the founding non-African population size is estimated to be 1500 individuals, with a 95% confidence interval of 46–2000 individuals. Although divergence times among African populations are more ancient than among non-African populations, the founding effective size of the African populations tends to be larger (Table 3).
Population divergence times:
The marginal posterior distributions for t indicate that divergence times between non-African populations all occur no earlier than the Upper Paleolithic, <40 KYA (Figure 3). The minimum divergence time between non-African populations is from the Dutch–Italian comparison, which yields a marginal posterior distribution with a mode at 7 KYA. The deepest non-African divergence times are the Dutch–Mongolian comparison (25 KYA) and the Baining–Papuan comparison (24 KYA). Alternatively, African populations show substantially deeper divergence times, often >50 KYA (Figure 3). Some within-Africa comparisons yield very old divergence times, although exact estimates are difficult to make because of diffuse posterior distributions. However, posterior distributions for within-Africa divergence times all have well-defined lower bounds. The African and non-African comparisons (Dogon–Mongolian and SE Bantu–Mongolian) indicate that ancestral Eurasian populations split from ancestral African populations as recently as 40 KYA, with a 95% confidence interval of 24–68 KYA.
Rates of gene flow:
The estimated rates of locus-specific unidirectional gene flow for the 13 population comparisons are given in Table 4. There are six cases of highly asymmetrical gene flow, for which the estimated effective number of migrants per generation is high (2Nm >> 1) in one direction, but negligible (2Nm << 1) in the other. Between sub-Saharan African populations, asymmetrical rates of gene flow are detected originating from the Dogon into the SE Bantu population for the COIII locus, from the SE Bantu into the Bakola population for the NRY locus, from the San into the SE Bantu for the APXL locus, and in the opposite direction for the X-linked DMD44 locus. The SE Bantu–Dogon comparison is the only one within Africa that shows strong support for gene flow at the mtDNA locus, while all African comparisons show some evidence of gene flow for the NRY locus. Highly asymmetrical rates of gene flow between non-African populations are inferred for the COIII locus originating from the Baining into the Papuan population and for the NRY locus originating from the Papuan into the Mongolian population.
At the level of intercontinental comparisons, there is little gene flow inferred between African and non-African populations, but high levels between Eurasia and Oceania. Only the COIII locus in the SE Bantu–Mongolian comparison shows convincing evidence of gene flow between African and non-African populations. This is not the case for the Dogon–Mongolian comparison, in which all loci show little evidence for historical gene flow. In the Mongolian–Papuan comparison, all loci except DMD44 show some evidence for either asymmetrical or reciprocal gene flow.
| DISCUSSION |
|---|
Diversification and growth of human populations:
The earliest diversification events among the sampled populations are estimated to have occurred among extant sub-Saharan African populations; in some cases these events occurred more than 100 KYA. The oldest diversification dates are estimated to be ~200 KYA between African hunter-gatherer populations; these times are in close proximity to the estimated time for the emergence of the anatomically modern human phenotype in Africa ~195 KYA (MCDOUGALL et al. 2005). These results suggest that extant African hunter-gatherer populations diversified early in anatomically modern human history. Our estimates of African divergence times generally accord with times previously obtained from studies of classical protein markers, microsatellite, Y chromosome, and mtDNA data (CAVALLI-SFORZA et al. 1996; KNIGHT et al. 2003; ZHIVOTOVSKY et al. 2003). The effective population size of this ancestral African population is estimated to be between 5000 and 11,000 breeding individuals. Three of the four sampled African populations show little evidence for population growth since their divergence.
The founding of non-African populations is estimated to have occurred 40 KYA, with a 95% confidence interval of 24–68 KYA. This range encompasses previous estimates of 40–50 KYA from Y chromosome data (SHEN et al. 2000; THOMSON et al. 2000), 52–60 KYA from the mtDNA (WATSON et al. 1997; INGMAN et al. 2000), and 37–57 KYA from autosomal microsatellites (ZHIVOTOVSKY et al. 2003). The results of the IM analysis also indicate a severe population bottleneck(s) at the time of the non-African founding event, in which the founding population(s) is estimated to have been only 1500 breeding individuals. Our inferences concerning non-African bottleneck times are consistent with those from studies of single-population demographic history, which estimate non-African bottleneck times at ~40 KYA (VOIGHT et al. 2005), 27–53 KYA (REICH et al. 2001), and 58–112 KYA (MARTH et al. 2004). In the IM model, it is assumed that descendant populations begin exponential growth subsequent to their founding event. This constraint on the inference of bottleneck times in the IM model means that we still cannot definitively address the question of whether human populations began to grow dramatically in the Middle Paleolithic or more recently, during the agricultural revolution of the Neolithic. Moreover, it is not possible to discern the effects of multiple or independent bottlenecks in the history of non-African populations, if they occurred.
Finally, the IM analysis estimates that European and Asian populations were the first non-African populations to diversify, beginning 25 KYA. This divergence time is closely followed by divergence between the two Oceanian populations ~24 KYA. Within Asia and Europe, divergence times are estimated to be more recent, ranging from 7 to 13 KYA, and were accompanied by high levels of population growth. These estimated non-African population diversification times are slightly more recent than published estimates from autosomal protein and microsatellite data (CAVALLI-SFORZA et al. 1996; ZHIVOTOVSKY et al. 2003).
It is interesting to note that at the time of the founding of the ancestral non-African population, three of the four sampled African populations had already split from one another. The conclusion that non-African populations are derived from a structured ancestral African population raises the possibility that only a subset of African populations contributed genetic material to the founding non-African population. This implication is in accord with the conclusions of SATTA and TAKAHATA (2004), who argue that the high probability of ancestral haplotypes being found in Africa is most compatible with a model of ancestral African population structure, where some African demes did not participate in the out-of-Africa expansion. One potential avenue for future research would be to calculate divergence times between non-African demes and many thoroughly sampled African demes to find the pair with the minimum divergence time.
Patterns of gene flow:
In the IM model, gene flow between two populations can be assumed to occur with independent rates in each direction. Furthermore, rates of gene flow may be estimated separately for each locus included in the analysis. This permits the estimation of sex-specific migration rates gleaned from the uniparentally-inherited NRY and mtDNA. If high gene flow is defined as more than one effective migrant per generation (i.e., Nm > 1), there are a total of 14 cases of inferred asymmetrical high level gene flow (Table 4). Of these 14 cases, 10 involve the sex-specific haploid loci (8 NRY and 2 COIII) and 4 involve the X-linked loci. Additionally, 9 of the 14 cases are between African populations and 5 involve the non-African comparisons. In contrast, 7 of a total of 19 cases of inferred high levels of reciprocal gene flow involve the haploid loci and 12 involve the X-linked loci.
While all within-Africa comparisons show some evidence of NRY gene flow, most cases of asymmetrical NRY gene flow involve the SE Bantu population. In these comparisons, Y chromosome lineages preferentially emigrate out of the SE Bantu population. The continental expansion of the agricultural Bantu-speaking population approximately 3–4 KYA was likely to be an important event shaping patterns of African genetic diversity (WOOD et al. 2005), and may be the cause of these sex-specific gene flow patterns. In the SE Bantu–Bakola comparison, there is support for low levels of gene flow at the COIII locus, but also strong support for Y chromosome gene flow from the SE Bantu into the Bakola, consistent with previous conclusions from Y chromosome studies (DESTRO-BISOL et al. 2004). There is only one within-Africa comparison in which high levels of gene flow are estimated for the COIII locus: the SE Bantu population preferentially receives female-specific lineages from the Dogon.
High levels of female-specific gene flow appear to be more common between non-African populations than between African populations. Only between the continental European and Asian populations does there appear to be a dearth of female-mediated gene flow (Table 4). The analysis further suggests high levels of gene flow at three of the four sampled loci between continental Asian and Oceanian populations in agreement with the pattern described by LUM et al. (1998), who found low levels of differentiation between Papuans and continental east Asian populations for mtDNA, but high differentiation for autosomal microsatellites. Interestingly, similar to the findings of WILDER and HAMMER (2007), gene flow appears to be relatively low between the two Oceanian populations.
One particularly complex pattern of gene flow involves the Sri Lankan population. For three of the four resequencing loci, Sri Lankans are estimated to receive extremely high levels of gene flow from the Mongolian population, while negligible levels are detected in either direction at the APXL locus. Interestingly, when Sri Lankans are analyzed in conjunction with the Dutch sample, there is strong evidence for reciprocal gene flow between the two populations at the APXL locus. Other studies of autosomal markers have noted genetic affinities between Indian and European populations (BAMSHAD et al. 2001; WATKINS et al. 2003). This appears to be a case where the Sri Lankan genome is a mosaic resulting from gene flow from different regional populations.
Predictions of the IM model:
The contrast between patterns of polymorphism seen at loci with different effective population sizes is often informative for inferring historical demographic events, such as population bottlenecks (FAY and WU 1999). The inferences presented in this article are gleaned from three different compartments of the human genome, each with a different effective population size. However, given the wealth of human autosomal polymorphism data available, it is important to ask: How well do our inferences under the IM model (limited to these four loci) predict patterns of polymorphism seen in larger genomic data sets? To address this question, we simulated under the IM model with parameter values resulting from our analysis.
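As a reminder of why genomic compartments differ in effective size, the standard expectations can be written out. This is a textbook idealization, assuming equal numbers of breeding males and females and no other sex-specific differences; the symbols below are introduced here for illustration and are not parameters estimated in this study. Relative to an autosomal effective size $N_e^{(\mathrm{A})}$:

```latex
\[
N_e^{(\mathrm{X})} = \tfrac{3}{4}\, N_e^{(\mathrm{A})},
\qquad
N_e^{(\mathrm{Y})} = N_e^{(\mathrm{mt})} = \tfrac{1}{4}\, N_e^{(\mathrm{A})}
\]
```

Because genetic drift acts more strongly on the compartments with smaller effective size, a bottleneck is expected to leave a more pronounced signature at Y-linked and mitochondrial loci than at X-linked or autosomal loci.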
Because large-scale surveys of resequenced autosomal noncoding polymorphism are currently available for only a single African, Asian, and European population (Hausa, Han Chinese, and Italian; VOIGHT et al. 2005), we substituted the Dogon–Mongolian and Dutch–Mongolian parameter estimates into our simulations for the purpose of comparison. We recorded the genomic sampling distribution of two summary statistics, FST and Tajima's D, from 10,000 simulated replicates to assess how well they agree with the observed genomic distribution estimated from 50 resequenced loci. The ms program was used for coalescent simulations (command lines can be found in supplemental Table S3 at http://www.genetics.org/supplemental/). Figure 4 shows the resulting genomic distributions of the two statistics under our inferred parameterization of the IM model.
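The two summary statistics can be computed from aligned haplotypes as in the following minimal sketch. It is an illustration only, not the code used in this study: the function names are hypothetical, haplotypes are assumed to be equal-length strings with one character per site, Tajima's D follows TAJIMA (1989), and FST is computed with one common estimator (one minus the ratio of mean within-sample to mean between-sample pairwise differences), which may differ from the estimator applied to the data.

```python
from itertools import combinations, product
from math import sqrt


def n_diffs(a, b):
    """Number of sites at which two aligned haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise(haps):
    """Mean number of pairwise differences within one sample."""
    pairs = list(combinations(haps, 2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)


def tajimas_d(haps):
    """Tajima's D for a list of aligned haplotypes (requires n >= 4)."""
    n = len(haps)
    if n < 4:
        raise ValueError("need at least four haplotypes")
    # Count segregating sites: columns with more than one state.
    s = sum(1 for column in zip(*haps) if len(set(column)) > 1)
    if s == 0:
        return float("nan")  # D is undefined without variation
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (mean_pairwise(haps) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def pairwise_fst(pop1, pop2):
    """FST as 1 - Hw/Hb from within- and between-sample differences."""
    h_within = (mean_pairwise(pop1) + mean_pairwise(pop2)) / 2
    between = [n_diffs(a, b) for a, b in product(pop1, pop2)]
    h_between = sum(between) / len(between)
    if h_between == 0:
        return 0.0  # identical samples; convention chosen for this sketch
    return 1 - h_within / h_between
```

Applied to each simulated replicate in turn, functions of this kind yield the sampling distributions against which the observed values are compared. A sample dominated by rare variants gives a negative D, and two samples fixed for different haplotypes give an FST of 1.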
The inferences under the IM model made here predict a larger degree of genetic divergence between African and non-African populations than is observed. Conversely, the IM model predicts too little genetic differentiation between European and Asian populations, compared to the observed distribution. This may be an artifact of obtaining the observed and expected distributions from different populations, or it may reflect the uncertainty in the IM parameter estimates. Unlike the genetic differentiation statistic, the frequency spectrum statistic is accurately predicted by the IM model. Taken together, these results suggest that the estimates of population size and growth under the IM model may be more reliable than the estimates of divergence time and migration rates.
The problem of unsampled populations:
The presence of unsampled populations embedded in a network of populations can be a significant problem for inferring the true pattern of gene flow (BEERLI 2004; SLATKIN 2005). Unsampled populations can be particularly troublesome in cases of inferred asymmetrical gene flow. One can envision a simple three-population model in which populations 1 and 3 are sampled, but intermediary population 2 remains unsampled (Figure 5A). Now, assume that populations 2 and 3 are recently diverged from one another and do not exchange genes, but ongoing gene flow does occur between the more distantly related populations 1 and 2. Under this model, there would appear to be asymmetrical gene flow from population 3 into population 1. This "apparent" pattern of gene flow is a byproduct of the fact that unsampled genes in population 2 (which are closely related to genes from population 3) move into population 1, whereas genes moving from population 1 into population 2 are not sampled and do not migrate into population 3. In this case, the biological inference of asymmetrical gene flow between populations 1 and 3 is compromised because, in fact, no gene flow occurs between them at all.
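The logic of this three-population scenario can be illustrated with a deliberately simple deterministic sketch. It is not the IM model and ignores drift, mutation, and coalescence; the function name, the migration fraction `m`, and the number of generations are hypothetical inputs chosen for illustration. The sketch tracks the fraction of lineages in each population that are of the type initially shared by populations 2 and 3.

```python
def shared_lineage_fractions(m, generations):
    """Fraction of 'population 2/3-type' lineages in populations 1, 2 and 3.

    Populations 1 and 2 exchange a fraction m of lineages each generation;
    population 3 exchanges nothing. Populations 2 and 3 start out carrying
    only the shared type, and population 1 starts out carrying none of it.
    """
    p1, p2, p3 = 0.0, 1.0, 1.0
    for _ in range(generations):
        # Symmetric exchange between populations 1 and 2 only.
        p1, p2 = (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1
    return p1, p2, p3


p1, p2, p3 = shared_lineage_fractions(m=0.01, generations=100)
print(f"pop 1 carries {p1:.3f} pop-3-like lineages")
print(f"pop 3 carries {1 - p3:.3f} pop-1-like lineages")
```

Population 1 accumulates lineages that resemble those of population 3, while population 3 never acquires lineages resembling those of population 1. When only populations 1 and 3 are sampled, this looks like one-way gene flow from 3 into 1, even though the two populations exchange no migrants in the sketch.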
Conclusions:
While no model can accurately capture all of the processes that affect biological populations, the IM model represents an improvement over traditional models that rely heavily upon the assumptions that populations either do not have shared history apart from gene flow (e.g., Wright's island model), or that populations diverge in isolation. Such generality comes at the expense of the model being parameter rich, and thus inference with MCMC techniques can be computationally demanding, sometimes prohibitively so. Fitting the IM model to this modest four-locus data set confirms and refines previous inferences of one or more strong non-African population bottlenecks and of much deeper divergence times among African than among non-African populations. However, the IM analysis also suggests a locally dynamic pattern of gene flow and highlights the effect of the expansion of a single population (as exemplified by the Bantu-speaking population) on patterns of within-continent genetic differentiation. Similarly, the IM results suggest a complex relationship between extant sub-Saharan African and non-African populations, in which non-African populations may have descended from only a subset of African populations. Finally, the inference of recurrent population diversification and bottlenecking cautions that explaining human genomic polymorphism may not be achievable through the use of simple, equilibrium models.
| LITERATURE CITED |
|---|
|
|
|---|
BAMSHAD, M., T. KIVISILD, W. S. WATKINS, M. E. DIXON, C. E. RICKER et al., 2001 Genetic evidence on the origins of Indian caste populations. Genome Res. 11: 994–1004.
BEERLI, P., 2004 Effect of unsampled populations on the estimation of population sizes and migration rates between sampled populations. Mol. Ecol. 13: 827–836.
BEERLI, P., and J. FELSENSTEIN, 1999 Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152: 763–773.
CAVALLI-SFORZA, L. L., P. MENOZZI and A. PIAZZA, 1996 The History and Geography of Human Genes. Princeton University Press, Princeton, NJ.
CHARLESWORTH, B., D. CHARLESWORTH and N. H. BARTON, 2003 The effects of genetic and geographic structure on neutral variation. Annu. Rev. Ecol. Syst. 34: 99–125.
DESTRO-BISOL, G., F. DONATI, V. COIA, I. BOSCHI, F. VERGINELLI et al., 2004 Variation of female and male lineages in sub-Saharan populations: the importance of sociocultural factors. Mol. Biol. Evol. 21: 1673–1682.
FAY, J. C., and C. I. WU, 1999 A human population bottleneck can account for the discordance between patterns of mitochondrial versus nuclear DNA variation. Mol. Biol. Evol. 16: 1003–1005.
FAY, J. C., and C. I. WU, 2000 Hitchhiking under positive Darwinian selection. Genetics 155: 1405–1413.
FU, Y. X., and W. H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693–709.
GARRIGAN, D., and M. F. HAMMER, 2006 Reconstructing human origins in the genomics era. Nat. Rev. Genet. 7: 669–680.
HAMMER, M. F., F. BLACKMER, D. GARRIGAN, M. W. NACHMAN and J. A. WILDER, 2003 Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164: 1495–1509.
HASEGAWA, M., H. KISHINO and T. YANO, 1985 Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22: 160–174.
HEDRICK, P. W., 1999 Perspective: highly variable loci and their interpretation in evolution and conservation. Evolution 53: 313–318.
HEY, J., 2005 On the number of New World founders: a population genetic portrait of the peopling of the Americas. PLoS Biol. 3: e193.
HEY, J., and R. NIELSEN, 2004 Multilocus methods for estimating population sizes, migration rates, and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics 167: 747–760.
HUDSON, R. R., and N. L. KAPLAN, 1985 Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111: 147–164.
INGMAN, M., H. KAESSMANN, S. PAABO and U. GYLLENSTEN, 2000 Mitochondrial genome variation and the origin of modern humans. Nature 408: 708–713.
INTERNATIONAL HAPMAP CONSORTIUM, 2005 A haplotype map of the human genome. Nature 437: 1299–1320.
JORDE, L. B., W. S. WATKINS, M. J. BAMSHAD, M. E. DIXON, C. E. RICKER et al., 2000 The distribution of human genetic diversity: a comparison of mitochondrial, autosomal, and Y-chromosome data. Am. J. Hum. Genet. 66: 979–988.
KNIGHT, A., P. A. UNDERHILL, H. M. MORTENSEN, L. A. ZHIVOTOVSKY, A. A. LIN et al., 2003 African Y chromosome and mtDNA divergence provides insight into the history of click languages. Curr. Biol. 13: 464–473.
LUM, J. K., R. L. CANN, J. J. MARTINSON and L. B. JORDE, 1998 Mitochondrial and nuclear genetic relationships among Pacific Island and Asian populations. Am. J. Hum. Genet. 63: 613–624.
MARTH, G. T., E. CZABARKA, J. MURVAI and S. T. SHERRY, 2004 The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166: 351–372.
MARUYAMA, T., and P. A. FUERST, 1985a Population bottlenecks and nonequilibrium models in population genetics. II. Number of alleles in a small population that was formed by a recent bottleneck. Genetics 111: 675–689.
MARUYAMA, T., and P. A. FUERST, 1985b Population bottlenecks and nonequilibrium models in population genetics. III. Genic homozygosity in populations which experience periodic bottlenecks. Genetics 111: 691–703.
MCDOUGALL, I., F. H. BROWN and J. G. FLEAGLE, 2005 Stratigraphic placement and age of modern humans from Kibish, Ethiopia. Nature 433: 733–736.
MCVEAN, G., P. AWADALLA and P. FEARNHEAD, 2002 A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160: 1231–1241.
NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
NIELSEN, R., 2004 Population genetic analysis of ascertained SNP data. Hum. Genomics 1: 218–224.
NIELSEN, R., and J. WAKELEY, 2001 Distinguishing migration from isolation: a Markov chain Monte Carlo approach. Genetics 158: 885–896.
PRUGNOLLE, F., A. MANICA and F. BALLOUX, 2005 Geography predicts neutral genetic diversity of human populations. Curr. Biol. 15: R159–R160.
PTAK, S. E., and M. PRZEWORSKI, 2002 Evidence for population growth in humans is confounded by fine-scale population structure. Trends Genet. 18: 559–563.
RAMACHANDRAN, S., O. DESHPANDE, C. C. ROSEMAN, N. A. ROSENBERG, M. W. FELDMAN et al., 2005 Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc. Natl. Acad. Sci. USA 102: 15942–15947.
RAY, N., M. CURRAT and L. EXCOFFIER, 2003 Intra-deme molecular diversity in spatially expanding populations. Mol. Biol. Evol. 20: 76–86.
REICH, D. E., M. CARGILL, S. BOLK, J. IRELAND, P. C. SABETI et al., 2001 Linkage disequilibrium in the human genome. Nature 411: 199–204.
ROGERS, E. J., A. C. SHONE, S. ALONSO, C. A. MAY and J. A. ARMOUR, 2000 Integrated analysis of sequence evolution and population history using hypervariable compound haplotypes. Hum. Mol. Genet. 9: 2675–2681.
ROMUALDI, C., D. BALDING, I. S. NASIDZE, G. RISCH, M. ROBICHAUX et al., 2002 Patterns of human diversity, within and among continents, inferred from biallelic DNA polymorphisms. Genome Res. 12: 602–612.
SATTA, Y., and N. TAKAHATA, 2004 The distribution of the ancestral haplotype in finite stepping-stone models with population expansion. Mol. Ecol. 13: 877–886.
SEIELSTAD, M. T., E. MINCH and L. L. CAVALLI-SFORZA, 1998 Genetic evidence for a higher female migration rate in humans. Nat. Genet. 20: 278–280.
SHEN, P., F. WANG, P. A. UNDERHILL, C. FRANCO, W. H. YANG et al., 2000 Population genetic implications from sequence variation in four Y chromosome genes. Proc. Natl. Acad. Sci. USA 97: 7354–7359.
SLATKIN, M., 2005 Seeing ghosts: the effect of unsampled populations on migration rates estimated for sampled populations. Mol. Ecol. 14: 67–73.
TAJIMA, F., 1983 Evolutionary relationship of DNA sequences in finite populations. Genetics 105: 437–460.
TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.
TESHIMA, K. M., G. COOP and M. PRZEWORSKI, 2006 How reliable are empirical genomic scans for selective sweeps? Genome Res. 16: 702–712.
THOMSON, R., J. K. PRITCHARD, P. SHEN, P. J. OEFNER and M. W. FELDMAN, 2000 Recent common ancestry of human Y chromosomes: evidence from DNA sequence data. Proc. Natl. Acad. Sci. USA 97: 7360–7365.
VOIGHT, B. F., A. M. ADAMS, L. A. FRISSE, Y. QIAN, R. R. HUDSON et al., 2005 Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proc. Natl. Acad. Sci. USA 102: 18508–18513.
WAKELEY, J., and N. ALIACAR, 2001 Gene genealogies in a metapopulation. Genetics 159: 893–905.
WAKELEY, J., R. NIELSEN, S. N. LIU-CORDERO and K. ARDLIE, 2001 The discovery of single-nucleotide polymorphisms and inferences about human demographic history. Am. J. Hum. Genet. 69: 1332–1347.
WATKINS, W. S., A. R. ROGERS, C.