Multilocus Methods for Estimating Population Sizes, Migration Rates and Divergence Time, With Applications to the Divergence of Drosophila pseudoobscura and D. persimilis
Jody Hey, Rasmus Nielsen


The genetic study of diverging, closely related populations is required for basic questions on demography and speciation, as well as for biodiversity and conservation research. However, it is often unclear whether divergence is due simply to separation or whether populations have also experienced gene flow. These questions can be addressed with a full model of population separation with gene flow, by applying a Markov chain Monte Carlo method for estimating the posterior probability distribution of model parameters. We have generalized this method and made it applicable to data from multiple unlinked loci. These loci can vary in their modes of inheritance, and inheritance scalars can be implemented either as constants or as parameters to be estimated. By treating inheritance scalars as parameters it is also possible to address variation among loci in the impact via linkage of recurrent selective sweeps or background selection. These methods are applied to a large multilocus data set from Drosophila pseudoobscura and D. persimilis. The species are estimated to have diverged ∼500,000 years ago. Several loci have nonzero estimates of gene flow since the initial separation of the species, with considerable variation in gene flow estimates among loci, in both directions between the species.

FOR many species, existence is a complicated mix of multiple populations that are dynamic in size, location, and levels of gene exchange. Sometimes an individual population diverges from others to a sufficient degree that evolution thereafter proceeds largely independently. What is it about these populations that go on to become new species that sets them apart from others that do not? Clearly the accumulation of endemic mutations under an extended period of allopatry can enable this process (Dobzhansky 1936; Muller 1940; Mayr 1942). But if populations are not completely separated, then divergence entails a competition between unifying and diversifying genetic processes. Genetic drift will enhance divergence, while gene flow will retard it. Natural selection, enabled by gene flow, can act to reduce divergence if selective sweeps pass across populations. But on the other hand, natural selection that leads to population-specific selective sweeps or that acts otherwise to retard gene exchange can promote divergence.

One classic finding on gene flow and genetic drift that helps to focus our intuition is that a modest level of gene flow (one gene copy per generation, on average) will prevent substantial divergence at a locus (Wright 1931). This point also begets a key corollary: that only a modest level of natural selection against gene flow may be sufficient to enable divergence. But despite these guidelines, empirical questions on population divergence can be fairly intractable. Perhaps the clearest case of this is the common situation when a measure of differentiation has been obtained for a pair of populations, such as a genetic distance or an estimate of Wright's Fst (Wright 1922). Given such a number, one can then estimate how long ago the populations diverged (assuming no gene flow), or one can estimate the gene flow rate, assuming the populations are at equilibrium and have been separated and are exchanging genes at that rate for a very long period of time. In short, one can fit a model that assumes the populations will become increasingly divergent (model I, for isolation), or one can fit a model that assumes the populations will never diverge more than they have already, because of gene flow (model M, for migration). Neither one is of much use if the goal is to develop a full picture that includes estimates of separation time and gene flow.

Investigators have considered nonequilibrium models of population splitting; with gene flow, however, there are significant challenges (Latter 1973; Takahata and Slatkin 1990; Takahata 1995; Wakeley 1996b; Wakeley and Hey 1998). The problem is that the two different models (I and M) can lead to similar gene tree topologies and can be fit equally well to most kinds of data summaries (Slatkin and Maddison 1989; Takahata and Slatkin 1990). However, the two models do not have identical predictions and data summaries and likelihood methods that exploit these differences can be used to distinguish them (Wakeley 1996a; Nielsen and Slatkin 2000; Nielsen and Wakeley 2001).

One fairly complete approach is to include both isolation and migration [the “isolation with migration” (IM) model] and to apply a probabilistic method for fitting the model to a data set. Nielsen and Wakeley (2001) developed a likelihood/Bayesian framework for fitting a six-parameter version of the IM model to data from a single, nonrecombining locus drawn from two populations or closely related species. At the heart of the method is an expression for the distribution of model parameters Θ, given the data, X: Math1

Here, c = 1/Pr(X), the inverse of the probability of the data. In the course of calculations c is treated as a constant to ensure that the total probability for all values of Θ sums to one. f(Θ) is the prior probability density function of the parameters and f(G|Θ) is the prior distribution of genealogies (rooted ultrametric trees with branch lengths). In this framework, inferences regarding Θ are based on the posterior distribution of Θ, f(Θ|X).

The most challenging aspect of (1) hinges on the unknown genealogy (i.e., the gene tree) that underlies the data. For any particular genealogy, G, and set of parameters, Θ, it is possible to evaluate f(G|Θ) using coalescent theory (Kingman 1982a,b; Hudson 1983; Tavare 1984). Also for a given mutation model it is possible to calculate the probability of the data, for a given genealogy and set of parameters values, f(X|G, Θ). The genealogy for each locus consists of a tree with a branching pattern (topology), in which all of the DNA sequences in the data set for that locus are represented at the tips. Each genealogy also includes values for all of the branch lengths, as well as times of migration events. In effect G is a nuisance parameter that must be integrated out to gain insight into the demographic parameters. Nielsen and Wakeley (2001) implemented a Markov chain Monte Carlo (MCMC) approach that jointly approximates the integration over genealogies in the course of also approximating the full expression for f(Θ|X).

In principle the prior distribution of Θ can be set to reflect actual prior information regarding Θ; however, for most purposes f(Θ) is set to a constant value over a prescribed range of values. By setting this prior distribution to be uniform, f(Θ|X) is proportional to the likelihood of the parameters, given the data. Thus, for example, if (1) can be evaluated, then the mode of the posterior distribution provides a maximum-likelihood estimate of Θ.

The method of Nielsen and Wakeley (2001) was designed for data from just a single nonrecombining locus, and it can be slow to converge on the correct posterior densities for some data sets. Here we present some extensions of the method that alleviate these problems and greatly increase the applicability of the method. We demonstrate the methods using previously published data from Drosophila pseudoobscura and D. persimilis.


Consider a general IM model in which a population gives rise to two populations, after which there may be gene exchange between the two populations. This model has six major parameters: the population sizes of the three populations (for populations 1 and 2 and the ancestral population), two migration rates, and a time point at which the ancestral population gave rise to populations 1 and 2. In Figure 1 , two versions of these parameter sets are shown. One includes the basic demographic parameters of population size, migration rate, and time of population splitting. In this framework, the genetic process of drift and mutation occurs on a timescale of generations. However, when we fit the model to genetic data we usually do not have direct access to a timescale that is in units of generations or years. For this reason the method (and others like it) must scale the parameters either by the rate of genetic drift or by the mutation rate.

Figure 1.—

The isolation with migration (IM) model is depicted with two parameters sets. The basic demographic parameters are constant effective population sizes (N1, N2, and NA), gene flow rates per gene copy per generation (m1 and m2), and the time of population splitting at t generations in the past. The second set of parameters is scaled by the neutral mutation rate u, and it is these parameters that are actually used in the model fitting.

In the present framework, each population is represented by a population mutation rate, θ = 4Nu, where N is the effective size of a population of diploid individuals, and u is the neutral mutation rate, rather than by population size. For migration, the rates are expressed as the rate of migration for each gene copy, per mutation event, or m = m/u. In this way, the product of a population mutation rate and a migration rate is equal to the more familiar population migration rate M = 2Nm = θm/2. The time parameter also is expressed in terms of mutations, t = tu. If we wish, we can convert to a measure of time that is on the same scale as the process of genetic drift, T = t/2N = 2t/θ. Thus also shown in Figure 1 are parameters scaled to the neutral mutation rate, u. This parameterization differs from the original description of the method, which used θ1 = 4N1u, r = N2/N1, a = NA/N1, M1 = 2N1m1, M2 = 2N2m2, and T = t/2N1 (Nielsen and Wakeley 2001). Throughout the article, quantities that are in units of individuals (N) or generations (t) or that are rates per generation (u and m) are in boldface type, whereas parameters that are used in the method, including demographic parameters that are scaled by the mutation rate, are expressed in italics (e.g., m and t).

Multiple loci:

A key assumption of the method is that the locus being studied has been evolving neutrally and that it has been drawn at random from all loci, with respect to genealogical history. In other words, the locus should not have been drawn in such a way that it is likely to have an atypical gene tree depth or to have experienced an atypical amount of gene flow. But even if these assumptions apply, different unlinked loci will vary widely in their histories. This normal stochastic variance among loci can be very great, and it is a major difficulty for phylogeographic studies that use just one locus (Hey and Machado 2003). In principle this can be overcome, and parameter estimates greatly improved, by extending the method to simultaneously include multiple loci.

The extension of the method to multiple independent (i.e., effectively unlinked) loci is fairly straightforward, as the joint density function for the parameters can be expressed as a function of the product of the densities calculated for each individual locus. Similar to expression (1), the joint posterior density can be written as Math2where Θ still refers to the vector of parameters of the model, Xi refers to the data for locus i, and Gi is the genealogy for locus i. As in the single-locus case, Gi is described by the topology of an ultrametric tree, its associated coalescence times, and the times of migrations on each branch of the tree. For notational convenience, from now on we use the notation G = (G1, G2, …) and X = (X1, X2, …).

Expression (2) cannot be solved analytically. However, it is possible to estimate the posterior probability density by simulating a Markov chain using the Metropolis-Hastings algorithm (see, e.g., Gilks et al. 1996). The basic idea in this method is to simulate a Markov chain with state space on the set of all possible values of G and Θ and stationary distribution f(G, Θ|X). Then f(Θ|X) can be approximated by sampling values of Θ from this chain at stationarity. The simulation initiates with a starting set of parameter values and a starting set of genealogies that are consistent with the data and proceeds by iteratively updating each of the genealogies and the parameter values in turn according to the appropriate Metropolis-Hastings criterion that ensures that the Markov chain has the desired stationary distribution. Over the course of a long simulation a record is kept of the time that the chain spends at each of the possible values for each parameter. After a sufficiently long run, the distribution of residence times for a given parameter should be a good approximation of the marginal posterior density of that parameter.

Under this multilocus framework the general expression for the Metropolis-Hastings criterion seems fairly complex, with an update, from parameter values Θ and genealogies G1, G2 . . Gn to Θ* and G1*, G2*, . . Gn*,

accepted with probability Math3

[see expression (3) of Nielsen and Wakeley 2001]. However, for most parameters the quantity simplifies considerably, and in all cases where the parameter being updated affects the probability associated with more than one locus, the criterion is a product of terms given in Nielsen and Wakeley (2001).

To include multiple loci it is necessary to extend the parameter set. But since under the model all of the basic demographic parameters apply to all loci, the only additional parameters necessary are locus-specific mutation scalars. Thus for locus i, we let ui represent the relative mutation rate for locus i such that the population mutation rate at that locus is θui. One way to implement such scalars is to pick one locus as a standard with a mutation rate scalar of 1 and to have the scalars for other loci vary as parameters to be estimated. However, this would cause parameter estimates to vary depending on the locus selected as the standard, and it might also be the case that convergence would be slow because of the strong correlation among parameters in the proposal kernel. An alternative that we have chosen is to let all loci have scalars that are free to vary, subject to the constraint that their products are equal to 1. For example, when there are two loci, there will be two mutation rate scalars, u1 and u2, that vary reciprocally so that at all times during the Markov chain simulation u1 = 1/u2 and vice versa. We use a log uniform prior on the inheritance scalars subject to the constraint Math At the beginning of a Markov chain simulation with n loci, all n scalars are set to 1. Updates to these scalars are considered in turn along with updates to the other parameters. For each update, two loci, i and j, are selected at random from the n loci and their scalars, ui and uj, are replaced by Math and Math, respectively. d is drawn at random from a uniform log scale distribution such that ui* and uj* fall between 1/x and x (x is a very large number). If we envision there being n different population mutation rates for population 1, θ1,1, … , θ1,n with θ1,i = θ1ui, then the value of θ1 is equal to the geometric mean of the individual locus values and remains constant before and after an update of the scalars. Because both the ratios of update densities and the ratios of the priors are proportional to Math, the Metropolis-Hastings acceptance probability is simply Math4

A benefit of mutation scalars implemented in this way is that they can be easily applied regardless of the mutation model and regardless of the length of individual loci. Thus, in general a locus represented by long DNA sequences will reveal more polymorphisms and greater divergence and correspondingly high values for the mutation scalar relative to a short locus that is included in the same analysis. If just two loci of identical underlying mutation rates per base pair, but of different sequence lengths, are included in an analysis, then we would expect the estimate of the mutation rate scalar of the longer locus to be greater than one and to be close to the reciprocal of the estimate of the shorter locus. In most applications described below, the infinite-sites model is used, but regardless of the mutation model the scalars can be applied to a mixture of loci of various mutation models, including the Hasegawa-Kishino-Yano (HKY) model (Hasegawa et al. 1985; Palsbøll et al. 2004) as well as the stepwise mutation model (Hey et al. 2004).

Inheritance scalars as constants:

When multiple genes are studied it is often the case that different loci have different modes of inheritance (e.g., autosomal, hemizygous, or sex limited). The issue raised by these cases is that the effective population size of nonautosomal loci is correspondingly reduced by their lower effective level of ploidy and number of carriers in the population. Thus it is common to multiply estimates of θ for a hemizygous locus by a factor of 4/3, and a sex limited locus by a factor of 4, to bring them up to par with estimates for autosomal loci. These types of adjustments can be readily included within the model. For locus i with an expected effective number of gene copies of hi, relative to an autosomal locus, the population mutation rate parameter during the Markov chain is set to θhi. Thus at all points in the Markov chain involving calculations for locus i, and using θ1, θ2, and θA, the products of these parameters and hi are used instead. The effect of this is to compress or stretch the distribution of θ by a factor of 1/hi. For example, if an estimated value is obtained for a single locus assuming h = 1, then the same data using an inheritance scalar of h = 1/4 will return an estimated population mutation rate of four times that value.

Inheritance scalars as parameters:

A difficulty with inheritance scalars is that their values are generally taken to be known directly as a consequence of the mode of inheritance. The usual assumption is that males and females each contribute equally to the effective population size of the population. An alternative approach to asserting a particular value for h is to allow the inheritance scalar to be a parameter in the model. Under such a framework, the value of h for each locus would be free to vary during the course of the Markov chain as a function of the data and the model, and posterior distributions would be returned for h1, h2, … hn just as for the other parameters. There are two sorts of biological justifications for this. First, the assumptions regarding sex ratios and effective numbers of males and females that underlie the conventional values of these scalars may not hold. Second, and perhaps more importantly, there are other, selective reasons why the effective number of gene copies experienced by different loci may vary systematically among populations—over and above that variation caused by the mode of inheritance. Recurrent selective sweeps (Maynard Smith and Haigh 1974; Gillespie 2000) or background selection (Charlesworth et al. 1993) can cause a locus to steadily experience a reduced effective population size that is different from that of other loci.

When implementing inheritance scalars as parameters we are faced with a situation similar to that for the locus-specific mutation scalars. As in that case, we have used an updating scheme in which pairs of inheritance scalars are changed in such a way that the product of all inheritance scalars is 1. The Metropolis-Hastings criterion for updating the inheritance scalars for locus i and j, from hi and hj to dhi and hj/d, is Math5

For equilibrium models, in which population sizes and migration rates are constant over an effectively infinite period of time, all branches on the genealogy will scale with both effective population size and mutation rate. In this situation, modifiers of effective population size (i.e., inheritance scalars) and mutation rate (i.e., mutation rate scalars) cannot be independently estimated (i.e., the model is not identifiable when both inheritance and mutation scalars are free to vary as parameters). Thus, for example, with data from a single population and multiple loci that vary in polymorphism levels, one could estimate a set of mutation scalars (i.e., one for each locus) or a set of inheritance parameters, but the two sets would be identical and one could not estimate both. However, when a model includes population splitting, the two sets of parameters become separable. Polymorphism within populations depends upon inheritance mode and mutation rate, while the divergence between populations depends directly upon mutation rate, but not directly upon the mode of inheritance. This is why some models of population structure must consider the mode of inheritance separately from mutation rate (Wang and Caballero 1999; Laporte and Charlesworth 2002) and why multilocus models of population splitting must include both types of scalars (Wang et al. 1997).

Because the two mutation and inheritance scalars both apply to the population mutation rates (θ1, θ2, and θA) the two types of parameters are expected to negatively covary to a large extent, particularly if the time of population splitting has been recent relative to the depth of gene trees within populations. Similarly gene flow between populations blurs the demarcation between variation that arises between populations, due to population splitting, and variation that arises within populations. It is likely that data sets drawn from populations that have had either very recent separation from an ancestral population or substantial gene flow since separation will not have the divergence necessary to support the estimation of both mutation rate scalars and inheritance scalars.

Metropolis coupling:

A major difficulty of MCMC estimation of probability densities is not knowing the length of time needed for convergence (i.e., for the simulated values to accurately approximate the true density). The method offers no guarantee that the chain will sufficiently traverse the state space in reasonable time, and there has been some debate on whether an investigator should run multiple chains and on how long chains should be to have some confidence that the results have converged on the correct answer (Gelman and Rubin 1992a,b; Geyer 1992a,b). Usually with single-locus data sets of total sample size <50, chains of 20 million steps prove sufficient for repeatable, albeit rough, point estimates, although considerably longer chains are needed for highly precise estimates. The same cannot be said of data sets with multiple loci, for which convergence may often be very slow. The fundamental problem is that for many loci, f(Θ|G, X) tends to be centered on a specific value of Θ (i.e., the posterior density of the parameters is well determined, conditional on G). However, the unconditional posterior density, f(Θ|X), may nonetheless have a large variance. This seems especially to be an issue for the parameter t and leads to reduced mixing and slow convergence of the chain.

To offset this difficulty we have implemented a Metropolis-coupled version of the algorithm in which multiple chains are run simultaneously, with all chains but one having heated stationary distributions (Geyer 1991). These heated chains will not individually return the correct posterior distributions but they will explore the parameter space far more quickly than will the nonheated chain. Increased mixing in the nonheated chain is obtained by symmetrically swapping parameter and genealogy states between chains at rates determined by a Metropolis criterion that is a function of the difference in overall probabilities between the chains and the difference in heating values of the chains (Geyer 1991). For a simulation with Metropolis coupling among k chains, each chain will be approximately a fraction 1/k as long as a single chain run for the same length of time. The advantage gained is that the overall rate of mixing on the primary chain may be vastly improved. In practice we have found this method solves the difficulties of inadequate mixing that arise sometimes with data sets that include multiple loci.

Mutation models:

In the original description, the method was limited to the infinite-sites mutation model (Kimura 1969). Recently it has been extended to the HKY model (Hasegawa et al. 1985; Palsbøll et al. 2004). This means that it can be used for loci, like the mtDNA, that do not have recombination, but generally do show evidence of homoplasy. The method has also been extended to include the stepwise mutation model (Hey et al. 2004).

Computer program development:

A computer program (available from the authors) was written to implement the method with the enhancements. In addition to basic debugging, we employed three types of checking: ensuring that the posterior distributions are identical to the prior distributions when f(X|G, Θ) is set to 1, for all G and Θ; comparing results with simpler models for which posterior densities can be calculated directly or that can be assessed using other programs (Nielsen 1997; Wilson and Balding 1998; Nielsen and Wakeley 2001); and applying the method to data sets simulated under the IM model. This last method is the most complete but it is laborious because there is not a necessary relationship between the parameters used to simulate a data set and the posterior densities that are estimated from that simulated data. Rather, multiple simulated data sets need to be analyzed so that a set of posterior densities can be assessed in relation to the true parameter values used for the simulations. Figure 2 shows the marginal posterior densities estimated from each of 20 independent five-locus simulations. For each of the six demographic parameters, the posterior densities vary about the true value used in the simulation. To test whether the locations of these distributions, considered together, are consistent with the true values of the parameters (i.e., the values used in the simulations), we used Fisher's approach to combining probabilities from independent tests of the same hypothesis (Fisher 1954). For each posterior density we determined pi, i = 1, … , 20, the chance that a parameter value is more extreme (i.e., departs more from the mean of the distribution) than the actual true value. That is, if x is the area of the curve to the left of the true value then pi = 2x if x < 0.5 and pi = 2(1 − x) if x > 0.5. If the pi's are uniformly distributed, then Math is χ2 distributed with 40 d.f. (i.e., two times the number of densities). In this distribution 90% of the probability mass falls above 29.05, 50% falls above 39.3, and 10% falls above 51.8 (Rohlf and Sokal 1981). If the values of the true parameters in the coalescent simulations are consistent with the shape and locations of the posterior densities (i.e., pi is uniformly distributed), then a for each parameter should be drawn from this χ2 distribution. The values of a for the different parameters are θ1, 45.5; θ2, 48.2; θA, 29.5; m1, 43.1; m2, 36.8; and t, 41.5. Although these values are not entirely independent of each other, they all fall in the middle of the χ2 distribution, and their mean (40.8) is quite close to the 50% point of the χ2 distribution (39.3). While results from only a limited number of simulations have been shown here, they do suggest that the method is reliable and that the credibility intervals established by the method may be interpreted as classical confidence intervals.

Figure 2.—

The marginal densities obtained by fitting the IM model to simulated data. The input parameters for the simulations were as follows: θ1 = 20; θ2 = 40; θA = 10; m1 = 0.05 (2N1m1 = 0.5); m2 = 0.1 (2N2m2 = 2); and t = 5 (t/2N1 = 0.5). For each simulated data set, coalescent simulations were done for each of five loci with identical mutation rates under an infinite-sites mutation model, each with sample sizes of 10 for each of the two populations. Each simulated data set was analyzed using wide uniform prior distributions for each parameter and four chains (three heated chains, in addition to the primary chains) joined by Metropolis coupling. Each analysis began with a burn-in period of 300,000 steps followed by a primary chain of 6,000,000 steps. The curves for parameters θ1 through t are shown in A–F, respectively. The true parameter values used in the simulations are shown as shaded vertical bars.

Figure 2 is also useful for providing a sense of how much data, under the infinite-sites mutation model, may be needed to return posterior probabilities that have useful confidence intervals and that might be expected to reveal gene flow, if it indeed had been ongoing. For example, most of the simulated data sets did not reveal a nonzero peak for m1 (true value of 0.05 corresponding to M1 = 0.5), but most data sets did reveal a nonzero peak for m2 (true value of 0.1, corresponding to M2 = 2). For the other parameters, most curves lie fairly close to the true value, but many curves also easily span values that are double or half the true value. Thus while these modestly sized simulated data sets provide a good approximate view of the true history, the simulations also suggest that larger multilocus data sets would be required to achieve narrow credible intervals for most parameters.


We applied these methods to the divergence of D. pseudoobscura and D. persimilis (Dobzhansky and Epling 1944). This well-studied species pair is well known as the focus of much of the research by Dobzhansky and colleagues over many years (Lewontin et al. 1981). When the species are crossed, hybrid females are fertile while hybrid males and some hybrid backcross females are sterile (Dobzhansky 1936). The species are partially sympatric in the western part of North America (from California to British Columbia; Dobzhansky and Epling 1944) and are known to hybridize at a low frequency in nature (Dobzhansky 1973; Powell 1983). Recently a set of inbred lines from each species was sequenced at 16 different portions of the genome. Analyses showed that loci varied significantly in their patterns of variation, strongly suggesting the presence of gene flow at some loci, but not at others (Wang and Hey 1996; Wang et al. 1997; Machado et al. 2002; Machado and Hey 2003). However, it has not been possible until now to quantify the gene flow, together with the other relevant population size and divergence time parameters.

Inclusion of inheritance scalars as parameters:

To see the impact of including inheritance scalars as parameters, in a multilocus context, we fit the IM model to a data set for the three loci that showed zero or little evidence of recombination (see below). One locus (4002) showed no evidence of recombination, and two loci (2003 and X010) were consistent with zero recombination, provided that one sequence was removed from each sample set (the perSALEM sequence in the case of 2003 and psMATH10 in the case of X010). Two other loci (the mtDNA and eyeless) could have been included with these other three; however, both showed genealogical histories that departed markedly from others, suggesting the action of natural selection (Machado and Hey 2003).

We fit the IM model for both the case of constant inheritance scalars (h = 1 for autosomal loci 4002 and 2003, and h = 0.75 for X-linked locus X010) and the case when inheritance scalars were free to vary along with the other parameters. As shown in Figure 3 , both with and without inheritance parameters, the positions of the peaks of the marginal posterior densities for the population size parameters suggest that D. pseudoobscura has had a larger effective population than D. persimilis. In this case, the effect of including the inheritance parameters is to shift the positions of the peaks a modest amount. A similarly modest effect is observed on t, and regardless of the inheritance parameters, both migration rate parameters were nearly identical and showed strong peaks at zero (Figure 4) . The effect on mutation rate scalars is more dramatic, and the curves that result by inclusion of inheritance scalars are farther apart from each other and considerably flatter than without them (Figure 5) . The curves for the inheritance scalars themselves are shown in Figure 6 , with estimated peak locations at 2.21 (locus 2003), 1.09 (locus 4002), and 0.39 (locus X010). As expected the geometric mean of these values is near one (0.98). The estimate for a given locus reflects the departure of locus-specific effective population size from this geometric mean. The estimates vary considerably from the case when inheritance scalars are set as constants: 1 for autosomal loci (loci 2003 and 4002) and 3/4 for the X-linked locus X010. Although the curves overlap, they are consistent with the loci having different effective population sizes, possibly by the action of recurrent selective sweeps in partially linked regions of the chromosome (Gillespie 2000) or by background selection(Charlesworth et al. 1993).

Figure 3.—

The marginal densities for the population mutation rate parameters obtained by fitting the IM model to a three-locus data set (loci 4002, 2003, and X010). Distributions were obtained by integrating the full-likelihood surface over all of the other model parameters. Values for populations D. pseudoobscura, D. persimilis, and the ancestral species (θ1, θ2, and θA, respectively) are shown as dashed lines for runs in which the inheritance scalars are free to vary along with all other parameters in the model (also denoted h). The solid lines are for runs in which the inheritance scalars as set to specific values (h = 1 for loci 4002 and 2003 and h = 0.75 for locus X010).

Figure 4.—

The marginal densities for the migration and time parameters. Distributions were obtained by integrating the full-likelihood surface over all of the other model parameters. Both migration rate parameters, m1 and m2, revealed nearly identical peaks, both with and without inclusion of inheritance parameters (h). For t, the solid line shows the case when inheritance values were preset constants and the dashed line shows the case where inheritance terms are free to vary as parameters.

Figure 5.—

The marginal densities for the mutation rate scalars. Distributions were obtained by integrating the full-likelihood surface over all of the other model parameters. Values are shown for each of the three loci in the data set. As in Figures 3 and 4, results are shown for both the case when inheritance scalars are set as constants and the case when they are free to vary along with the other parameters (the latter are designated h).

Figure 6.—

The marginal densities for the inheritance parameters for the runs in which they are free to vary. Distributions were obtained by integrating the full-likelihood surface over all of the other model parameters. Values are shown for each of the three loci in the data set.

Together these three loci seem to fit the circumstances under which separate estimates of both mutation scalars and inheritance scalars can be obtained. In the first place, the amount of divergence relative to the depth of genealogies within species is not low (though neither is it very high). Estimates of divergence time in units of 2N generations (i.e., 2 t/θ) are 0.34 for D. pseudoobscura and 0.73 for D. persimilis. Also the migration rates have probably been very low or zero in both directions for these loci (Figure 4).

Loci with recombination:

The fitting of the IM model assumes that the genealogical history of a locus is strictly bifurcating and thus does not include recombination or gene conversion. Furthermore, it is difficult to include recombination in a genealogically based, likelihood framework for historical model fitting (Kuhner et al. 2000; Nielsen 2000). However, there are ways to use the method with data that come from recombining genomes by taking advantage of the imprint left by recombination on the pattern of haplotype variation at a locus. One approach is to limit analyses to loci that do not show evidence of recombination by the “four-gamete” criterion (Hudson 1985), as was done in the example described. The pitfall here is that we expect such loci to have shorter genealogies, on average. This is because those genes that happen to have shorter gene trees will also have had less opportunity for recombination within the span of their genealogical history. In the case of 4002, 2003, and X010, the data suggest that there has not been gene flow (Figure 4). If indeed the IM model is roughly appropriate in these cases, then it is probably not the case that these loci have had histories much shorter than other loci, simply because, in the absence of gene flow, the depth of the gene tree must extend at least to t. Finally, the estimate of t per kilobase pair of sequence (Table 1) is not lower for these loci than for others.

View this table:

Parameter estimates

Another approach for a locus that shows evidence of recombination is to break the data into blocks of sequence each of which does not show evidence of recombination. The algorithm of Hudson and Kaplan (1985) can be used to identify sequence blocks across which all sequences are consistent with a model of no recombination. Then together all such blocks found within a locus can be included in a multilocus fitting of the IM model (i.e., each block is a “locus”). This will violate the assumption that different loci have segregated independently, because the different loci will have had highly correlated histories because of tight linkage, and this is expected to lead to poorly estimated (and more sharply peaked) densities. However, it may still be the case that the mode of the posterior density has an expected value that is the same as a proper maximum-likelihood estimate. For each of the loci that showed evidence of recombination by the four-gamete criterion, the data were divided into multiple portions, with each portion treated as an independent locus in a run of the IM program. The locations of peak heights for each of the main parameters are shown in Table 1. Four loci (Adh, 4003, 3002, and Rh1) revealed estimates of population migration rates (M1 and M2) >0.4. This is consistent with other analyses that indicated that gene flow was limited mostly to loci that are not near chromosomal inversions that distinguish D. persimilis and D. pseudoobscura (Machado et al. 2002; Machado and Hey 2003). Interestingly, low but nonzero levels of gene flow are suggested at several X chromosomal loci, which do carry large inversions.

Yet another approach that allows a portion of the data from recombining loci to be included in a multilocus analysis is to take from each separate locus one randomly selected block of sequence identified as nonrecombining by the four-gamete criterion and to not use the remainder of the data. We used this approach in a multilocus analysis of all the loci listed in Table 1. The parameter estimates from this multilocus analysis are near those of the means of the values estimated for each of the loci individually. Overall the analyses suggest that these two species have had low levels of gene flow in the time since they began diverging.


Table 1 also shows the results of analysis on data from the mtDNA (Machado and Hey 2003). The estimated gene tree for these data differed dramatically from those of other loci, with D. pseudoobscura and D. persimilis sharing multiple complete haplotypes (despite high levels of polymorphism) and with a high level of divergence between sequences from these species and a third species, D. pseudoobscura bogotana (Machado and Hey 2003). In this case, fitting of the IM model suggests that D. persimilis has had a much larger effective population size than D. pseudoobscura and a high level of gene flow from D. persimilis to D. pseudoobscura in the coalescent (i.e., going backward in time, the direction is the reverse when considered forward in time). Given that the mtDNA includes a large number of completely linked loci, it is probably the case that natural selection has shaped this history (Machado and Hey 2003).


The study of population divergence has often been limited to two quite different general models. One class of models assumes that divergence is the result of an equilibrium between genetic drift, mutation, and limited gene flow, acting over a very long period of time, while the second class does not include gene flow, but instead supposes that divergence is the result of population splitting at some point in the past. Sewall Wright's classic work on population subdivision (Wright 1922, 1931, 1951) embraces the first class of equilibrium models, as do stepping-stone models (Kimura and Weiss 1964). In recent years methods for simultaneously estimating migration rates and population sizes have been developed for equilibrium gene flow models (Beerli and Felsenstein 1999; Bahlo and Griffiths 2000). New methods have also been developed for estimating historical population sizes and divergence times for the nonequilibrium isolation model, assuming no migration (Wakeley and Hey 1997; Nielsen 1998; Nielsen et al. 1998; Nielsen and Slatkin 2000; Rannala and Yang 2003; Wilson et al. 2003). However, for many questions concerning the divergence of populations, investigators need methods that permit assessments of both population splitting and gene flow simultaneously (Slatkin and Maddison 1989; Takahata and Slatkin 1990). For example, Nielsen and Hey (2003) showed that likelihood models that do not take migration into account are not adequate to describe the history of some human populations.

A new tool for the study of divergence:

We have extended the original MCMC method of Nielsen and Wakeley (2001) to include multiple loci with locus-specific inheritance scalars. With inclusion of multiple loci the parameters fall into three distinct categories: the primary demographic parameters (including θ1, θ2, θA, m1, m2, and t); the mutation scalars; and, if implemented as parameters, the inheritance scalars. To facilitate mixing of the Markov chain, we have also implemented Metropolis coupling (Geyer 1991). In addition, the original limitation to the infinite-sites mutation model has been overcome by the inclusion of the HKY mutation model (Palsbøll et al. 2004) and the stepwise mutation model, as well as a model that includes loci that have both an infinite-sites portion and a stepwise portion (Hey et al. 2004). With these extensions the method offers a versatile tool for addressing questions that, while traditionally quite difficult, are critical for our understanding of basic evolutionary processes of divergence.

Inheritance scalars as parameters:

With the inclusion of inheritance parameters, it may be possible for investigators to study the effects of selection, via linkage, within the IM model. If directional selection acts steadily, either as recurrent selective sweeps or as background selection, then the levels of polymorphism in a region will be a function of local gene density and recombination levels, both of which may be shared between closely related species. For example, both in the D. simulans complex (Kliman et al. 2000) and between D. pseudoobscura and its sister species (Machado et al. 2002), polymorphism levels per base pair are correlated across loci between species. One reason for this may be the action of selection. Traditionally, studies of polymorphism levels as a function of linkage have been limited to intraspecific comparisons (together with an outgroup to control for mutation rate; Begun and Aquadro 1992). Also traditionally, the fitting of demographic models like the IM model has required that loci conform to the neutral model. The invocation of inheritance parameters may allow more complete studies that include selection and demography.

Simplified models as special cases:

Another advantage of a highly parameterized IM model is that it includes a number of boundary cases that are often of interest. Thus, by prescribing migration rates of zero, the model becomes a conventional isolation model (Takahata and Nei 1985; Hey 1994; Wakeley and Hey 1997). If migration rates are nonzero and the time of splitting is specified to be very long ago, then the model becomes a simple two-island model and can be used to study the countervailing forces of genetic drift and gene flow and the equilibrium between them. If one of the descendant populations has a size of zero then the model becomes one of instantaneous population size change (i.e., at t) for the remaining population. Finally if it is specified that t = 0, such that there has effectively not been a separation, then the model becomes one of a single constant-size population. The model can also be simplified, and the number of parameters reduced, by specifying that two or all three population mutation rates are identical or that the two migration rates are identical. All of these variations are included in the computer program that realizes the method.

The divergence of D. pseudoobscura and D. persimilis:

Previous studies on these species have shown that different loci vary widely in their genealogical history and apparent phylogenetic history, in ways that are broadly consistent with a model in which gene flow occurs but is prevented at some loci by natural selection (Wang et al. 1997; Machado et al. 2002; Machado and Hey 2003). Those loci within or near inversions that distinguish D. pseudoobscura from D. persimilis show little or no evidence of gene flow, in contrast to loci on other regions of the genome. This is consistent both with the genetic map locations of loci that contribute to male hybrid sterility (Noor et al. 2001b) and with models in which chromosomal inversions played a formative role by limited gene flow early in the divergence of the species (Noor et al. 2001a; Rieseberg 2001).

By fitting the IM model we are now able to put numbers to the gene flow rates, without confounding them with our estimates of population sizes and divergence time. The estimates of the population migration rates (M1 and M2) vary considerably (particularly so when the mtDNA is considered; Table 1). However, with the exceptions of the mtDNA and the Adh locus, population migration rate estimates are <1. Interestingly, gene flow does not appear to be one sided, with approximately equal numbers of loci suggesting gene flow in each of the two directions, although for any given locus with nonzero gene flow estimates usually gene flow is indicated in only one direction.

We can also inquire of the date at which D. pseudoobscura and D. persimilis began to diverge. From Table 1, the mean estimated value of t since population splitting is 3.12 mutations per kilobase pair. To convert to absolute time, we can use the estimated absolute divergence time between D. pseudoobscura and the more distantly related species, D. miranda, of 2 million years (Aquadro et al. 1991; Wang and Hey 1996). The mean divergence between these species over the 14 loci in Table 1 is 21.2 changes per kilobase pair (Machado et al. 2002). Thus the estimated rate of divergence, per year, per kilobase is 5.3 ×10−6 changes per year and the estimated time of common ancestry between D. pseudoobscura and D. persimilis is 589,000 years (i.e., 3.12/5.3 ×10−6). Given the phenotypic similarity between these species (Dobzhansky 1944), the fact that they can produce fertile hybrids, and the estimates of gene flow between them, it is perhaps surprising that the divergence time estimate is so great. If we roughly adjust for generations (e.g., eight per year in Drosophila) then there may have been 20 times the number of generations separating these two species of Drosophila as currently separates humans and chimpanzees [i.e., assuming 6 million years divergence, at 25 years per generation (Chen and Li 2001; Brunet et al. 2002)]. The contrast suggests a slow divergence process between D. pseudoobscura and D. persimilis, notwithstanding the presence of gene flow.


We thank John Wakeley for helpful comments throughout this work.


  • Communicating editor: M. K. Uyenoyama

  • Received November 3, 2003.
  • Accepted February 16, 2004.


View Abstract