Genetics, Vol. 158, 885-896, June 2001, Copyright © 2001

Distinguishing Migration From Isolation: A Markov Chain Monte Carlo Approach

Rasmus Nielsena and John Wakeleya
a Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138

Corresponding author: Rasmus Nielsen, Department of Biometrics, 439 Warren Hall, Cornell University, Ithaca, NY 14853-7801., rn28{at}cornell.edu (E-mail)

Communicating editor: Y.-X. FU


*  ABSTRACT
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

A Markov chain Monte Carlo method for estimating the relative effects of migration and isolation on genetic diversity in a pair of populations from DNA sequence data is developed and tested using simulations. The two populations are assumed to be descended from a panmictic ancestral population at some time in the past and may (or may not) after that be connected by migration. The use of a Markov chain Monte Carlo method allows the joint estimation of multiple demographic parameters in either a Bayesian or a likelihood framework. The parameters estimated include the migration rate for each population, the time since the two populations diverged from a common ancestral population, and the relative size of each of the two current populations and of the common ancestral population. The results show that even a single nonrecombining genetic locus can provide substantial power to test the hypothesis of no ongoing migration and/or to test models of symmetric migration between the two populations. The use of the method is illustrated in an application to mitochondrial DNA sequence data from a fish species: the threespine stickleback (Gasterosteus aculeatus).


THE analysis of population subdivision has been a major focus in population genetics and molecular ecology. However, most models of population subdivision make one of two rather extreme assumptions: either (1) the populations have been exchanging migrants at a constant rate for an infinitely long time or (2) the populations are descended from a common ancestral population at some time in the past and have since diverged without gene flow. The first of these is typically referred to as equilibrium migration and the second is often called isolation or historical association. Here we consider a more general model that includes both of these as special cases.

The objective of most biological studies that assume equilibrium migration is to describe the pattern of migration among populations. Usually, some measure of population subdivision, such as FST, is calculated and used to estimate migration rates. For example, under the symmetric island model (WRIGHT 1931 Down), FST can easily be transformed into an estimate of the migration rate. BEERLI and FELSENSTEIN 1999 Down recently introduced a modified FST-based method for asymmetric equilibrium migration. Other direct methods for estimating migration also exist, notably the maximum-likelihood methods developed by NATH and GRIFFITHS 1996 Down, BEERLI and FELSENSTEIN 1999 Down, and BAHLO and GRIFFITHS 2000 Down. When isolation without gene flow is assumed, it is of interest to estimate the divergence time between populations. In this case it is possible to transform an estimate of FST into an estimate of divergence time. In addition, there are other methods of estimating the divergence time between populations, including both maximum likelihood (NIELSEN 1998 Down; NIELSEN et al. 1998 Down) and moment-based approaches (WAKELEY and HEY 1997 Down).

However, there are very few methods that assist us in choosing the appropriate model, e.g., to distinguish between isolation and migration as explanations for observed patterns of genetic divergence. Exceptions are the method of WAKELEY 1996B Down, designed for nonrecombining DNA sequence data, and the method of NIELSEN and SLATKIN 2000 Down, applicable to unlinked restriction fragment length polymorphism (RFLP) or single nucleotide polymorphism (SNP) data. Under somewhat restrictive assumptions about historical population sizes and other aspects of demography, these methods do allow some statistical testing of the demographic models, but with low power.

Despite the fundamental importance of assessing the relative contributions of migration and isolation, there currently exists no framework for obtaining joint estimates of divergence times and migration rates from DNA sequence data. A familiar example, in which the lack of appropriate statistical methods has been felt, is the out-of-Africa controversy in human genetics. Summaries of the data, such as FST, can be explained either by assuming short divergence times and little migration or long divergence times and strong migration. It is clear, though, that genetic data can contain substantial information regarding the relative contributions of migration and historical association. For example, WAKELEY 1996A Down showed that the variance of pairwise nucleotide differences is larger under equilibrium migration than under isolation, even when FST is the same in both models. However, the variance of pairwise differences is a fairly drastic summary of the information in the data. As a single statistic, like FST, it will not likely be useful in distinguishing among more complicated demographic histories.

The aim of this article is to develop a framework for jointly estimating divergence times and migration rates between two populations from DNA sequence data. We consider a general model, described in the next section, in which the migration rate for each population may be different, and the ancestral population and the two descendent populations may all be of different effective sizes. We make use of both likelihood and Bayesian approaches to ancestral inference. The likelihood function and the posterior distributions are calculated using a Markov chain Monte Carlo (MCMC) method. Our results show that it is possible to obtain reliable joint estimates of migration rates and divergence times from single-locus DNA sequence data. In addition, it is possible to distinguish models of low gene flow and short divergence times from models of high gene flow and long divergence times and to assess asymmetries in the rates of migration between the two populations.


*  MODEL
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

The demographic model we consider is a model of divergence between two populations that arose from a single ancestral population at generation t in the past. In the most general version of the model we allow all three populations, the ancestor and both descendants, to have different effective population sizes. The two populations may exchange migrants, and we allow the migration rate to differ between the two. A graphical representation of this demographic model is depicted in Fig 1. There are five parameters: t, N1, N2, NA, m1, and m2. N1 and N2 are the effective population sizes of the first and the second sampled populations, respectively. NA is the population size of the ancestral population, m1 is the proportion of population 1 that is replaced by migrants from population 2 in each generation, and m2 is the proportion of population 2 that is replaced by migrants from population 1 each generation. We also consider a restricted three-parameter model in which N1 = N2 = NA, and m1 = m2.



View larger version (11K):
In this window
In a new window
Download PPT slide
 
Figure 1. A graphical representation of the model considered in this article.

We assume selective neutrality and that there is no further population subdivision within each population. Further, we assume that the population sizes, N1, N2, and NA, do not change over time. To model the mutational process we use an infinite sites model without recombination (WATTERSON 1975 Down). However, the method could easily be extended to more general mutation models. We let µ be the rate of neutral infinite sites mutation per sequence per generation.

The coalescent process under this model follows from the now-classical theory of KINGMAN 1982A Down, KINGMAN 1982B Down; see also TAVARE 1984 Down and HUDSON 1991 Down. Here, we scale all parameters by N1 and consider the behavior of the model as N1 -> {infty} and 4N1µ -> {theta}, N2/N1 -> r, NA/N1 -> a, 2N1m1 -> M1, and 2N2m2 -> M2. A coalescence process then arises from a large class of exchangeable population genetic models including Moran models and Wright-Fisher models (WILKINSON-HERBOTS 1998 Down). In this coalescence process, time is measured in units of 2N1 generations. Assume that at a given point in time there are n1 ancestral gene copies in population 1 and n2 ancestral gene copies in population 2. Looking back in time, coalescence events occur in population 1 and population 2 at rate n1(n1 - 1)/2 and n2(n2 - 1)/(2r), respectively, up until the time of population divergence T = . At the same time migration events are occurring at rate n1M1 and n2M2 and mutations arise independently on each lineage according to a Poisson process with rate {theta}/2. At time T, the two populations merge (looking backward in time) and a new ancestral sample is created by setting nA = n1 + n2. Thereafter, coalescence events occur at rate nA(nA - 1)/(2a) until only one gene copy is left in the sample. This is the stochastic process that we use to assign probabilities to different genealogies.


*  STATISTICS
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

We are interested in making inferences about the parameters {Theta} = {{theta}, M1, M2, r, a, T} using the observed sequence data X. The first approach we use is to calculate (or approximate) the likelihood function for the parameters L({Theta} | X) {propto} Pr(X | {Theta}) to obtain point estimates and confidence intervals for the parameters and to perform statistical tests.

Bayesian inference:
In addition to a likelihood framework, we also explore the utility of a Bayesian approach to estimating parameters and making probabilistic statements regarding {Theta}. In this, we assume the existence of some prior distribution for {Theta}, f({Theta}), and make inferences regarding {Theta} on the basis of the posterior distribution f({Theta} | X). The posterior distribution is obtained from the prior distribution using

(1)

We assume a uniform prior for M1, M2, T, and {theta} and assume that log(r) and log(a) are uniformly distributed on an interval symmetric around 0. Assuming uniform priors implies that we assign equal probability mass to all possible values of the parameter; i.e., we assume that all possible values of the parameter are equally likely before observing the data. To ensure that the posterior distributions are proper, i.e., that they are real probability distributions, we must constrain the parameter space. Unless otherwise stated, the parameters were constrained as follows: Mi {isin} [0, 10], i = 1, 2; T {isin} [0, 10]; log(r) {isin} [-10, 10]; log(a) {isin} [-10, 10]. These ranges should encompass most biologically important values. In no cases discussed in this article did the maximum-likelihood estimate of a parameter equal the upper boundary of the assumed parameter range. However, if very large parameter estimates are found, the support of the priors can be expanded appropriately.

Integrated likelihood:
Note that when a uniform prior distribution is used, the likelihood function L({Theta} | X) is given by the posterior distribution. In other cases, the likelihood function can be deduced by comparing the posterior distribution with the prior distribution. In most cases we concentrate on the interpretation of the posterior distribution in a likelihood framework. We summarize the information for a single parameter in terms of the integrated likelihood for the parameter. The integrated likelihood function for a parameter is obtained by integrating over the prior distribution of all other parameters. Although integrated likelihood is a basically Bayesian idea, most of the properties of a regular likelihood function also apply to the integrated likelihood function. The review by BERGER et al. 1999 Down provides a discussion of the merits and utility of integrated likelihood.

Hypotheses testing:
To perform hypothesis tests and model selection we use log-likelihood-ratio tests. The standard theory stipulates that approximately minus two times the log-likelihood ratio of two nested models is asymptotically {chi}2j distributed, where j is the difference in the number of parameters between the two models. However, this result is in general not applicable to the current problems. The first reason for this is that the parameters of interest, in several cases, are at the boundary of the parameter space, for instance, if we wish to test the hypothesis M1 = 0. When the parameter is at the boundary of the parameter space, this violates the regularity conditions under which the {chi}2 approximation has been derived. Under the assumption that there is one parameter of interest, that this parameter is on the boundary of the parameter space, and that no other parameters are on the boundary, the log-likelihood-ratio test statistic should instead be asymptotically distributed as a random variable that takes the value 0 with probability 0.5 and takes on a value from a {chi}21 distribution with probability 0.5 (CHERNOFF 1954 Down).

However, there is another important reason not to use the large sample approximations in this case: the data from individual nucleotide sites are all correlated because of shared common ancestry. The likelihood function cannot be written as the product of the likelihood in multiple independent data points. In this sense, there is only one independent data point and it is not appropriate to appeal to the standard asymptotic results.

To perform valid likelihood-ratio tests we therefore explore the use of parametric bootstrapping. Parametric bootstrapping proceeds by approximating the null distribution of the likelihood-ratio test statistic by the distribution of test statistics obtained from samples simulated under the null hypothesis. The simulations are performed by assuming that the true value of the nuisance parameters equals the value estimated under the null hypothesis. In the present case, such simulations can be performed using standard coalescence simulation methods. Unfortunately, in many applications it may be too time consuming to perform parametric bootstrapping. Therefore, we also explore another approach for model selection, which is based on the Akaike information criterion (AIC; e.g., AKAIKE 1985 Down). According to the AIC, the best model is the model that minimizes -2 x [log(L) - di], where di is the number of parameters of model i.


*  MARKOV CHAIN MONTE CARLO
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

The simulation method we use to approximate the likelihood function/posterior distribution of the parameters is similar to the methods applied by WILSON and BALDING 1998 Down and NIELSEN 2000A Down, which are related to the methods by KUHNER et al. 1995 Down and BEERLI and FELSENSTEIN 1999 Down. These use the Metropolis-Hastings method (METROPOLIS et al. 1953 Down; HASTINGS 1970 Down) to integrate over genealogies stochastically. We wish to approximate the posterior distribution of the parameters f({Theta} | X) = cf(X | {Theta})f({Theta}), where c is a constant proportionality factor. It is not possible in general to calculate f(X | {Theta}), except by conditioning on the underlying gene genealogy, represented by G. The posterior distribution can then be written as

(2)

and the problem of estimating the posterior distribution/likelihood function is reduced to the problem of solving the integral in (2), where {Gamma} refers to the set of all possible gene genealogies.

Under the infinite sites model f(X | {Theta}, G) can easily be calculated by mapping mutations on the gene genealogy (G). The number of mutations on an edge is Poisson distributed with rate {theta}t/2, where t is the length of the edge. The total data likelihood conditional on the gene genealogy can, therefore, be evaluated as the product of multiple independent Poisson random variables.

Metropolis-Hastings MCMC:
To evaluate the integral in (2) it is necessary to use Monte Carlo methods. As in WILSON and BALDING 1998 Down and NIELSEN 2000A Down, it is possible to define a Markov chain with state space given by {Gamma} and with stationary distribution of genealogies proportional to f(X | {Theta}, G)f(G). By sampling from this chain at stationarity it is possible to approximate the posterior distribution of {Theta}. To assure that the chain has the desired stationary distribution, we use the Metropolis-Hastings method (METROPOLIS et al. 1953 Down; HASTINGS 1970 Down). At each step in the chain, updates from the current state ({Theta}i, Gi) to a new state ({Theta}*, G*) are proposed with probability q[({Theta}i, Gi) -> ({Theta}*, G*)]. With probability

(3)

the new proposed state is accepted; i.e., ({Theta}i + 1, Gi + 1) = ({Theta}*, G*), otherwise ({Theta}i + 1, Gi + 1) = ({Theta}i, Gi). If the chain is constructed such that it is aperiodic and irreducible, it will under quite general conditions have f({Theta} | X) as its stationary distribution (e.g., RIPLEY 1987 Down). The Markov chain is constructed by suggesting updates to the gene genealogy and to the parameters one by one. The chain therefore consists of a mixture of the six type of updates for {Theta} and updates of the genealogy, including the times of migration. The algorithmic details of these six update types are described in the Appendix

Smoothing function:
To obtain smooth likelihood surfaces from a single run of the Markov chain, the method of NIELSEN 2000B Down was used. In brief, the likelihood function is estimated by simulating a Markov chain with stationary distribution of G and {Theta}0 proportional to Pr(X | G)P{Theta}0(G)P({Theta}0), and estimates of the likelihood function for a particular value of {Theta} are then obtained as

(4)

where c is a constant of proportionality and r({Theta}, {Theta}0) is a weighting function. r({Theta}, {Theta}0) is used to down-weight values of w({Theta}, {Theta}0, G) for which |{Theta} - {Theta}0| is large. The weighting function r({Theta}, {Theta}0) may take any functional form and is here defined as r({Theta}, {Theta}0) = e, with c = 0.01. This function was chosen because it was found in preliminary investigations to lead to fast convergence. This method can be thought of as a weighted importance sampling scheme where the importance sampling function is partially determined by the data.


*  CONVERGENCE
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

In Markov chain Monte Carlo methods it is typically easy to show that the Markov chain will converge and that a consistent estimate of the likelihood function can be obtained. However, it is usually much more difficult to show how many simulated steps of the chain are needed to ensure convergence of the ergodic averages. If too few steps are simulated, the estimate of the likelihood function will be biased and will depend on the initial values of the parameters. The first issue we explore is, therefore, the degree to which assumption of stationarity and convergence to the ergodic averages are satisfied in the present case for realistic simulation times. We consider a simple version of the model with 3 parameters: {theta}, M, and T; that is, with a = 1, r = 1, and M1 = M2. To evaluate the method we use the data set previously published by ORTI et al. 1994 Down containing 23 sequences and 35 segregating sites. The integrated likelihood function of M obtained using 10 replicate runs of the Markov chain is plotted in Fig 2. The surfaces were obtained using 1,000,000 steps in the chain and a "burn-in" time of 100,000 steps. Only samples of the chain taken after the burn-in time are used for estimating the posterior distribution/likelihood surface. The starting values of the parameters and the genealogy differed between runs. The high degree of similarity of these distributions suggests that stationarity and convergence to the ergodic averages has been achieved. Thus, 1,000,000 steps in the chain appear sufficient for data sets of this size to provide reliable likelihood surfaces/posterior distributions.



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 2. Ten independent replicates of the likelihood surfaces for M obtained for the data set from ORTI et al. 1994 Down containing 23 sequences and 35 segregating sites.


*  PROPERTIES OF THE ESTIMATORS
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

We also explored the statistical properties of the point estimators of M and T under the three-parameter model. One hundred data sets were simulated under this model for each of two parameter settings: (M = 0, T = 2, {theta} = 10) and (M = 1, T = {infty}, {theta} = 10). In these simulations it was assumed that 30 individuals were sampled from each population. A burn-in time of 500,000 steps was chosen and the chains were run for an additional 5,000,000 steps to evaluate the likelihood surfaces/posterior distributions. Some sample posterior distributions of T and M are shown in Fig 3. The mode of the integrated likelihood function of each parameter is used as an estimator of the parameter. In the 100 simulations the mean estimate of M was 0.0055 for the case (M = 0, T = 2, {theta} = 10) and 1.02 for the case (M = 1, T = {infty}, {theta} = 10). The estimate of M will of course always be slightly upwardly biased when the true parameter is at the boundary of the parameter space (M = 0). The distribution of the estimates of M is shown in Fig 4. Note that the variance of the estimates obtained in the model (M = 1, T = {infty}, {theta} = 10) is quite high. Nonetheless, it is encouraging that reasonable estimates can be obtained in such cases by using only a single locus without recombination, even when taking the possibility of recent common ancestry of the populations into account.




View larger version (26K):
In this window
In a new window
Download PPT slide
 
Figure 3. Sample posterior distributions of M and T estimated from data simulated assuming M = 0, T = 2, and {theta} = 10 (a and b) and M = 1, T = {infty}, and {theta} = 10 (c and d).



View larger version (11K):
In this window
In a new window
Download PPT slide
 
Figure 4. The distribution of estimates of M assuming M = 0, T = 2, and {theta} = 10 (open bar) and M = 1, T = {infty}, and {theta} = 10 (solid bars) in 100 replicate simulations.

The estimator of T does not appear to have similarly desirable properties, at least not in the case of T = {infty}. There are two reasons for this. First, the Monte Carlo variance for the parameter T seems to be quite large in many cases (compare Fig 3C to Fig 3D). Since the likelihood surface often is very flat for this parameter, the estimates may not be reliable. The second reason is that the integrated likelihood surface often has multiple peaks, e.g., Fig 3C. This is in fact a real property of the likelihood function, rather than an artifact of the Monte Carlo variance. The multimodality can easily be explained by considering the structure of the underlying gene genealogies and is a consequence of having only a limited number of migration events occurring in the ancestry of the sample. Consider the hypothetical case where the times of migration events in the genealogy are known and fixed. Assuming low migration, the likelihood will then always be higher when T is slightly smaller than the age of the migration event than if T is slightly larger. The likelihood surface for T may therefore increase as T approaches the time of a migration event and decrease at the time right after a migration event. Obviously, the times of migration events are not known and fixed in real data, but may be quite well determined if there are sufficient nucleotide data, causing the integrated likelihood surface for T to have multiple modes.

In contrast, for the case of (M = 0, T = 2, {theta} = 10), the estimator seems to perform quite well with an average estimate of = 2.16 and multimodal likelihood surfaces appear to be rare.

A traditional Bayesian estimator based on the posterior expectation of the parameter was not used because it in many cases was very sensitive to the choice of upper bound for the parameter. This was expected for T, since the integrated likelihood function in some cases was an increasing function of T. However, even for many of the other parameters, the posterior expectation was quite sensitive to the choice of prior. Considering data from multiple loci could reduce this problem.


*  DETECTING MIGRATION
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

The next issue we focus on is the possibility of detecting migration when it occurs. Can we reject a model of isolation without gene flow in favor of a model that includes migration? To address this issue we examine the log-likelihood ratio log[]. A plot of the likelihood ratio in the 100 simulations for (M = 0, T = 2, {theta} = 10) (open bars) and (M = 1, T = {infty}, {theta} = 10) (solid bars) is shown in Fig 5. Note first that, as expected, minus two times the log-likelihood ratio is far from being {chi}21 distributed when the null hypothesis is true (open bars). The deviation is larger than expected for a test in which the parameter is on the boundary of the parameter space (CHERNOFF 1954 Down), emphasizing that the usual asymptotics should not be applied for a population sample of a single locus. In the present case, as was also found in NIELSEN and PALSBOLL 1999 Down, application of the standard {chi}2 approximation results in a conservative test.



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 5. The distribution of the log-likelihood ratio of the hypothesis M = 0 vs. M >= 0.0. One hundred replicate simulations were performed for each of the parameters settings M = 0 and T = 2 (open bars), M = 1 and T = {infty} (solid bars), and M = 0.25 and T = 1 (hatched bars). In all cases {theta} = 10.

However, note the difference in the distribution of the likelihood ratio in the case of (M = 0, T = 2, {theta} = 10) and (M = 1, T = {infty}, {theta} = 10). This suggests that the method indeed has a lot of power to distinguish between the two models. In fact, using parametric bootstrapping, the hypothesis of M = 0 would be rejected in 100 of 100 cases when M = 1, T = {infty}, and {theta} = 10. Using the AIC, a model with migration would be accepted in 98 out of 100 cases when M = 1 and in 0 of 100 cases when M = 0. In this case, the AIC provides a very powerful method of distinguishing between models with and without migration.

To investigate the power to distinguish models when there is just a small amount of migration, simulations were performed assuming M = 0.25, T = 1, and {theta} = 10. Note (Fig 5, hatched bars) that the likelihood ratios are somewhat intermediate in this case. Using the AIC, the model without migration would be rejected in 39 out of 100 cases and accepted in 61 of 100 cases. Using parametric bootstrapping the model of no migration would have been rejected in 65 out of 100 cases. This suggests that likelihood-ratio tests with reasonable power can in fact be established. When computationally possible, it is recommended to obtain critical values for hypothesis testing using the parametric bootstrap. For the present parameter settings, 100 simulations for a single data set of this size would take ~1 week on a medium fast workstation.


*  DETECTING UNEQUAL MIGRATION
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Another possible application of the method is to detect unequal, or asymmetric, migration, i.e., a higher rate of migration from one population to the other than in the opposite direction. To investigate the properties of the method in this regard, an equilibrium migration model with parameters {theta} = 10.0, M1 = 1.0, M2 = 0.0, r = 1.0, a = 1.0, and T = {infty} was simulated. One hundred simulations were performed, each using 10,000,000 steps in the Markov chain to estimate the posterior distribution/likelihood surface. Estimates of M1 and M2 were obtained using the mode of the posterior distribution of these parameters and are shown in Fig 6. Estimates of M2 were all in the range 0.0–0.2 but only five estimates of M1 fell in this range.



View larger version (11K):
In this window
In a new window
Download PPT slide
 
Figure 6. The distribution of estimates of M1 (open bars) and M2 (solid bar) in 100 data sets simulated assuming {theta} = 10.0, M1 = 1.0, M2 = 0.0, r = 1.0, a = 1.0, and T = {infty}.

Using the AIC, a model with M1 = 0 would be rejected in 67 of 100 cases and M2 = 0 would be rejected in 0 out of 100 cases. Parametric bootstrapping to test models was not performed in this case. Performing such a test would take a few weeks on a medium fast workstation. In any case, it is obvious from these simulations that it is in fact possible to detect unequal migration and that a single nonrecombining locus might be sufficient in most cases for this purpose as long as {theta} is sufficiently large. If necessary, multiple independent loci could be used to further increase the power.


*  APPLICATIONS TO THE DATA BY Orti et al. 1994 Down
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

A previous method for distinguishing migration from isolation on the basis of variance of pairwise differences was established by WAKELEY 1996B Down. That method was applied to a data set by ORTI et al. 1994 Down of mtDNA sequences from the western and eastern Pacific stickleback populations, and the hypothesis of pure isolation (M = 0) was rejected with a P value of 0.013. To compare the methods and to provide an illustration of the new method on the basis of real data, we reanalyzed the data set by ORTI et al. 1994 Down using our new method. The integrated likelihood function for T and M is shown in Fig 7. Thus, our analysis agrees well with that of WAKELEY 1996B Down. The data seem to be most compatible with a model of large divergence times with rates of migration of ~M = 0.3. A model of moderate migration and very long divergence times is more compatible with the data than a model of short divergence times and low migration rates.



View larger version (117K):
In this window
In a new window
Download PPT slide
 
Figure 7. The joint integrated likelihood surface for T and M estimated from the data by ORTI et al. 1994 Down. Darker values indicate higher likelihood.

Further, using our method we can also extract information regarding the direction of gene flow between the populations. The integrated likelihood surfaces for M1 and M2 are shown in Fig 8. The data appear to be compatible with a model of unequal gene flow between the populations. A strictly decreasing likelihood surface for M2 indicates low levels of gene flow from the eastern Pacific to the western Pacific populations. In contrast, the estimate of M1 based on the integrated likelihood function is ~0.5 and the likelihood ratio of the hypothesis M1 = 0 is >2.0. There seems to be ongoing gene flow from the western to the eastern Pacific populations but little gene flow in the opposite direction.



View larger version (10K):
In this window
In a new window
Download PPT slide
 
Figure 8. The integrated likelihood surfaces for M1 (dots) and M2 (solid lines) estimated from the data by ORTI et al. 1994.

Another question we are able to address is the difference in effective population size between the two populations. If a model without gene flow were applied to these stickleback populations, the migration from western to eastern Pacific could inflate the estimates of effective population size of the recipient population. For example, based on the average number of pairwise differences within populations, the estimate of the effective population size of the eastern Pacific population is 4.3 times the size of the western Pacific population. When migration is taken into account, the estimate based on the mode of the integrated likelihood function is that the eastern Pacific population is 3.0 times larger than the size of the western Pacific population (Fig 9).



View larger version (8K):
In this window
In a new window
Download PPT slide
 
Figure 9. The integrated likelihood surface for r estimated from the data by ORTI et al. 1994 Down.

We can also consider another parameter of interest: the size of the ancestral population relative to the current size of one of the populations. The integrated likelihood for this parameter (a) is shown in Fig 10. Note that the likelihood surface is very flat. The reason is that the data are compatible with a model of long divergence times, in which case there is only little information regarding a preserved. The MLE of a is ~1.7.



View larger version (9K):
In this window
In a new window
Download PPT slide
 
Figure 10. The integrated likelihood surface for a estimated from the data by ORTI et al. 1994 Down.


*  DISCUSSION
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

We demonstrated that estimates of the parameters {theta}, M1, M2, r, a, and T can be obtained within either a Bayesian or likelihood framework using an MCMC method. In most cases, reliable parameter estimates can be obtained in a few hours on a desktop computer. We also found substantial power to test competing demographic hypotheses using a likelihood-ratio test. For example, using parametric bootstrapping, a model of no migration is rejected at the 5% level in 100/100 cases when the true model is equilibrium migration with M = 1.0 and {theta} = 10.0. This is a dramatic improvement over the methods of WAKELEY 1996B Down and NIELSEN and SLATKIN 2000 Down. For example, WAKELEY's (1996b) method would have ~30% power under similar circumstances.

The simulations presented here support the use of the AIC (e.g., AKAIKE 1985 Down) as a means of identifying plausible demographic models in ecological genetic studies. It is possible to address the classic problem of distinguishing between short divergence times with low gene flow and long divergence times with moderate gene flow as explanations of population divergence. In addition, this method can be used to assess the importance of asymmetries in migration rates between populations. A well-known case where these tools could be of use is in studies of the emergence of modern humans. On the one hand, the out-of-Africa hypothesis (e.g., STRINGER and ANDREWS 1988 Down; VIGILANT et al. 1991 Down) assumes low migration and recent divergence among human populations, and on the other, the "multiregional" model (e.g., BROOKS and WOOD 1990 Down; WOLPOFF 1996 Down) dictates long divergence times between human populations. With appropriate modifications, the method presented here could be used to resolve this and other disputes.

Our method has several important limitations. Most importantly, it assumes an infinite sites model of mutation. This restriction may be prohibitive in many applications. However, the method can be extended to the case of the finite sites model, although this may significantly increase the required computational time. In general, computational time is a limitation in these methods. Most reasonably sized data sets can be analyzed using the present methods, but there may be data sets where the computational time is prohibitive, especially when applied to finite sites models. Also, since the method is based on MCMC, it is very important to assess the convergence in each particular study. We recommend running multiple chains using different starting values for the parameters. If all chains give similar results, they provide good evidence that convergence of the ergodic averages was achieved.

For the purpose of hypothesis testing, we recommend using parametric bootstrapping. Use of the classical {chi}2 approximation, possibly with the correction by CHERNOFF 1954 Down, would lead to conservative tests with reduced power in the examples we investigated here. Tests based on parametric bootstrapping are computationally intensive and a heuristic application of the apparently conservative {chi}2 criterion or of the AIC criterion may therefore in some cases be preferable.

We did not report the results based on Bayesian tests. From a practical standpoint, such tests were not desirable because the parameter estimates were quite sensitive to the choice of prior. Also, using improper uniform prior distributions on the interval [0, {infty}) results in improper posterior distributions for most parameters.

Computer programs implementing the methods are available on request from Rasmus Nielsen (rn28{at}cornell.edu).


*  ACKNOWLEDGMENTS

We thank the editor and two anonymous referees for comments and suggestions. We also thank John Huelsenbeck and the evolutionary genetics discussion group at Cornell University for much discussion and useful suggestions. This work was supported by National Science Foundation grant DEB-9815367 to J.W.

Manuscript received September 5, 2000; Accepted for publication March 2, 2001.


*  APPENDIX
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

In the following we describe the details of the different types of updates of the Markov chain. In this description, we consider the times of migration as a part of the description of a genealogy (G). G therefore consists of a topology, a vector of coalescence times, and a vector of migration times for each edge in G. The algorithm is similar to the proposal algorithm from BEERLI and FELSENSTEIN 1999 Down.

Updates of G:
Updates of the gene genealogy are performed by simulating the ancestry of a lineage from the coalescence prior conditional on the remaining part of the genealogy. The simulations can be described by the following algorithm:

  1. Choose a random edge in the tree uniformly among all edges.

  2. Detach the edge from the tree at the point in which the edge coalesces with some other "sister-edge" in the genealogy.

  3. Start simulations of the new path of the edge at the time of the origin of the edge, i.e., at the time where two sister-edges coalesced into the focal edge or at time 0 if the edge existed at this point in time.

  4. Stop the simulations when the edge coalesces with a new sister-edge.

The simulations of the path of the edge are performed relatively easily using the usual coalescence simulation methods. For example, if the edge at a given time exists in population 1 and there are n1 ancestral gene copies existing in population 1, the edge migrates into population 2 at rate M1 and coalesces with each lineage in population 1 at rate 1, when time is measured in units of 2N1. It is therefore possible to simulate the ancestry of the lineage conditional on the remaining gene genealogy simply by simulating a series of exponential random variables. However, such simulations are somewhat more complicated than usual coalescence simulations because at each step it is necessary to keep track of the identity of all edges existing in the population at a particular time. This means that the gene genealogy is not only represented in terms of the genealogical structure itself but also in terms of the times of migrations of edges between the populations. Also, some minor technical difficulties must be overcome when the new history of the edge changes the time of the root (the time to the most recent common ancestor of the sample).

Since the path of the gene genealogy is simulated according to the prior distribution, the acceptance probability for this type of update is simply min{1,f(X | {Theta}i, G*)/f(X | {Theta}i, Gi)}. The conditional likelihood f(X | {Theta}, G) can be calculated for any {Theta} and G under the infinite sites model by mapping mutations on the genealogy (there is only one unique way of doing this under the infinite sites model).

Although it may be possible to establish more efficient types of update under the infinite sites model, this type of update was chosen because it can also be applied when the mutational model is more complex.

Updates of M:
M1 and M2 are defined either on the interval [0, {infty}) or on the interval [0, Mmax]. The following description assumes that Mi {isin} [0, Mmax]. The update algorithm for the case of Mi {isin} [0, {infty}) follows easily from this description. Let the current value of Mj be Mji; a proposal value of Mj (M*j) is then drawn uniformly from the interval [Mji - {Delta}M, Mji + {Delta}M], where {Delta}M is chosen such that {Delta}M <= Mmax. If M*j < 0, set M*j = -M*j, and if M*j > Mmax, set M*j = 2Mji - M*j. Using this reflection of M*j around 0 and Mmax it is guaranteed that q[({Theta}*, Gi) -> ({Theta}i, Gi)] = q[({Theta}i, Gi) -> ({Theta}*, Gi)], where {Theta}* = {M*j} {cup} {Theta}i\\{Mji}. Also, since the structure of the genealogy does not change, the acceptance probability does not depend on the data and is just given by min{1, f(Gi | {Theta}*)/f(Gi | {Theta}i)}, which can be calculated quite fast. Let {tau}k be the time between the k - 1th and the kth coalescence or migration event and define {psi}k = {Sigma}ki=1 {tau}k. If {psi}k > T and {psi}k - 1 < T, define {tau}k = T - {psi}k - 1. Also denote the number of active ancestral lineages in population j in the time interval between the k - 1th and the kth coalescence or migration event by njk. Then the ratio f(Gi | {Theta}*)/f(Gi | {Theta}i) is given by

(A1)

Updates of r:
Updates of r are chosen similarly to updates of Mj. It is assumed that log(r) is distributed uniformly on the interval [log(1/rmax), log(rmax)]. A new value of r (r*) is proposed by choosing log(r*) uniformly in the interval [ri - {Delta}r, ri + {Delta}r], where {Delta}r is chosen such that {Delta}r <= rmax. If r* < 1/rmax, set log(r*) = 2 log(1/rmax) - log(r*) and if r* < rmax, set log(r*) = 2 log(rmax) - log(r*). The acceptance ratio of this type of update is again just given by the Metropolis factor

(A2)

where now {Theta}* = {r*} {cup} {Theta}i\\{ri}.

Updates of a:
Updates of a are generated in a manner identical to the method used for generating updates to r; i.e., a new value of a (a*) is chosen in an interval of size 2{Delta}a around the current value of a (ai). The acceptance ratio is again just given by the Metropolis factor,

(A3)

where nk is the number of edges in the genealogy in the kth interval, {Theta}* = {a*} {cup} {Theta}i\\{ai}, and {tau}k now is defined so that {tau}k = {psi}k - T if {psi}k > T and {psi}k - 1 < T.

Updates of {theta}:
This type of update is similar to the updates previously described. A new value of {theta} ({theta}*) is drawn from the interval [{theta}i - {Delta}{theta}, Ti + {Delta}{theta}], where {Delta}{theta} is chosen such that {Delta}{theta} <= {theta}max. The acceptance probability is

(A4)

where {Theta}* = {{theta}*} {cup} {Theta}i\\{{theta}i}, S is the total number of mutations in the tree, and ßi is the total tree length of genealogy Gi.

Updates of T:
In the following discussion, time is increasing backward in time and the time of sampling is t = 0. Updates to T are chosen using the method described for the other parameters, i.e., uniformly in an interval [Ti - {Delta}T, Ti + {Delta}T]. However, some care must be taken to preserve the reversibility conditions of the chain. If we denote the time at which edge j arises by a coalescence event by tstartj and the time it ends in a coalescence event with another edge by tendj, then it will occur for some j that tstartj < T* and tendj < Ti if T* > Ti and Ti is larger than the time to the most recent common ancestor (TMRCA); i.e., edge j exists in the time interval [Ti, T*].

A solution to this problem is simply to simulate the path of migrations of the edges in the time interval [Ti, T*]. An update to T is always a joint update of T and G. A method for performing such simulations is given by the following algorithm:

  1. Set j = 1.

  2. If edge j exists in time interval [Ti, T*], simulate migrations on the edge according to a two-state continuous time Markov chain with infinitesimal generator

    starting at the current state of the edge at time min{Ti, tstartj} and stopping at time max{T*, tendj}.

  3. If edge j exists in time interval [Ti, T*], tendj < T*, j has an index larger than its sister edge (edge j is a right sister edge) and edge j does not end up in the same population as its sister edge: Repeat (2).

  4. If j is less than the total number of edges in the genealogy, set j = j + 1 and repeat.

This algorithm does not simulate migration paths according to the prior distribution, but does simulate them according to an approximation of this distribution. A Hastings term ensures that the Markov chain has the desired stationary distribution. Also, note that this algorithm will be very slow when M1 and/or M2 and {kappa}j = tendj - min{Ti, tstartj} is small and an odd number of migration events must occur on a right edge for it to end (coalesce) in the same population as its sister edge. The algorithm can be speeded up in such cases by letting the first migration event be simulated conditional on at least one migration event occurring. For example, if edge j exists in population 1 at time min{Ti, tstartj} but must be in population 2 at time tendj, then the probability density of the time to the first migration event (tM), conditional on at least one migration event occurring, is given by

(A5)

Using the inverse transformation method, the time to the first migration event can be simulated by choosing

(A6)

where U is a uniform [0, 1] random number. It is thereby possible to simulate migration paths relatively fast even when M1 and {kappa} are small and an odd number of migration events must occur on a right edge for it to end (coalesce) in the same population as its sister edge.

If T* < Ti the migration events occurring in the interval [T*, Ti] are simply erased. The acceptance ratio for this type of update is

(A7)

where

and {psi}k is defined as in the case of updates of M1 and M2, B is the set of all edges existing in the interval [T*, Ti], and tjk is the time of the kth migration on the jth edge, tj0 =. min{Ti, tstartj}. If T* > Ti, the acceptance ratio can be calculated similarly as the inverse term. This completes the description of the MCMC algorithm.


*  LITERATURE CITED
*TOP
*ABSTRACT
*MODEL
*STATISTICS
*MARKOV CHAIN MONTE CARLO
*CONVERGENCE
*PROPERTIES OF THE ESTIMATORS
*DETECTING MIGRATION
*DETECTING UNEQUAL MIGRATION
*APPLICATIONS TO THE DATA...
*DISCUSSION
*APPENDIX
*LITERATURE CITED

AKAIKE, H., 1985 Prediction and entropy, pp. 1–24 in A Celebration of Statistics, edited by A. C. ATKINSON and S. E. FIENBERG. Springer-Verlag, Berlin.

BAHLO, M. and R. C. GRIFFITHS, 2000  Inference from gene trees in a subdivided population. Theor. Popul. Biol. 57:79-95[Medline].

BEERLI, P. and J. FELSENSTEIN, 1999  Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152:763-773[Abstract/Free Full Text].

BERGER, J. O., B. LISEO, and R. L. WOLPERT, 1999  Integrated likelihood methods for eliminating nuisance parameters. Stat. Sci. 14:1-28.

BROOKS, A. S. and B. WOOD, 1990  Paleoanthropology: the Chinese side of the story. Nature 344:288-289[Medline].

CHERNOFF, H., 1954  On the distribution of the likelihood ratio. Ann. Math. Stat. 25:573-578.

HASTINGS, W. K., 1970  Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97-109[Abstract/Free Full Text].

HUDSON, R. R., 1991 Gene genealogies and the coalescent process, pp. 1–44 in Oxford Surveys of Evolutionary Biology, edited by D. FUTUYMA and J. ANTONOVICS. Oxford University Press, Oxford.

KINGMAN, J. F. C., 1982a  The coalescent. Stoch. Proc. Appl. 13:235-248.

KINGMAN, J. F. C., 1982b  On the genealogy of large populations. J. Appl. Prob. 19:27-43.

KUHNER, M. K., J. YAMATO, and F. FELSENSTEIN, 1995  Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics 140:1421-1430[Abstract].

METROPOLIS, N., A. W. ROSENBLUTH, M. N. ROSENBLUTH, A. H. TELLER, and E. TELLER, 1953  Equations of state calculations by fast computing machines. J. Chem. Phys. 21:1087-1091.

NATH, H. B. and R. GRIFFITHS, 1996  Estimation in an island model using simulation. Theor. Popul. Biol. 50:227-253[Medline].

NIELSEN, R., 1998  Maximum likelihood estimation of population divergence times and population phylogenies under the infinite sites model. Theor. Popul. Biol. 53:143-151[Medline].

NIELSEN, R., 2000a  Estimation of population parameters and recombination rates using single nucleotide polymorphisms. Genetics 154:931-942[Abstract/Free Full Text].

NIELSEN, R., 2000b Improved MCMC methods for some population genetic models. Technical report no. M-1556. Department of Biometrics, Cornell University, Ithaca, NY.

NIELSEN, R. and P. J. PALSBOLL, 1999  Tests of microsatellite evolution: multi-step mutations and constraints on allele size. Mol. Phylogenet. Evol. 11:477-484[Medline].

NIELSEN, R. and M. SLATKIN, 2000  Analysis of population subdivision using di-allelic models. Evolution 54:44-50[Medline].

NIELSEN, R., J. L. MOUNTAIN, J. P. HUELSENBECK, and M. SLATKIN, 1998  Maximum-likelihood estimation of population divergence times and population phylogeny in models without mutation. Evolution 52:669-677.

ORTI, G., M. A. BELL, T. E. REIMCHEN, and A. MEYER, 1994  Global survey of mitochondrial DNA sequences in the threespine stickleback: evidence for recent migrations. Evolution 48:608-622.

RIPLEY, B., 1987 Stochastic Simulation. Wiley, New York.

STRINGER, C. B. and P. ANDREWS, 1988  Genetic and fossil evidence for the origin of modern humans. Science 239:1263-1268[Abstract/Free Full Text].

TAVARÉ, S., 1984  Lines of descent and genealogical processes, and their application in population genetics models. Theor. Popul. Biol. 26:119-164[Medline].

VIGILANT, L., M. STONEKING, H. HARPENDING, K. HAWKES, and A. C. WILSON, 1991  African populations and the evolution of human mitochondrial DNA. Science 253:1503-1507[Abstract/Free Full Text].

WAKELEY, J., 1996a  The variance of pairwise nucleotide differences in two populations with migration. Theor. Popul. Biol. 49:39-57[Medline].

WAKELEY, J., 1996b  Distinguishing migration from isolation using the variance of pairwise differences. Theor. Popul. Biol. 49:369-386[Medline].

WAKELEY, J. and J. HEY, 1997  Estimating ancestral population parameters. Genetics 145:847-855[Abstract].

WATTERSON, G. A., 1975  On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276[Medline].

WILKINSON-HERBOTS, H. M., 1998  Genealogy and subpopulation differentiation under various models of population structure. J. Math. Biol. 37:535-585.

WILSON, I. J. and D. J. BALDING, 1998  Genealogical inference from microsatellite data. Genetics 150:499-510[Abstract/Free Full Text].

WOLPOFF, M. H., 1996  Interpretations of multiregional evolution. Science 274:704-707[Free Full Text].

WRIGHT, S., 1931  Evolution in Mendelian populations. Genetics 16:97-159[Free Full Text].




This article has been cited by other articles:


Home page
GeneticsHome page
Y. Wang and J. Hey
Estimating Divergence Parameters With Small Samples From a Large Number of Loci
Genetics, February 1, 2010; 184(2): 363 - 379.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
J. S. Lopes, D. Balding, and M. A. Beaumont
PopABC: a program to infer historical demographic parameters
Bioinformatics, October 15, 2009; 25(20): 2747 - 2749.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
H. Ikeda, N. Fujii, and H. Setoguchi
Application of the Isolation with Migration Model Demonstrates the Pleistocene Origin of Geographic Differentiation in Cardamine nipponica (Brassicaceae), an Endemic Japanese Alpine Plant
Mol. Biol. Evol., October 1, 2009; 26(10): 2207 - 2216.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
G. Achaz
Frequency Spectrum Neutrality Tests: One for All and All for One
Genetics, September 1, 2009; 183(1): 249 - 258.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
D. Wegmann, C. Leuenberger, and L. Excoffier
Efficient Approximate Bayesian Computation Coupled With Markov Chain Monte Carlo Without Likelihood
Genetics, August 1, 2009; 182(4): 1207 - 1218.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
H. Ikeda, N. Fujii, and H. Setoguchi
Molecular Evolution of Phytochromes in Cardamine nipponica (Brassicaceae) Suggests the Involvement of PHYE in Local Adaptation
Genetics, June 1, 2009; 182(2): 603 - 614.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
C. M. Bossu and T. J. Near
Gene Trees Reveal Repeated Instances of Mitochondrial DNA Introgression in Orangethroat Darters (Percidae: Etheostoma)
Syst Biol, May 22, 2009; (2009) syp014v1.
[Abstract] [Full Text] [PDF]


Home page
Am. J. Bot.Home page
L. E. Wallace, S. G. Weller, W. L. Wagner, A. K. Sakai, and M. Nepokroeff
Phylogeographic patterns and demographic history of Schiedea globosa (Caryophyllaceae) on the Hawaiian Islands
Am. J. Botany, May 1, 2009; 96(5): 958 - 967.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
Y. Yasukochi, S. Nishida, S.-H. Han, T. Kurosaki, M. Yoneda, and H. Koike
Genetic Structure of the Asiatic Black Bear in Japan Using Mitochondrial DNA Analysis
J. Hered., May 1, 2009; 100(3): 297 - 308.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
R. A. Haney, M. Dionne, J. Puritz, and D. M. Rand
The Comparative Phylogeography of East Coast Estuarine Fishes in Formerly Glaciated Sites: Persistence versus Recolonization in Cyprinodon variegatus ovinus and Fundulus heteroclitus macrolepidotus
J. Hered., May 1, 2009; 100(3): 284 - 296.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
T. Stadler, B. Haubold, C. Merino, W. Stephan, and P. Pfaffelhuber
The Impact of Sampling Schemes on the Site Frequency Spectrum in Nonequilibrium Subdivided Populations
Genetics, May 1, 2009; 182(1): 205 - 216.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
K. E. Lohmueller, C. D. Bustamante, and A. G. Clark
Methods for Human Demographic Inference Using Haplotype Patterns From Genomewide Single-Nucleotide Polymorphism Data
Genetics, May 1, 2009; 182(1): 217 - 231.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
K. Nadachowska and W. Babik
Divergence in the Face of Gene Flow: The Case of Two Newts (Amphibia: Salamandridae)
Mol. Biol. Evol., April 1, 2009; 26(4): 829 - 841.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
G. Wlasiuk, S. Khan, W. M. Switzer, and M. W. Nachman
A History of Recurrent Positive Selection at the Toll-Like Receptor 5 in Primates
Mol. Biol. Evol., April 1, 2009; 26(4): 937 - 949.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
M. Bonhomme, S. Cuartero, A. Blancher, and B. Crouau-roy
Assessing Natural Introgression in 2 Biomedical Model Species, the Rhesus Macaque (Macaca mulatta) and the Long-Tailed Macaque (Macaca fascicularis)
J. Hered., March 1, 2009; 100(2): 158 - 169.
[Abstract] [Full Text] [PDF]


Home page
Proc R Soc BHome page
M. Krosby and S. Rohwer
A 2000 km genetic wake yields evidence for northern glacial refugia and hybrid zone movement in a pair of songbirds
Proc R Soc B, February 22, 2009; 276(1657): 615 - 621.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
M. Carneiro, N. Ferrand, and M. W. Nachman
Recombination and Speciation: Loci Near Centromeres Are More Differentiated Than Loci Near Telomeres Between Subspecies of the European Rabbit (Oryctolagus cuniculus)
Genetics, February 1, 2009; 181(2): 593 - 606.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
J. E. Pool and R. Nielsen
Inference of Historical Changes in Migration Rate From the Lengths of Migrant Tracts
Genetics, February 1, 2009; 181(2): 711 - 719.
[Abstract] [Full Text] [PDF]


Home page
Proc R Soc BHome page
C. N Balakrishnan, K. M Sefc, and M. D Sorenson
Incomplete reproductive isolation following host shift in brood parasitic indigobirds
Proc R Soc B, January 22, 2009; 276(1655): 219 - 228.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
C. R. Linnen and B. D. Farrell
Comparison of Methods for Species-Tree Inference in the Sawfly Genus Neodiprion (Hymenoptera: Diprionidae)
Syst Biol, December 1, 2008; 57(6): 876 - 890.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
R. T. Brumfield, L. Liu, D. E. Lum, and S. V. Edwards
Comparison of Species Tree Methods for Reconstructing the Phylogeny of Bearded Manakins (Aves: Pipridae, Manacus) from Multilocus Sequence Data
Syst Biol, October 1, 2008; 57(5): 719 - 731.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
G. J. Colbeck, H. L. Gibbs, P. P. Marra, K. Hobson, and M. S. Webster
Phylogeography of a Widespread North American Migratory Songbird (Setophaga ruticilla)
J. Hered., September 1, 2008; 99(5): 453 - 463.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
A. R. Lemmon and E. M. Lemmon
A Likelihood Framework for Estimating Phylogeographic History on a Continuous Landscape
Syst Biol, August 1, 2008; 57(4): 544 - 561.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
T. Slotte, H. Huang, M. Lascoux, and A. Ceplitis
Polyploid Speciation Did Not Confer Instant Reproductive Isolation in Capsella (Brassicaceae)
Mol. Biol. Evol., July 1, 2008; 25(7): 1472 - 1481.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
R. L. Hunter and K. M. Halanych
Evaluating Connectivity in the Brooding Brittle Star Astrotoma agassizii across the Drake Passage in the Southern Ocean
J. Hered., March 1, 2008; 99(2): 137 - 148.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
B. D. Cook, C. M. Pringle, and J. M. Hughes
Phylogeography of an Island Endemic, the Puerto Rican Freshwater Crab (Epilobocera sinuatifrons)
J. Hered., March 1, 2008; 99(2): 157 - 164.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
J. A. Johnson
Recent Range Expansion and Divergence among North American Prairie Grouse
J. Hered., March 1, 2008; 99(2): 165 - 173.
[Abstract] [Full Text] [PDF]


Home page
Proc R Soc BHome page
G. R Bigg, C. W Cunningham, G. Ottersen, G. H Pogson, M. R Wadley, and P. Williamson
Ice-age survival of Atlantic cod: agreement between palaeoecology models and genetics
Proc R Soc B, January 22, 2008; 275(1631): 163 - 173.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
T. Stadler, U. Arunyawat, and W. Stephan
Population Genetics of Speciation in Two Closely Related Wild Tomatoes (Solanum Section Lycopersicon)
Genetics, January 1, 2008; 178(1): 339 - 350.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
N. M. Anthony, M. Johnson-Bawe, K. Jeffery, S. L. Clifford, K. A. Abernethy, C. E. Tutin, S. A. Lahm, L. J. T. White, J. F. Utley, E. J. Wickings, et al.
The role of Pleistocene refugia and rivers in shaping gorilla genetic diversity in central Africa
PNAS, December 18, 2007; 104(51): 20432 - 20436.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
H. B. Shaffer and R. C. Thomson
Delimiting Species in Recent Radiations
Syst Biol, December 1, 2007; 56(6): 896 - 906.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
D. Garrigan, S. B. Kingan, M. M. Pilkington, J. A. Wilder, M. P. Cox, H. Soodyall, B. Strassmann, G. Destro-Bisol, P. de Knijff, A. Novelletto, et al.
Inferring Human Population Sizes, Divergence Times and Rates of Gene Flow From Mitochondrial, X and Y Chromosome Resequencing Data
Genetics, December 1, 2007; 177(4): 2195 - 2207.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
R. Zhou, K. Zeng, W. Wu, X. Chen, Z. Yang, S. Shi, and C.-I Wu
Population Genetics of Speciation in Nonmodel Organisms: I. Ancestral Polymorphism in Mangroves
Mol. Biol. Evol., December 1, 2007; 24(12): 2746 - 2754.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
V. L. Friesen, A. L. Smith, E. Gomez-Diaz, M. Bolton, R. W. Furness, J. Gonzalez-Solis, and L. R. Monteiro
Sympatric speciation by allochrony in a seabird
PNAS, November 20, 2007; 104(47): 18589 - 18594.
[Abstract] [Full Text] [PDF]


Home page
Proc R Soc BHome page
B. Mila, J. E McCormack, G. Castaneda, R. K Wayne, and T. B Smith
Recent postglacial range expansion drives the rapid diversification of a songbird lineage in the genus Junco
Proc R Soc B, November 7, 2007; 274(1626): 2653 - 2660.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
C. Becquet and M. Przeworski
A new approach to estimate parameters of speciation models with application to apes
Genome Res., October 1, 2007; 17(10): 1505 - 1519.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
B. S. Ort and G. H. Pogson
Molecular Population Genetics of the Male and Female Mitochondrial DNA Molecules of the California Sea Mussel, Mytilus californianus
Genetics, October 1, 2007; 177(2): 1087 - 1099.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
S. A. Tishkoff, M. K. Gonder, B. M. Henn, H. Mortensen, A. Knight, C. Gignoux, N. Fernandopulle, G. Lema, T. B. Nyambo, U. Ramakrishnan, et al.
History of Click-Speaking Populations of Africa Inferred from mtDNA and Y Chromosome Genetic Variation
Mol. Biol. Evol., October 1, 2007; 24(10): 2180 - 2195.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
S. C. Leman, M. K. Uyenoyama, M. Lavine, and Y. Chen
The evolutionary forest algorithm
Bioinformatics, August 1, 2007; 23(15): 1962 - 1968.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
A. R. Hoelzel, J. Hey, M. E. Dahlheim, C. Nicholson, V. Burkanov, and N. Black
Evolution of Population Structure in a Highly Social Top Predator, the Killer Whale
Mol. Biol. Evol., June 1, 2007; 24(6): 1407 - 1415.
[Abstract] [Full Text] [PDF]


Home page
Am. J. Bot.Home page
A. E. Hinkle
Population structure of Pacific Cordyline fruticosa (Laxmanniaceae) with implications for human settlement of Polynesia
Am. J. Botany, May 1, 2007; 94(5): 828 - 839.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
J. M. Waters, D. L. Rowe, S. Apte, T. M. King, G. P. Wallis, L. Anderson, R. J. Norris, D. Craw, and C. P. Burridge
Geological Dates and Molecular Rates: Rapid Divergence of Rivers and Their Biotas
Syst Biol, April 1, 2007; 56(2): 271 - 282.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
W. Delport, T. M. Crowe, P. Lloyd, and P. Bloomer
Population Growth Confounds Phylogeographic Inference in Namaqua Sandgrouse
J. Hered., March 1, 2007; 98(2): 158 - 164.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
C. A. Machado, T. S. Haselkorn, and M. A. F. Noor
Evaluation of the Genomic Extent of Effects of Fixed Inversion Differences on Intraspecific Variation and Interspecific Gene Flow in Drosophila pseudoobscura and D. persimilis
Genetics, March 1, 2007; 175(3): 1289 - 1306.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
J. Hey and R. Nielsen
Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics
PNAS, February 20, 2007; 104(8): 2785 - 2790.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
Z. Abdo and G. B. Golding
A Step Toward Barcoding Life: A Model-Based, Decision-Theoretic Method to Assign Genes to Preexisting Species Groups
Syst Biol, February 1, 2007; 56(1): 44 - 56.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
E. H. Stukenbrock, S. Banke, M. Javan-Nikkhah, and B. A. McDonald
Origin and Domestication of the Fungal Wheat Pathogen Mycosphaerella graminicola via Sympatric Speciation
Mol. Biol. Evol., February 1, 2007; 24(2): 398 - 411.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
S. V. Vollmer and S. R. Palumbi
Restricted Gene Flow in the Caribbean Staghorn Coral Acropora cervicornis: Implications for the Recovery of Endangered Reefs
J. Hered., January 1, 2007; 98(1): 40 - 50.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
M. K. Kuhner and L. P. Smith
Comparing Likelihood and Bayesian Coalescent Estimation of Population Parameters
Genetics, January 1, 2007; 175(1): 155 - 165.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
O Thalmann, A Fischer, F Lankester, S Paabo, and L Vigilant
The Complex Evolutionary History of Gorillas: Insights from Genomic Data
Mol. Biol. Evol., January 1, 2007; 24(1): 146 - 158.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
M. K. Kuhner
Robustness of Coalescent Estimators to Between-Lineage Mutation Rate Variation
Mol. Biol. Evol., December 1, 2006; 23(12): 2355 - 2360.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
P. Kotlik, V. Deffontaine, S. Mascheretti, J. Zima, J. R. Michaux, and J. B. Searle
A northern glacial refugium for bank voles (Clethrionomys glareolus)
PNAS, October 3, 2006; 103(40): 14860 - 14864.
[Abstract] [Full Text] [PDF]


Home page
Proc R Soc BHome page
H.A Lessios and D.R Robertson
Crossing the impassable: genetic connections in 20 reef fishes across the eastern Pacific barrier
Proc R Soc B, September 7, 2006; 273(1598): 2201 - 2208.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
T. R. Buckley, M. Cordeiro, D. C. Marshall, and C. Simon
Differentiating between Hypotheses of Lineage Sorting and Introgression in New Zealand Alpine Cicadas (Maoricicada Dugdale)
Syst Biol, June 1, 2006; 55(3): 411 - 425.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
D. L. Aylor, E. W. Price, and I. Carbone
SNAP: Combine and Map modules for multilocus population genetic analysis
Bioinformatics, June 1, 2006; 22(11): 1399 - 1401.
[Abstract] [Full Text] [PDF]


Home page
Proc R Soc BHome page
R. M Zink, A. Pavlova, S. Rohwer, and S. V Drovetski
Barn swallows before barns: population histories and intercontinental colonization
Proc R Soc B, May 22, 2006; 273(1591): 1245 - 1251.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
G. Ewing and A. Rodrigo
Coalescent-Based Estimation of Population Parameters When the Number of Demes Changes Over Time
Mol. Biol. Evol., May 1, 2006; 23(5): 988 - 996.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
S. R. Miller, M. D. Purugganan, and S. E. Curtis
Molecular Population Genetics and Phenotypic Diversification of Two Populations of the Thermophilic Cyanobacterium Mastigocladus laminosus
Appl. Envir. Microbiol., April 1, 2006; 72(4): 2793 - 2800.
[Abstract] [Full Text] [PDF]


Home page
Biol. Bull.Home page
S. B. Johnson, C. R. Young, W. J. Jones, A. Waren, and R. C. Vrijenhoek
Migration, Isolation, and Speciation of Hydrothermal Vent Limpets (Gastropoda; Lepetodrilidae) Across the Blanco Transform Fault
Biol. Bull., April 1, 2006; 210(2): 140 - 157.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
R. Nielsen and M. Matz
Statistical Approaches for DNA Barcoding
Syst Biol, February 1, 2006; 55(1): 162 - 169.
[Full Text] [PDF]


Home page
GeneticsHome page
S. C. Leman, Y. Chen, J. E. Stajich, M. A. F. Noor, and M. K. Uyenoyama
Likelihoods From Summary Statistics: Recent Divergence Between Species
Genetics, November 1, 2005; 171(3): 1419 - 1436.
[Abstract] [Full Text] [PDF]


Home page
Phil Trans R Soc BHome page
M. V Matz and R. Nielsen
A likelihood ratio test for species membership based on DNA sequence data
Phil Trans R Soc B, October 29, 2005; 360(1462): 1969 - 1974.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
J. E. Ironside and D. A. Filatov
Extreme Population Structure and High Interspecific Divergence of the Silene Y Chromosome
Genetics, October 1, 2005; 171(2): 705 - 713.
[Abstract] [Full Text] [PDF]


Home page
Phil Trans R Soc BHome page
J. Wang
Estimation of effective population sizes from data on genetic markers
Phil Trans R Soc B, July 29, 2005; 360(1459): 1395 - 1409.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
Y.-J. Won, A. Sivasundar, Y. Wang, and J. Hey
On the origin of Lake Malawi cichlid species: A population genetic analysis of divergence
PNAS, May 3, 2005; 102(suppl_1): 6581 - 6586.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
G. Hamilton, M. Currat, N. Ray, G. Heckel, M. Beaumont, and L. Excoffier
Bayesian Estimation of Recent Migration Rates After a Spatial Expansion
Genetics, May 1, 2005; 170(1): 409 - 417.
[Abstract] [Full Text] [PDF]


Home page
Biol. Bull.Home page
J. P. Wares and C. W. Cunningham
Diversification Before the Most Recent Glaciation in Balanus glandula
Biol. Bull., February 1, 2005; 208(1): 60 - 68.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
E. W. Price and I. Carbone
SNAP: workbench management tool for evolutionary population genetic analysis
Bioinformatics, February 1, 2005; 21(3): 402 - 404.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
Y.-J. Won and J. Hey
Divergence Population Genetics of Chimpanzees
Mol. Biol. Evol., February 1, 2005; 22(2): 297 - 307.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
G. Ewing, G. Nicholls, and A. Rodrigo
Using Temporally Spaced Sequences to Simultaneously Estimate Migration Rates, Mutation Rate and Population Sizes in Measurably Evolving Populations
Genetics, December 1, 2004; 168(4): 2407 - 2420.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
B. C. Carstens, A. L. Stevenson, J. D. Degenhardt, and J. Sullivan
Testing Nested Phylogenetic and Phylogeographic Hypotheses in the Plethodon vandykei Species Group
Syst Biol, October 1, 2004; 53(5): 781 - 792.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
J. Hey and R. Nielsen
Multilocus Methods for Estimating Population Sizes, Migration Rates and Divergence Time, With Applications to the Divergence of Drosophila pseudoobscura and D. persimilis
Genetics, June 1, 2004; 167(2): 747 - 760.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
I. Carbone, Y.-C. Liu, B. I. Hillman, and M. G. Milgroom
Recombination and Migration of Cryphonectria hypovirus 1 as Inferred From Gene Genealogies and the Coalescent
Genetics, April 1, 2004; 166(4): 1611 - 1629.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
S. Fuselli, E. Tarazona-Santos, I. Dupanloup, A. Soto, D. Luiselli, and D. Pettener
Mitochondrial DNA Diversity in South America and the Genetic History of Andean Highlanders
Mol. Biol. Evol., October 1, 2003; 20(10): 1682 - 1691.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
B. Rannala and Z. Yang
Bayes Estimation of Species Divergence Times and Ancestral Population Sizes Using DNA Sequences From Multiple Loci
Genetics, August 1, 2003; 164(4): 1645 - 1656.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
M. A. Beaumont
Estimation of Population Growth or Decline in Genetically Monitored Populations
Genetics, July 1, 2003; 164(3): 1139 - 1160.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
Estimating Ancestral Population Sizes and Divergence Times
Genetics, January 1, 2003; 163(1): 395 - 404.



Home page
GeneticsHome page
Estimating Effective Population Size and Migration Rates From Genetic Samples Over Space and Time
Genetics, January 1, 2003; 163(1): 429 - 446.



Home page
Mol Biol EvolHome page
M. E. Weale, D. A. Weiss, R. F. Jager, N. Bradman, and M. G. Thomas
Y Chromosome Evidence for Anglo-Saxon Mass Migration
Mol. Biol. Evol., July 1, 2002; 19(7): 1008 - 1021.
[Abstract] [Full Text] [PDF]


Home page
ScienceHome page
S. V. Vollmer and S. R. Palumbi
Hybridization and the Evolution of Reef Coral Diversity
Science, June 14, 2002; 296(5575): 2023 - 2025.
[Abstract] [Full Text] [PDF]