# Bayesian Estimation of Recent Migration Rates After a Spatial Expansion

- Grant Hamilton*,
- Mathias Currat*
^{†}, - Nicolas Ray*
^{†}, - Gerald Heckel*,
- Mark Beaumont
^{‡}and - Laurent Excoffier*,
^{1}

^{*}Computational and Molecular Population Genetics Lab, Zoological Institute, University of Bern, 3012 Bern, Switzerland^{†}Genetics and Biometry Laboratory, Department of Anthropology and Ecology, University of Geneva, 1211 Geneva 24, Switzerland^{‡}School of Animal and Microbial Sciences, University of Reading, Reading RG6 6AJ, United Kingdom

- 1
*Corresponding author:*Computational and Molecular Population Genetics Lab, Zoological Institute, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland. E-mail: laurent.excoffier{at}zoo.unibe.ch

## Abstract

Approximate Bayesian computation (ABC) is a highly flexible technique that allows the estimation of parameters under demographic models that are too complex to be handled by full-likelihood methods. We assess the utility of this method to estimate the parameters of range expansion in a two-dimensional stepping-stone model, using samples from either a single deme or multiple demes. A minor modification to the ABC procedure is introduced, which leads to an improvement in the accuracy of estimation. The method is then used to estimate the expansion time and migration rates for five natural common vole populations in Switzerland typed for a sex-linked marker and a nuclear marker. Estimates based on both markers suggest that expansion occurred <10,000 years ago, after the most recent glaciation, and that migration rates are strongly male biased.

MAKING quantitative inferences on molecular data in complex demographic settings remains an ongoing challenge. Traditionally, inferences have been made using summary statistics under simplified models (*e.g.*, Fu and Chakraborty 1998). While sometimes useful for qualitative comparisons, these simple models do not adequately reflect the complexity of processes that might affect molecular genetic diversity. Recent advances in maximum-likelihood and Bayesian approaches have shown that it is possible to make full use of the data gathered from population samples (*e.g.*, Beaumont 1999; Beerli and Felsenstein 2001; Nielsen and Wakely 2001; Wang and Whitlock 2003). Although these methods have the potential to be accurate, the calculation of likelihoods under complex models can be problematic, necessitating the use of simplified demographic models. A promising alternative approach is to compare summary statistics that are calculated from observed data and related to the parameter(s) of interest, with summary statistics simulated under a model for which the parameters are known (*e.g.*, Fu and Li 1997; Tavaré *et al*. 1997; Pritchard *et al*. 1999; Estoup *et al*. 2001, 2004; Estoup and Clegg 2003). An appealing feature of these approximate Bayesian computational (ABC) methods is that models of any complexity can be used, provided only that data can be simulated under the model (Beaumont *et al*. 2002). This allows for the use of models that more closely reflect the complexity of real processes, potentially allowing the estimation of more meaningful biological parameters.

Many species have had a complex history that has included a spatial expansion from a restricted range (*e.g.*, an expansionary period following an ice age), with the establishment of new demes and the exchange of genes among those demes (Hewitt 2000; Ray *et al*. 2003; Excoffier 2004). Although these expansion processes affect various aspects of molecular diversity differently and can thus be described using a combination of summary statistics, making joint estimates of expansion parameters in an explicitly spatial setting would be difficult using either full-likelihood or more conventional methods. However, it may be possible to use an ABC approach to simultaneously infer spatial expansion parameters in the context of an appropriate spatial model. Additionally, data from differently inherited molecular markers can be used to investigate interesting biological phenomena such as differences in dispersal rates between the sexes after a range expansion.

The purpose of this study is to show that an approximate Bayesian approach can be used to accurately infer parameters of spatial expansion in a two-dimensional stepping-stone (2DSS) model, using both mtDNA sequences and short tandem repeats (STRs) as molecular markers. We test the method using a range of parameter combinations and also show that sampling from several demes rather than from a single deme improves estimation accuracy. Finally, we challenge the method with empirical data from natural populations of the common vole (*Microtus arvalis*) to assess differences between male and female dispersal levels and to determine the time since the most recent range expansion.

## MATERIALS AND METHODS

### Estimation procedure and simulation model:

We follow the ABC approach described formally by Beaumont *et al*. (2002) and briefly below. The method relies on the simulation of large numbers of data sets using known parameters under a given model. Summary statistics are calculated for each data set. These simulated statistics are then compared with summary statistics calculated from the observed sample. Simulated summary statistics that are “close” enough (*i.e.*, fall within a predetermined distance, δ) to the observed summary statistics are retained, with their associated parameters. All other data sets are rejected. After a smooth weighting and local regression adjustment step, the accepted parameters form an approximate posterior distribution that can be used to estimate the parameter of interest.

### Modification to conventional ABC:

One aspect of simultaneously estimating several parameters with the ABC method is that while a statistic may estimate a given parameter well, it may be a poor estimator for other parameters. This problem has the potential to lead to a decrease in the accuracy of estimation. For instance, while variance in the size of STR alleles evolving under a stepwise mutation model shows a strong relationship with the scaled expansion time (*τ*) (Goldstein *et al*. 1999), it carries little information on gene flow. Conversely, it would be expected that the scaled migration rate *M* = 2*Nm* (where *N* is the number of genes present in a local deme and *m* is the immigration rate from neighboring demes) would show a stronger relationship with *F*_{ST} than, for example, the mean number of pairwise differences among mtDNA sequences (Excoffier 2004). Thus while statistics are chosen because they describe well one aspect of a range-expansion phenomenon, they will usually be less informative on other parameters. We address this issue by using a weighting scheme such that statistics carrying more information on the parameter of interest are given a greater weight. Beaumont *et al.* (2002) defined the distance δ using a Euclidean metric to give circular acceptance regions, accepting a proportion *P*δ of the simulation data (the tolerance). To aid estimation efficiency, we use a weighted Euclidean distance (WED). We assess the relationship between each parameter-statistic pair during a preliminary analysis step, in which a local regression is run in the vicinity of each observed summary statistic. The square of the correlation coefficient between a parameter and a statistic (*R*^{2}) from each local regression was taken as a simple measure of the utility of the statistic to act as an estimator of the parameter in that region of the distribution.

For illustration, consider just two statistics, *S*_{1} and *S*_{2}, that are believed to be informative for the parameter Φ_{1}, with values *s*_{1} and *s*_{2} calculated for an observed data set (this procedure could of course be extended to more than two statistics). To calculate the weight for *S*_{1} for this observed data set, simulated statistics that are in the vicinity of *s*_{1} (for example, the 1% of the empirical distribution closest to *s*_{1}) are collected, together with their associated parameters. The retained statistics are regressed on the retained parameters, allowing the weight to be calculated as 1where *R*^{2}_{ij} is the determination coefficient of the *i*th parameter by the *j*th statistic. After the operation is repeated for S_{2}, the weights are scaled so as to sum to unity as . The weighted Euclidean distance δ* _{i}* between the “observed” values

*s*

_{1}and

*s*

_{2}and the simulated values of the same statistics

*s*

^{′}

_{1}and

*s*

^{′}

_{2}can then be found simply as 2where the index

*i*emphasizes the fact that the Euclidean distances between simulated and observed statistics will differ depending on the parameter to be estimated. Note that a weighting scheme using simply

*R*

^{2}was also investigated, but did not lead to drastic differences to the present weighting scheme. Compared to

*R*

^{2}, an advantage of the weighting in Equation 1 is that it will enhance the difference in weight between parameter-statistic pairs with a strong relationship (

*R*

^{2}> 0.8) and those with a poor relationship, ensuring that most weight is placed on statistics with large determination coefficients. After the weighting step, each parameter is independently estimated in a way similar to that of the standard ABC multivariate local regression approach. Apart from the use of a WED, our estimation procedure differs from that of Beaumont

*et al*. (2002) only in that we separate the time-consuming simulation stage from the estimation procedure. The modified algorithm is: (1) select prior distributions for parameters Φ

_{1}and Φ

_{2}; (2) draw values Φ

^{′}

_{1}and Φ

^{′}

_{2}for Φ

_{1}and Φ

_{2}from the appropriate priors; (3) use Φ

^{′}

_{1}and Φ

^{′}

_{2}to simulate genetic diversity under the chosen model to produce a single data set, using the same genetic marker, number of samples, and number of loci as the observed data; (4) calculate the summary statistics values

*s*

^{′}

_{1}and

*s*

^{′}

_{2}for the data set; (5) repeat steps 2–4 until the desired number of simulations has been completed; (6) compute summary statistics

*s*

_{1}and

*s*

_{2}on observed data; (7) choose a tolerance level

*P*

_{δ}to determine the number of data points retained for posterior density estimation; (8) for

*S*

_{1}, retain a small proportion (

*e.g.*, 1%) of the

*s*

^{′}

_{1}that are closest to

*s*

_{1}together with their associated parameters Φ

^{′}

_{1}; (9) regress the retained

*s*

^{′}

_{1}on Φ

^{′}

_{1}and obtain the

*R*

^{2}value; (10) repeat steps 8 and 9 for

*S*

_{2}and calculate weights

*w*

^{*}

_{Φ1S1}and

*w*

^{*}

_{Φ1S2}using Equation 1 and scaling described above; (11) calculate the WEDs between each simulated data set using

*w*

^{*}

_{Φ1S1}and

*w*

^{*}

_{Φ1S2}and the observed data as in Equation 2, and retain a fraction

*P*

_{δ}of the data sets (summary statistics with associated parameters) that are closest to the observed data; (12) perform multiple regression of the retained statistics on the parameters and adjust posterior density as described in Beaumont

*et al*. (2002); and (13) repeat steps 8–12 for Φ

_{2}.

The difference in analysis time between the simultaneous estimation of parameters under the standard ABC method and the extended method described above in which parameters are estimated in succession is small, being in the order of seconds to several minutes for a single set of observed data, depending on the size of the simulation file. Also the time necessary for computing the weights is negligible compared to the time required for the simulations and the rest of the estimation procedure. A comparison between the standard ABC method and the extended method using a WED was conducted using the demographic model, summary statistics, and priors that are described below.

### Simulations:

The simulation model is a modification of the model described by Ray *et al*. (2003). A subdivided population was simulated on a torus of 50-by-50 demes with an expansion from an originating deme arbitrarily located at 〈25, 25〉. Ray *et al*. (2003) simulated a forward demographic expansion in which colonization occurs as a wave from the originating deme, with an ongoing exchange of migrants among demes that have been colonized [range expansion (RE) model]. Our simulation model differs from the RE model, making the common assumption of an instantaneous expansion, such that each of the 2500 demes is filled instantly to a given size *N* (note that *N* represents here the number of genes present in a deme) identical for all demes [instantaneous expansion (IE) model]. This slight modification allowed for much faster simulations. Under this model, neighboring demes exchange genes at a rate *m* during the *T* generations following the expansion and a standard backward coalescent process is implemented to simulate gene genealogies. During this backward process, gene lineages can either migrate between different demes or coalesce if they are in the same deme. Going backward in time, the instantaneous expansion corresponds to an instantaneous contraction, where all gene lineages are instantaneously brought back to the originating deme, where further coalescent events can occur until only one lineage remains. Genetic data are obtained by adding mutations at rate μ under a strict stepwise mutation model for STRs and a finite sites model without transition bias for sequences. Parameters describing the scaled expansion time τ = 2*T*μ, the scaled population size θ = 2*N*μ, and the scaled migration rate *M* = 2*Nm* are thus known and are recorded for each simulation. We note here the advantage of separating the simulation model from the estimation procedure. Since the accuracy of estimation depends in part on the number of simulations used (the simulation size), where possible it is desirable to use several hundreds of thousands to millions of simulations. Depending on the complexity of the underlying demographic model, it therefore takes much longer to generate the simulation file than to run the ABC estimation procedure. During the exploration phase of research, the separation of these two steps saves considerable time by allowing multiple analyses to be run using a single simulation file.

### Samples and summary statistics:

Samples were drawn from three demes on the torus, which were located arbitrarily at 〈10, 40〉, 〈30, 30〉, and 〈40, 15〉 (identified hereafter as demes 1, 2, and 3, respectively). There were at least 10 demes between each of the sampling demes. Fifty genes were drawn from each sampled deme, consisting of either DNA sequences (300 bp) or nuclear STRs (10 independent loci), which were used to calculate summary statistics. For mtDNA, four within-deme summary statistics were calculated for each sampled deme: number of haplotypes, *k*; homozygosity, *H*_{o}; number of segregating sites, *S*; and the average number of pairwise differences, π. For STR data, three within-deme summary statistics were calculated for each sampled deme: mean number of alleles per locus, *a*; homozygosity, *H*_{o}; and mean variance (across loci) in allele repeat number. The fixation index, *F*_{ST}, was calculated among demes for both markers whenever more than one deme was sampled. These statistics were chosen for their ability to estimate parameters on the basis of preliminary investigations and their known dependency on the values of the parameters of a range expansion under the infinite island model (Excoffier 2004).

### Parameter estimation:

For mtDNA, simulations were sampled from prior distributions of τ as a uniform distribution between 0 and 50, θ as a log-uniform distribution between 0.01 and 10, and *M* as a log-uniform distribution between 0.01 and 500. For STR loci, simulations were sampled from prior distributions of τ as a uniform distribution between 0 and 300, θ as a log-uniform distribution between 0.01 and 200, and *M* as a log-uniform distribution between 0.01 and 500. A flat uniform distribution was used for τ because there is often no prior expectation for the time of the range expansion. However, for *M* and θ, we used log-uniform prior distributions to have equal coverage (and potentially equal accuracy of estimates) for small (<1), intermediate (>1 and <10), and large (>10) values of the parameters. Unless otherwise stated, 1,000,000 values of the summary statistics were generated and a tolerance *P*_{δ} = 0.001 was used to give 1000 points from which parameters were estimated. As suggested by Beaumont *et al*. (2002), a log transformation was applied to the retained, simulated parameters before the regression adjustment, and parameter estimates were based on the back-transformed values. We also follow Beaumont *et al*. (2002)(Equation 7) in using the fitted value of the regression line as a point estimate of the parameter. During exploratory analysis of a small subset of results, means, medians, and modes of the posterior distribution were calculated as alternative estimators, and only minor differences in the value of parameter estimates were found.

## RESULTS

### Simulation studies—modified vs. standard ABC:

The accuracy of estimation under the ABC procedure detailed by Beaumont *et al*. (2002) depends on several factors, including the simulation size, the number of parameters to estimate, and the tolerance. For a given simulation size, increasing the number of parameters requires an increase in the tolerance. To determine if using a WED increased estimation efficiency across a range of tolerance values, compared with a standard ABC approach under the 2DSS model, the two methods were assessed as follows. An observation set of sequences was generated under parameters τ = 10, θ = 5, and *M* = 10. Here, and in subsequent evaluations, each observation set consisted of 1000 observations. The parameter chosen for evaluation (*M*) was estimated for each of the data sets across a range of tolerance levels using both the standard ABC approach (Beaumont *et al*. 2002) and an ABC procedure using the WED modification. The relative mean square error (RMSE) was used to assess both bias and accuracy. The WED modification improves estimation efficiency, particularly as the tolerance increases (Figure 1). A similar result was found for STR loci (data not shown). Similar trends were found for the estimation of τ and θ for both markers, although the magnitude of the improvements was smaller (data not shown). In the limit of increasing numbers of simulated data sets, the advantage gained by estimating parameters separately and using a WED decreases until the estimates converge. Indeed, this was previously shown by Beaumont *et al*. (2002) when comparing results of the standard ABC method with an earlier rejection algorithm. Although 1 million simulations is adequate for the estimation of three parameters in the model presented here, the capacity to use such a large number of simulations will depend to some extent on the complexity of the underlying model and hence simulation times. For more complex models it may be necessary to estimate parameters using fewer simulations. Previous investigations have shown that with fewer (50,000) simulations under the present model, the trend is similar to that shown in Figure 1 as tolerance increased, but the WED approach is considerably more accurate than the standard method. For example, using the mtDNA observation set used above, and under identical conditions except for a simulation size of 50,000, the RMSE for *M* using P_{δ} = 0.01 for the standard ABC approach is 0.45 (±0.028 SE), which is substantially greater than that from the WED modification (RMSE = 0.33 ± 0.018 SE). The greatest benefit of using the WED approach thus accrues when the number of simulations is small with respect to the number of parameters to be estimated. All further estimation in this study was conducted using the WED modification.

The IE model was used to generate observation sets under a range of known parameter values of τ, θ, and *M*. Analyses were conducted using samples from a single deme (deme 1) or from the three demes described above. For each parameter combination, the behavior of the estimator was assessed using the mean and the 2.5 and 97.5 quantiles of the distribution of the 1000 estimates, as well as the number of times the 95% credible intervals of the posterior distributions contained the true parameter value (coverage property).

### Parameter estimation from a single deme:

When samples are drawn from a single deme, *F*_{ST} cannot be calculated, and it is therefore not used as a summary statistic. As reported in Table 1a for simulated DNA sequence data, there is a substantial bias in estimates of τ when expansion time is short and the scaled migration rate *M* is low to moderate. Estimates are more accurate under short expansion times when *M* increases, and the range of estimates decreases as shown by the 2.5 and 97.5 quantiles. Migration rate is underestimated when expansion time is short and population size (θ) is small. Again, τ and *M* are estimated quite well when migration is large, and while the range of estimates for τ is good, the 2.5 and 97.5 quantiles show that the distribution of estimates for *M* is still very broad. Population size estimates were good across the set of parameter combinations with an acceptable range. The coverage shows that reconstructed 95% credible intervals were conservative for most combinations of parameters. Parameter estimates using STR data are better than those obtained using single-locus DNA sequences, as would be expected when using multiple loci for estimates (Table 1b). Although τ is often estimated well, there is again a tendency to overestimation when migration is low to moderate, which is most marked when population sizes are small. Migration estimates are good when *M* is very small, with an underestimation of the parameter as *M* increases. The distribution of estimates is, however, broad. Population size is usually underestimated, although when expansion time is long and *M* is large, this parameter is overestimated. Coverage study shows that the credible intervals tend to be conservative across most parameter combinations.

### Parameter estimation from three demes:

There are substantial gains in accuracy across a wide range of parameter value combinations when samples are drawn from three demes rather than from a single deme (Table 2). For mtDNA sequences, expansion times are consistently estimated well and the range of estimates is relatively narrow (Table 2a). The reduced coverage also implies that credible intervals of the posterior distributions are narrower than that for 1-deme estimate while still encompassing the true parameter value in ∼95% of cases. Migration estimates are also very accurate across the range of parameter combinations simulated. The range of estimates for *M* is relatively narrow and credible intervals show good coverage properties of the values under which the data sets were generated. Similarly, θ estimates are good with conservative credible intervals. Parameters are also estimated well when using STR data (Table 2b). Expansion time estimates are consistently accurate, with a narrow range of estimates. *M* estimates are now consistently good across the range of parameters simulated, although with a tendency toward slight underestimation when *M* is moderate. The 2.5 and 97.5 quantiles show that the range of estimates of *M* is acceptable, and credible intervals of the posterior distribution show good coverage of the true migration value with little substantial deviation from expectation. Population size is also estimated well and does not show the tendency toward underestimation observed when data are sampled from a single deme. However, the range of estimates of θ is broad for some parameter combinations when compared with the estimate range of other parameters, and credible intervals are conservative.

### Application to the estimation of sex-biased dispersal in the common vole (*M. arvalis*):

The present distributional range of *M. arvalis* spans most of Europe and the western parts of Asia (Mitchell-Jones *et al.* 1999) and Switzerland in the center of the Alps has been recently colonized after the melting of the ice cover ∼10,000 years ago. The mode of colonization and dispersal in small rodents suggests that the simple 2DSS world is an appropriate spatial model since physical restrictions and social association in subterranean burrows effectively limit long-range dispersal. Dispersal is generally biased toward males in Microtus like in the majority of mammals (Clutton-Brock 1989; Aars and Ims 2000; Clobert *et al*. 2001), but its effectiveness and the extent of differences between the sexes are unknown since no genetic studies have been conducted. We address these questions for *M. arvalis* by analyzing a set of five populations (124 individuals) throughout Switzerland for which data from 12 STR loci were typed and a 321-bp fragment of the hypervariable region II of the mitochondrial control region was available (G. Heckel, R. Burri, S. Fink, J.-F. Desmet and L. Excoffier, unpublished results). The five Swiss samples have been chosen because they present the same mtDNA (“central”) cytb lineage, which is found in southern Germany and northern Switzerland. We therefore assumed that the five Swiss samples belonged to the same expansion wave that has colonized Switzerland from the North after the last glaciation (Fink *et al.* 2004). Simulations were conducted in the 2DSS model described previously. The five sampling demes were spread randomly across the homogenous world, subject to the constraint that at least five nonsampled demes lay between each of the sampling demes. Although sampling demes were assigned to fixed positions during simulations, previous investigations have shown that the position of sampling sites has no discernible effect on estimated parameter values. For this, we created 12 simulation files (four replicates, with each replicate consisting of 3 simulation files). Within each replicate, sampling deme locations were fixed at randomly chosen positions; however, these positions differed among replicates. All simulation files were challenged with the same observed data and estimates of τ and *M* were made to assess whether the variation in estimates among replicates (due to sampling deme position) was greater than variation within replicates. No such difference was found (data not shown).

One million simulations were conducted for each type of molecular marker. Summary statistics for mtDNA and STRs were identical to those used in previous evaluation simulations. For mtDNA, simulations were sampled from prior distributions of τ-uniform [0:20], θ-log uniform [0.01:20], and *M*-log uniform [0.01:500]. For STRs, simulations were sampled from prior distributions of τ-uniform [0:200], θ-log uniform [0.01:100], and *M*-log uniform [0.01:500]. The ABC procedure was conducted as described previously. A tolerance of 0.001 was used for each estimation procedure to give 1000 points for the calculation of parameter estimates and the 95% credible intervals of posterior distributions. The inferred scaled expansion time for STRs is 23.0 (Table 3). Using a STR mutation rate of 5 × 10^{−4} (Jarne and Lagoda 1996; Ellegren 2004) and the reasonable assumption of three generations per year (Hausser 1995), the time of the most recent range expansion was calculated as ∼7667 years (with a 95% credible interval of 5455 to 11,337 years). For mtDNA, τ was estimated as 4.8 (Table 3). For the stretch of the control region used in this study, a mutation rate of 46% per site per million years of divergence was estimated by comparison with a related species with known divergence time (Fink *et al.* 2004), although this estimate may be conservative due to the potential for multiple hits per site. By again assuming three generations per year, the inferred expansion time for mtDNA is ∼8127 years (with a 95% credible interval of 4725–13,605 years), showing that the two types of markers are in excellent agreement in showing a postglacial colonization time of Switzerland. As expected, estimation of the *M* parameter differs significantly between mtDNA and STR data (Table 3), with *M*_{STR} = 13.2 (95% credible interval [5.6, 20.4]) for nuclear genes and *M*_{mt} = 0.29 [0.02, 2.5] for mtDNA. The difference in ploidy and transmission pattern between mtDNA and STR markers implies that *M*_{STR} = 4*N*_{e}*m*, where *N*_{e} is the effective population size and *N*_{e}*m* is the effective number of nuclear genes (3.3) exchanged between neighboring demes per generation, while *M*_{mt} = 2*N*_{f}*m*, where *N*_{f} is the number of females and *N*_{f}*m* is the number of female genes (0.145) exchanged between demes per generation. The posterior distribution of the *Nm* values for female, male, and total immigrant genes is shown in Figure 2. The male posterior distribution was obtained from the convolution of the total and female densities, under the simple assumption that *N*_{e} = *N*_{m} + *N*_{f} . The inferred male density has a mode at *Nm* = 3.01 and limits of an equal-tail 95% credible interval at 1.31 and 5.2. Since the ratio of point estimates for male and female *Nm* values is 20.76, it suggests that males move ∼20 times more than females, in agreement with an extreme philopatry of females.

## DISCUSSION

The results presented here show that ABC can be used to accurately infer parameters under a spatially explicit model using mtDNA sequences or STRs. The use of several summary statistics, each of which contains different information on the expansion process, allows for the simultaneous and accurate estimation of the time since the range expansion, the scaled population size, and the number of migrant genes exchanged between neighboring demes. Furthermore, the introduction of a weighted Euclidean distance giving greater weight to statistics that carry more information on a given parameter is shown to improve the accuracy of the estimation procedure, especially when the number of simulations is limited (Figure 1).

The ABC procedure is robust and estimates parameters consistently well across a broad range of parameter values in the two-dimensional stepping-stone model. A least-squares estimation procedure was proposed earlier to estimate the parameters of a range expansion from mismatch distributions computed from sequence data under an infinite island model (Excoffier 2004). Simulations have shown that this method provides reasonable estimates of τ, but it was found to be highly unreliable for estimating the migration parameter *M* (data not shown), because the mismatch distribution has a high associated variance for low *M*-values and is uninformative for *M* when this parameter is large (*M* > 50) (Excoffier 2004). The ability to use other aspects of molecular diversity, both within and between demes, considerably improves the estimation of the migration parameter, such that reliable estimates of sex-specific dispersal are possible.

As shown above, the accuracy of the estimation increased when more demes were sampled. While sampling from a single deme provided satisfactory estimates for some parameter combinations, it resulted in poor estimates for other combinations, particularly when expansion times were short and migration was low. The improvement due to the use of several demes is linked to the additional information available, but also to the fact that when the total number of demes in the population is large, the gene genealogies of different samples are almost independent after the expansion, thus providing replicate information on the migration process between neighboring demes. Moreover, information on the amount of differences between demes can be incorporated, for instance, by using the *F*_{ST} statistic, while only within-deme diversity can be used with a single deme. It should be noted, however, that using information from several demes to estimate a migration parameter assumes that it is a parameter tied to the biology of the species rather than being location specific and postulates that dispersal patterns will be equivalent at different locations. In that sense the different samples can be considered as replicates of the same process. This assumption is likely to be reasonable when samples are drawn from similar environments. Additionally, the use of several demes in a single estimation procedure assumes that the demes belong to the same expansion wave. Obviously, our results could be erroneous if a species had occupied its present range by a series of independent range expansions (*i.e*., from different refuge areas) and if sampled demes were drawn from regions colonized from different sources.

An attractive feature of the ABC method is its ability to handle different types of molecular markers easily, such as DNA sequences or microsatellites, but SNP data could also be used if possible sources of ascertainment bias could be simulated (*e.g.*, Wakeley *et al*. 2001). One particularly useful application is therefore to compare parameter estimates obtained from different markers, to assess either similarities or any discrepancies that may carry valuable biological information. There is a remarkable concordance in expansion time estimates inferred from mtDNA and STRs for the common vole in Switzerland. These estimates are highly plausible since this suggests the onset of the range expansion occurred after most of the ice cap covering the region during the last glacial maximum had melted by 12,000 years before present and the glaciers had retreated to the highlands (Hewitt 1999). Moreover, from the relatively extensive simulation studies on the performance of our methodology reported in Tables 1 and 2, we see that the coverage property of the posterior distributions is conservative with 1 million simulations and a tolerance level of 0.1%, so that we are confident that the estimations obtained from the vole data using the same number of simulations and tolerance level are valid. The existence of glaciation data external to the genetic data is extremely useful to further validate the expansion time estimates. Therefore, the good agreement between these independent data sources suggests that the ABC method may work well when such corroborative data are not available. In contrast to the similarity in estimated expansion times for Northern Swiss vole mtDNA sequences and STRs, migration estimates showed an extreme sex bias in dispersal, with males dispersing at ∼20 times the rate of females. This extreme difference in the immigration rate between the sexes was unexpected, although the direction of the bias is completely in agreement with observational data (Clutton-Brock 1989; Aars and Ims 2000; Clobert *et al.* 2001; G. Heckel, unpublished data). Note here that our estimates reflect the effective number of immigrants received by each deme from surrounding unsampled subpopulations and not the number of genes exchanged between sampled demes, which is in clear contrast with other methods aiming at estimating migration rates (*e.g.*, Beerli and Felsenstein 2001). However, since the 2DSS model is an approximation of a more continuous distribution of individuals (Barton and Wilson 1995), the exact delineation of a deme and its neighborhood size should depend on the pattern of the dispersal distance of individuals, which may not be identical for males and females. In that case, because we expect a quadratic relationship between the average dispersal distance and the deme size, the number of immigrants (*Nm*) estimated for males and females may not necessarily apply to the same spatial scale. The 20-fold larger rate of immigration for males compared to female voles could thus be an overestimation if the males were dispersing over larger distances than the females. We nevertheless show here how information on nuclear and mtDNA markers can be combined to get estimates of male dispersal, without the need for male-specific markers.

While our study suggests that the ABC procedure is an efficient means of estimating parameters in the 2DSS model, more refined spatial models could be considered, which could include a number of realistic features such as coastlines or environmental heterogeneity that are likely to be important in shaping the molecular diversity of expanding populations. A simulation model that can include these factors as well as a more progressive range expansion has been recently made available (SPLATCHE, Currat *et al.* 2004). Because simulations under this model are considerably slower than those reported here, its coupling to the ABC procedure is currently difficult, but should be possible with forthcoming increases in computing power. Since the accuracy of the ABC procedure depends in part on the number of simulations used (and hence on simulation time), one interesting challenge for the future will be to determine the benefits of increased model complexity and realism, *vs.* simulation size. Compared to likelihood approaches, the ABC methodology is much easier to implement for complex demographic models, but it does not use as much data and should thus not be expected to be as accurate. A potential difficulty of the ABC approaches is in the choice of summary statistics and in the definition of the appropriate tolerance level, which are difficult to assess *a priori*. Therefore, simulation studies on the performance of the methodology should be conducted before applying it to observed data, to check for potential biases and for good coverage properties of posterior distributions. Overall, however, this study shows that the ABC procedure should provide a valuable and flexible tool for investigating questions involving range expansions, including those related to sex-biased dispersal or to the history of human settlement.

## Acknowledgments

We are grateful to Pierre Berthier and Samuel Neuenschwander for computing assistance, to Sabine Fink and Reto Burri for assistance in the laboratory, and to Arnaud Estoup for helpful comments on the manuscript. This work was supported by a Swiss National Science Foundation grant no. 3100A0-100800 to L.E.

## Footnotes

Communicating editor: K. W. Broman

- Received August 1, 2004.
- Accepted January 21, 2005.

- Genetics Society of America