## Abstract

Approximate Bayesian computation (ABC) techniques permit inferences in complex demographic models, but are computationally inefficient. A Markov chain Monte Carlo (MCMC) approach has been proposed (Marjoram *et al.* 2003), but it suffers from computational problems and poor mixing. We propose several methodological developments to overcome the shortcomings of this MCMC approach and hence realize substantial computational advances over standard ABC. The principal idea is to relax the tolerance within MCMC to permit good mixing, but to retain a good approximation to the posterior by a combination of subsampling the output and regression adjustment. We also propose to use a partial least-squares (PLS) transformation to choose informative statistics. The accuracy of our approach is examined in the case of the divergence of two populations, with and without migration. In that case, our ABC–MCMC approach requires considerably less computation time than conventional ABC to reach the same accuracy. We then apply our method to a more complex case involving the estimation of divergence times and migration rates between three African populations.

WITH the advent of large-scale genotyping techniques (*e.g.*, Green *et al.* 2006; Levy *et al.* 2007), genetic data can be produced at an unprecedented scale (Altshuler *et al.* 2005; Bustamante *et al.* 2005), and the genetic variability of individuals and populations can now routinely be examined at hundreds of loci across the genome (Rosenberg *et al.* 2002; Williamson *et al.* 2005; Becquet and Przeworski 2007; Frazer *et al.* 2007). These large data sets offer the hope of better understanding the evolutionary forces that have shaped the diversity of many species, including humans, and of identifying genome regions involved in past selective events (Anisimova and Liberles 2007; Nielsen *et al.* 2007). However, the demographic history of the populations needs to be accounted for to disentangle its effects from those of selection (Haddrill *et al.* 2005; Nielsen *et al.* 2005; Biswas and Akey 2006). It therefore seems important to be able to properly estimate this past demography from neutral genetic data or to estimate demography and selection simultaneously (*e.g.*, Williamson *et al.* 2005). The statistical estimation of mutation and demographic parameters has drastically improved in the last 10 years, particularly with the use of Bayesian and full-likelihood approaches (Beaumont *et al.* 2002; Marjoram and Tavare 2006). However, these methods are still restricted to relatively simple models whose likelihood can be computed or to small data sets that can be analyzed in a reasonable amount of time. The handling of large data sets and the estimation of demographic parameters under realistic models remain problematic, and goodness-of-fit methods have often been used in those cases (see, *e.g.*, Marth *et al.* 2004; Schaffner *et al.* 2005; Plagnol and Wall 2006).

The approximate Bayesian computation (ABC) framework (Tavare *et al.* 1997; Pritchard *et al.* 1999; Beaumont *et al.* 2002), which is based on a simple rejection algorithm, has been applied to the estimation of demographic parameters in a variety of evolutionary models in nonmodel organisms and in humans (Estoup *et al.* 2004; Tallmon *et al.* 2004; Excoffier *et al.* 2005a; Hickerson *et al.* 2006; Fagundes *et al.* 2007; Rosenblum *et al.* 2007). The generic principle, first outlined in Tavare *et al.* (1997), is to simulate data (*D*′) similar to the observations (*D*) in sample size and number of loci under a given model, with parameters (**θ**) drawn from some prior distributions. If *D*′ is identical to *D*, the parameters are stored; otherwise they are discarded. The retained parameters are then used to estimate the posterior distribution. Since it is very unlikely to simulate *D*′ identical to *D* for large data sets or complex models (Marjoram and Tavare 2006), it has been proposed (Pritchard *et al.* 1999) to replace the data by a set of summary statistics **S** and to retain a particular simulation if the simulated summary statistics **S′** are sufficiently close to the observed summary statistics **S**. To account for the difference between **S** and **S′**, Beaumont *et al*. (2002) proposed to perform a locally weighted linear regression to compute the posterior distribution. This regression adjustment step was shown to lead to much improved estimations. However, even with this improvement the ABC methodology is computationally not very efficient, as it requires the simulation of millions of samples, a large majority of which, typically ≥99%, will be discarded for parameter estimation.

More recently, Marjoram *et al.* (2003) proposed another likelihood-free approach where simulations are directly embedded within a Markov chain Monte Carlo (MCMC) framework. They showed that a Markov chain where newly simulated data *D*′ are accepted if they are equal to the observed data *D*, and rejected otherwise, converges to the correct posterior distribution *f*(**θ** | *D*). For complex data sets where the acceptance rate is too small, they proposed to replace the full data by summary statistics and to accept new parameter values if the simulated statistics are sufficiently close to the observed ones, as in conventional ABC. A problem with this approach is to define how close simulations need to be in order to accept them, which determines (i) the acceptance rate and (ii) the mixing and convergence of the chain, but also (iii) the burn-in period, since it may require a very large number of simulations to obtain the first accepted step if the starting point is in a region of low likelihood. Other likelihood-free approaches, such as sequential Monte Carlo (Sisson *et al.* 2007) or population Monte Carlo (M. A. Beaumont, J.-M. Cornuet, J.-M. Marin and C. P. Robert, unpublished results), have recently been proposed to address the problem of slow convergence and to explore multimodal posteriors more efficiently. However, these likelihood-free approaches have only rarely been tested on complex evolutionary models (but see Ratmann *et al.* 2007 for a comparison of evolutionary dynamics of protein networks).

In this article, we present a new approach, borrowing the best features of both conventional ABC based on a rejection algorithm and MCMC without likelihood, which deals efficiently with all the problems mentioned above. We test our approach in the case of different models of population isolation and migration introduced by Nielsen and Wakeley (2001), but do not attempt to compare our methodology with full-likelihood approaches. Our new methodology is finally applied to a model of population isolation and migration for three African populations.

## METHODS

We begin by describing Marjoram *et al.*'s (2003) likelihood-free MCMC algorithm based on summary statistics (SS), hereafter called SS–MCMC, and point out its underlying problems. We then propose solutions borrowed from conventional ABC, leading to a new algorithm called ABC–MCMC.

#### SS–MCMC algorithm:

Given some observed data *D* generated under a given model *M* defined by a set of (unknown) parameters **θ** with prior distribution π(**θ**), Marjoram *et al.* (2003) have shown that the posterior distribution *f*(**θ** | *D*) can be obtained from samples of a Markov (M) chain without likelihood generated by the following algorithm:

M1. Propose a move from the current state **θ** to a new state **θ′** according to a transition kernel *q*(**θ** → **θ′**).

M2. Simulate data *D*′ under model *M*, using the new parameters **θ′**.

M3. If *D*′ = *D*, go to step M4; otherwise remain at state **θ** and go back to step M1.

M4. Accept state **θ′** with probability *h* = min(1, [π(**θ′**)*q*(**θ′** → **θ**)]/[π(**θ**)*q*(**θ** → **θ′**)]); otherwise remain at state **θ**. Go to step M1.

Since it is very unlikely to simulate *D*′ identical to *D* for large data sets or complex models (Marjoram and Tavare 2006), Marjoram *et al.* (2003) proposed to replace the data *D* by a set of sufficient summary statistics **S** and the condition *D*′ = *D* by the less stringent condition ‖**S′** − **S**‖ ≤ δ in step M3, where δ is an arbitrarily small tolerance on the distance between **S**′ and **S**. Note that if the summary statistics are not sufficient, that is, if the statistics do not capture the full information contained in the data about the parameters **θ**, the resulting posterior will only be an approximation of the true posterior distribution (Marjoram and Tavare 2006). With this new step M3, Marjoram *et al.* (2003) claimed that the stationary distribution of this SS–MCMC chain is *f*(**θ** | ‖**S′** − **S**‖ ≤ δ), which should be a good approximation of the true posterior *f*(**θ** | *D*) if δ is small (Marjoram *et al.* 2003).
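The M1–M4 loop with this relaxed condition on summary statistics can be sketched as follows. This is a toy one-dimensional illustration, not the authors' implementation: the model (`simulate`), the uniform prior, and the symmetric uniform proposal kernel are all assumptions made for the example.

```python
import random

def ss_mcmc(s_obs, simulate, prior_density, n_steps, delta, theta0, prop_width):
    """Likelihood-free MCMC (Marjoram et al. 2003), steps M1-M4.

    simulate(theta) returns a simulated summary statistic S'; a move is
    only considered if ||S' - S|| <= delta (relaxed step M3), and is then
    accepted with the usual Metropolis ratio on the prior (step M4;
    the proposal kernel here is symmetric, so q cancels)."""
    chain, theta = [], theta0
    for _ in range(n_steps):
        theta_new = theta + random.uniform(-prop_width, prop_width)  # M1
        s_new = simulate(theta_new)                                  # M2
        if abs(s_new - s_obs) <= delta:                              # M3
            h = min(1.0, prior_density(theta_new) / prior_density(theta))
            if random.random() < h:                                  # M4
                theta = theta_new
        chain.append(theta)
    return chain

# Toy model: theta is the mean of a normal; the summary statistic is the
# sample mean of 50 draws. Prior: theta ~ U[0, 10].
random.seed(1)
simulate = lambda th: sum(random.gauss(th, 1.0) for _ in range(50)) / 50.0
prior = lambda th: 1.0 if 0.0 <= th <= 10.0 else 0.0
chain = ss_mcmc(s_obs=3.0, simulate=simulate, prior_density=prior,
                n_steps=2000, delta=0.5, theta0=3.0, prop_width=0.5)
post_mean = sum(chain) / len(chain)
```

With a tolerance as loose as δ = 0.5, the chain mixes easily but its stationary distribution is only a coarse approximation of the posterior, which is precisely the trade-off discussed below.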

#### Problems with the SS–MCMC algorithm:

The use of summary statistics and of a threshold value δ in the modified step M3 introduces some issues that we address below.

First, since the Markov chain can begin only if ‖**S′** − **S**‖ ≤ δ, the number of steps necessary to first satisfy this condition is undefined, and it depends on the (unknown) distribution of the distance ‖**S′** − **S**‖.

Second, the choice of the threshold value δ is important, as a too large tolerance interval results in a chain dominated by the prior π(**θ**), since unlikely **θ**-values will often be accepted. On the other hand, a too small value leads to a very small acceptance rate. Additionally, the acceptance rate of the chain is proportional to the likelihood of **θ** (Sisson *et al.* 2007), making the chain very sticky in regions of low likelihood and preventing the use of a too small threshold value δ.

Finally, the accuracy of the estimation depends on the choice of the summary statistics and of the distance function ‖ · ‖ (Hamilton *et al.* 2005). In complex models where the likelihood cannot be computed, it is difficult to find sufficient statistics for all parameters, and the definition of an optimal set of summary statistics therefore remains an important and unresolved issue.

#### Modifications to the SS–MCMC algorithm:

##### Calibration step:

We propose here to address the first problem by performing a series of *n* simulations (we typically used *n* = 10,000), where parameters are each time randomly drawn from their prior, to obtain an empirical approximation of the distribution of the distance ‖**S′** − **S**‖. With this calibration step, we can conveniently define a tolerance level ε and a threshold distance δ_{ε} such that Pr(‖**S′** − **S**‖ ≤ δ_{ε}) = ε. For instance, by setting ε = 0.01, we define a value δ_{ε} below which the condition ‖**S′** − **S**‖ ≤ δ_{ε} will be true for 1% of all randomly simulated data sets. We should then be able to use any simulation for which ‖**S′** − **S**‖ ≤ δ_{ε} as a starting point for the chain. These *n* simulations are also used to adjust the proposal range and therefore the transition kernel *q*. The proposal range for new parameter values in the Markov chain is set as a uniform range of width φ expressed in units of standard deviations, computed independently for each parameter within the *n*ε retained simulations. If the proposal range is too large, the chain may often jump to a state for which ‖**S′** − **S**‖ > δ_{ε}, which results in a low acceptance rate and consequently a smaller number of effective samples. If the proposal range is too small, the chain may never explore the whole parameter space and may therefore lead to a large variance in estimated parameters among replicates. Note that standard MCMC techniques aiming at increasing mixing, such as simulated tempering (Bortot *et al.* 2007), can easily be included in the framework presented here.
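This calibration step can be sketched for a one-dimensional parameter as follows. The toy model (`sim`) and its uniform prior are illustrative assumptions; the logic is simply to take the ε-quantile of the prior-predictive distances as δ_{ε} and to scale the proposal width by the standard deviation of the *n*ε retained parameter values.

```python
import math, random

def calibrate(simulate_from_prior, s_obs, n=10_000, eps=0.01, phi=1.0):
    """Calibration step: run n prior simulations, take the eps-quantile
    of the distances to the observed statistic as the threshold
    delta_eps, and set the proposal width to phi standard deviations of
    the n*eps retained parameter values (1-D toy version)."""
    draws = [simulate_from_prior() for _ in range(n)]
    ranked = sorted((abs(s - s_obs), th) for th, s in draws)
    n_keep = max(1, int(n * eps))
    delta_eps = ranked[n_keep - 1][0]            # threshold distance
    kept = [th for _, th in ranked[:n_keep]]     # the n*eps closest draws
    mean = sum(kept) / len(kept)
    sd = math.sqrt(sum((t - mean) ** 2 for t in kept) / len(kept))
    return delta_eps, phi * sd, kept

def sim():
    # Hypothetical toy model: theta ~ U[0, 10]; the summary statistic is
    # the mean of 20 N(theta, 1) draws.
    th = random.uniform(0, 10)
    s = sum(random.gauss(th, 1.0) for _ in range(20)) / 20
    return th, s

random.seed(2)
delta_eps, prop_width, kept = calibrate(sim, s_obs=4.0)
```

Any of the `kept` simulations can then serve as a valid starting point for the chain, since each already satisfies the distance condition.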

##### Combining SS–MCMC with ABC:

We propose to address the second issue (small acceptance rate) by launching an SS–MCMC chain of length *s* with a relatively large tolerance (*i.e.*, ε = 0.01). As shown in the appendix, the stationary distribution of this SS–MCMC chain is identical to that obtained by a simple rejection algorithm (as in conventional ABC) using the same tolerance level ε. We therefore propose to estimate parameters on a subsample of size *t* consisting of the simulations associated with the smallest distances δ generated by the Markov chain, as is commonly done in the ABC framework (Pritchard *et al.* 1999; Beaumont *et al.* 2002), and to perform a local regression adjustment where samples are weighted by their associated δ-values (Beaumont *et al.* 2002), which greatly improves the quality of the estimation. This approach allows the chain to have a large acceptance rate, while the regression adjustment step makes the final estimation not too sensitive to the prior (Beaumont *et al.* 2002). This two-step approach should therefore lead to the same stationary distribution as a simple rejection algorithm with tolerance ε′ = ε*t*/*s*. The main gain of SS–MCMC compared to simpler rejection samplers is thus to require many fewer simulations to get *t* samples at tolerance ε′: *s* + *n* simulations under SS–MCMC compared to *s*/ε simulations under ABC, implying a theoretical reduction in computing cost by a factor *s*/[ε(*s* + *n*)]. For instance, with a calibration step of *n* = 10,000, a chain of *s* = 90,000 steps, and a tolerance of ε = 1%, the expected gain is a 90-fold computational cost reduction. Therefore, we would expect that a total of 100,000 simulations under the improved SS–MCMC framework would correspond to 9 million simulations under the conventional ABC framework, but there are practical limits to this gain that are inherent to the difficulty of running chains for small ε-values (see below).
In the following, we use the term ABC–MCMC to designate the procedure where an ABC regression adjustment is performed on a given fraction of the output of an SS–MCMC chain.
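The subsample-and-adjust step can be sketched in one dimension as follows. This is an illustrative version of the Beaumont *et al.* (2002) adjustment with an Epanechnikov weighting kernel on the distances; the real procedure regresses each parameter on the full vector of (PLS-transformed) statistics.

```python
def regression_adjust(thetas, stats, s_obs):
    """ABC regression adjustment on the t retained samples: weight each
    sample by an Epanechnikov kernel on its distance to the observed
    statistic, fit a weighted linear regression of theta on s, and
    project each theta onto s_obs: theta* = theta - b * (s - s_obs)."""
    deltas = [abs(s - s_obs) for s in stats]
    d_max = max(deltas) * (1 + 1e-9)            # kernel bandwidth
    w = [1 - (d / d_max) ** 2 for d in deltas]  # Epanechnikov weights
    sw = sum(w)
    s_bar = sum(wi * s for wi, s in zip(w, stats)) / sw
    t_bar = sum(wi * t for wi, t in zip(w, thetas)) / sw
    cov = sum(wi * (s - s_bar) * (t - t_bar)
              for wi, s, t in zip(w, stats, thetas)) / sw
    var = sum(wi * (s - s_bar) ** 2 for wi, s in zip(w, stats)) / sw
    b = cov / var                               # weighted regression slope
    return [t - b * (s - s_obs) for t, s in zip(thetas, stats)]

# Sanity check: if theta depends exactly linearly on the statistic
# (theta = 2*s), every adjusted value collapses onto 2*s_obs = 4.
stats = [1.0, 1.5, 2.0, 2.5, 3.0]
thetas = [2.0 * s for s in stats]
adjusted = regression_adjust(thetas, stats, s_obs=2.0)
```

The adjusted values form the sample from which posterior densities are finally estimated.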

##### Partial least-squares transformation of summary statistics:

We propose to address the problem of the choice of informative summary statistics by using a partial least-squares (PLS) regression approach (see, *e.g.*, Tenenhaus *et al.* 1995; Boulesteix and Strimmer 2007). Like principal component analysis (PCA), PLS extracts orthogonal components from a high-dimensional data set **X** of predictor variables, but in addition these components are chosen to explain the variability of the response variables, by maximizing the covariance between predictor and response variables (see, *e.g.*, Boulesteix and Strimmer 2007). In the present ABC context, the predictor variables are the raw summary statistics and the response variables are the model parameters. The choice of the number of PLS components to include is usually based on a leave-one-out validation procedure (Mevik and Wehrens 2007), examining the root mean square error (RMSE) of the parameters predicted by the regression. A PLS analysis should thus yield a much reduced number of independent components, as compared to a large initial set of potentially correlated summary statistics, some of which may be little correlated with any parameter and thus only add noise to the Euclidean distance. Another advantage of the PLS transformation is that it guarantees that the matrix of PLS-transformed summary statistics is nonsingular, which is required when performing the final locally weighted linear regression for estimating parameters (Beaumont *et al.* 2002). In practice, we propose to compute a relatively large set of summary statistics (assumed to be informative about the parameters) on each simulated and observed data set. The *n* random simulations done in the calibration step are used to compute the PLS components, which are then used to transform the summary statistics computed on the simulated data sets generated during the Markov chain.
We used the routine "plsr" from the freely available R package "pls" to compute the PLS components and to select an optimal set of *k* components according to a leave-one-out validation procedure (Mevik and Wehrens 2007). Since the PLS transformation assumes a linear relationship between parameters and statistics, we first applied a Box–Cox transformation (Box and Cox 1964) to each statistic separately before defining the PLS components. Note that other ways to choose appropriate summary statistics could be imagined, such as scoring them according to whether their inclusion substantially improves the quality of the inference, as recently proposed (Joyce and Marjoram 2008).
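The idea of extracting covariance-maximizing directions can be illustrated with a bare-bones single-response NIPALS sketch. This is not the R routine used in the paper: it is a minimal pure-Python illustration that omits the Box–Cox transformation and the cross-validation, and the toy data are assumptions.

```python
import math

def pls_directions(X, y, k):
    """First k PLS weight vectors (NIPALS, single response): each weight
    vector is proportional to X^T y, i.e., it maximizes covariance
    between the projected predictors and y; X is then deflated before
    extracting the next direction. X: centered rows; y: centered values."""
    n, p = len(X), len(X[0])
    X = [row[:] for row in X]  # work on a copy
    directions = []
    for _ in range(k):
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w)) or 1.0
        w = [v / norm for v in w]
        # Scores t = Xw, loadings, then deflation of X.
        t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t) or 1.0
        loads = [sum(t[i] * X[i][j] for i in range(n)) / tt for j in range(p)]
        for i in range(n):
            for j in range(p):
                X[i][j] -= t[i] * loads[j]
        directions.append(w)
    return directions

# Toy data: the first "statistic" tracks the parameter, the second is noise.
xs = [[1.0, 0.3], [2.0, -0.2], [3.0, 0.1], [4.0, -0.4]]
ys = [1.0, 2.0, 3.0, 4.0]
means = [sum(r[j] for r in xs) / len(xs) for j in range(2)]
xs_c = [[r[j] - means[j] for j in range(2)] for r in xs]
y_mean = sum(ys) / len(ys)
ys_c = [v - y_mean for v in ys]
w1 = pls_directions(xs_c, ys_c, k=1)[0]
```

As expected, the first direction loads almost entirely on the informative statistic, which is how PLS discards statistics that only add noise to the distance.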

#### ABC–MCMC algorithm:

We describe here the ABC–MCMC algorithm (AM), incorporating the proposed improvements compared to the plain MCMC approach without likelihood:

AM1. Perform *n* simulations with parameters **θ′** randomly drawn from their priors, and each time compute their associated set of summary statistics **S′**.

AM2. Compute PLS components from the *n* **θ′** and **S′** vectors after a Box–Cox transformation of the statistics.

AM3. For all *n* simulations, transform the summary statistics **S′** into the *k* retained PLS components **S′**_{PLS}. Transform the observed summary statistics **S** into **S**_{PLS} and compute δ = ‖**S′**_{PLS} − **S**_{PLS}‖.

AM4. Fix ε, estimate δ_{ε} from the empirical distribution of δ, and set the proposal range of the parameters for the transition kernel on the basis of φ and the variability of the parameters among the retained simulations.

AM5. Start an MCMC chain of total length *s* from a position **θ** randomly chosen among the retained simulations closest to *D*. Set *i* = 0.

AM6. If now at **θ**, propose a move to **θ′** according to a transition kernel *q*(**θ** → **θ′**). Increment *i*.

AM7. Simulate *D*′ on the basis of **θ′**. Compute the summary statistics **S′** and transform them into **S′**_{PLS}.

AM8. If ‖**S′**_{PLS} − **S**_{PLS}‖ > δ_{ε}, stay at **θ** and go to AM6.

AM9. Accept **θ′** with probability *h* = min(1, [π(**θ′**)*q*(**θ′** → **θ**)]/[π(**θ**)*q*(**θ** → **θ′**)]); otherwise stay at **θ**.

AM10. If *i* < *s*, go to AM6.

AM11. From the *s* samples of the chain, retain a subsample of size *t* consisting of the simulations with the smallest associated δ-values and discard the other *s* − *t* samples.

AM12. Perform an ABC regression adjustment (Beaumont *et al.* 2002) on the *t* retained samples to estimate the parameters.

To prevent the chain from remaining stuck at a given starting position after step AM5, we choose a new initial **θ**-value if the chain does not move to a new value after 20 proposals. Note also that we update all parameters at the same time in step AM6. As mentioned above, the width of the proposal range is adjusted independently for each parameter, and it is set to a fraction φ of the standard deviation of the parameters computed from the simulations retained in step AM4. Following Beaumont *et al.* (2002), we used the Euclidean norm as the distance function ‖ · ‖, but without standardization of the PLS-transformed summary statistics.

#### Parallelizing ABC–MCMC:

Simulations performed under the conventional ABC approach can easily be distributed among many CPUs. We thus implemented a parallelized version of the SS–MCMC algorithm as follows: after an initial calibration phase of 10,000 random simulations (which can easily be parallelized), we run 10 independent chains of 9000 simulations (including startup) with identical ε- and φ-values. The 10 chains are then concatenated and used to estimate posterior densities from the 5000 "best" simulations, using a local regression adjustment (Beaumont *et al.* 2002) to complete the ABC–MCMC algorithm. With this approach, 90% of the 100,000 simulations can thus be distributed on 10 different CPUs, and larger numbers of simulations could be distributed on more CPUs.

#### Illustration and application:

We have tested and applied the ABC–MCMC approach to a model of population divergence with migration (Nielsen and Wakeley 2001). In this model, one assumes that some *T* generations in the past, an ancestral population of size *N*_{A} split into two populations of sizes *N*_{1} and *N*_{2} and that migrants are then exchanged between the two populations at rates (looking forward in time) *m*_{12} and *m*_{21}. This model has been applied to several different data sets (*e.g.*, Becquet and Przeworski 2007; Hoelzel *et al.* 2007), and it must be noted that it is one of the most complex population genetic models for which a full-likelihood implementation is available (Hey and Nielsen 2007; Kuhner 2009). However, we shall not attempt here to compare our methodology with the full-likelihood approach, since the purpose of this article is to improve on conventional ABC, not to compete with full-likelihood methods.

We first compared the conventional ABC and the new ABC–MCMC approaches in a simple case without migration (*m*_{12} = *m*_{21} = 0). All parameters were drawn from uniform priors: number of gene copies per population *N* ∼ *U*[0, 30,000] and divergence time *T* ∼ *U*[0, 16,000] generations. We simulated genetic data with the program SIMCOAL 2.0 (Laval and Excoffier 2004) for 25 diploid individuals per population genotyped at 50 unlinked microsatellites each. Microsatellite data were simulated under a pure stepwise mutation model. While we used a fixed mutation rate μ = 5 × 10^{−4} in this case, the mutation rate was allowed to vary in the applied case, as explained below. A total of 100 test data sets with parameter values drawn randomly from the prior distributions were used to compare ABC and ABC–MCMC. We then compared the conventional ABC and the new ABC–MCMC approaches in a more complex case with migration rates chosen in *U*[0, 0.003]. Again, 100 test data sets were randomly simulated by drawing parameter values from prior distributions.

#### Summary statistics:

Using the software package Arlequin 3.1 (Excoffier *et al.* 2005b), we computed in each population the average and standard deviation (over loci) of the number of alleles (*K*), the range of the allele size (*R*), the expected heterozygosity (*H*), the Garza–Williamson statistic (Garza and Williamson 2001) modified as GW = *K*/(*R* + 1) (Excoffier *et al.* 2005a), and another modification of GW computed as GW* = *K*/(*R*_{Tot} + 1), where *R*_{Tot} is the range in allele size computed over all sampled populations. The idea behind the use of the GW* statistic is that it should reflect population-specific drift effects, since the denominator is the same in all populations. The same statistics and their standard deviation over populations were also computed over the two pooled populations, except GW* since GW = GW* in that case. We additionally computed the differentiation index *F*_{ST} and the genetic distance (δμ)^{2} (Goldstein *et al.* 1995) between the two populations. We thus computed a total of 31 summary statistics over all observed and simulated data sets. PLS components were extracted from summary statistics on the basis of the 10,000 simulations performed in the calibration step. We used the R package PLS (Mevik and Wehrens 2007) to find the appropriate number of PLS components to use (10 for both cases with and without migration).
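As an illustration, the per-locus statistics defined above can be computed as follows. This is a minimal sketch, not Arlequin's implementation; the example allele sizes and the total range value are assumptions.

```python
def microsat_stats(alleles, r_tot=None):
    """Per-locus summaries used above: number of alleles K, allele-size
    range R, the modified Garza-Williamson statistic GW = K/(R + 1), and
    GW* = K/(R_Tot + 1) when the allele-size range over all sampled
    populations (R_Tot) is supplied."""
    K = len(set(alleles))            # number of distinct alleles
    R = max(alleles) - min(alleles)  # allele-size range in this population
    gw = K / (R + 1)
    gw_star = K / (r_tot + 1) if r_tot is not None else None
    return K, R, gw, gw_star

# One population's repeat numbers at a locus; assume the range over all
# sampled populations is R_Tot = 8.
K, R, gw, gw_star = microsat_stats([10, 10, 12, 13, 13, 15], r_tot=8)
```

Because the denominator of GW* is shared by all populations, differences in GW* between populations reflect population-specific drift, as noted above.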

#### Measuring the accuracy of the methods:

Since simulations are computationally much more demanding than the estimation of the posterior distributions obtained by regressing summary statistics on parameters (Excoffier *et al.* 2005a), we compared the ABC and ABC–MCMC samplers in terms of their accuracy. We measured the root mean integrated squared error (RMISE) of the whole posterior distribution, defined as RMISE = √(∫ (θ_{k} − μ_{k})² f̂(θ_{k}) dθ_{k}), where μ_{k} is the true value of the *k*th parameter and f̂(θ_{k}) is the estimated posterior density. We also computed the error of three point estimates (mode, mean, and median) as the absolute difference between the point estimate and the true parameter value. To assess the overall quality of the estimation, we measured the geometric mean of these accuracy measures over all parameters, relative to a conventional ABC approach done with 100,000 simulations. More formally, if *a*_{i} is the measure of accuracy for the *i*th parameter (*i* = 1, … , *P*), then the relative accuracy RA of the ABC–MCMC approach is defined as RA = (∏_{i=1}^{P} *a*_{i}/*a*_{i}^{ABC})^{1/*P*}.
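A Monte Carlo version of these two accuracy measures might look like the following sketch (approximating the RMISE integral by the mean squared deviation of posterior draws from the true value; the example numbers are assumptions).

```python
import math

def rmise(posterior_sample, true_value):
    """Sample-based RMISE: root mean squared deviation of posterior draws
    from the true parameter value mu, approximating the integral of
    (theta - mu)^2 over the estimated posterior density."""
    n = len(posterior_sample)
    return math.sqrt(sum((t - true_value) ** 2 for t in posterior_sample) / n)

def relative_accuracy(acc, acc_abc):
    """Geometric mean of per-parameter accuracy measures relative to the
    reference ABC run: values below 1 mean a smaller error than ABC."""
    p = len(acc)
    return math.exp(sum(math.log(a / b) for a, b in zip(acc, acc_abc)) / p)

err = rmise([1.0, 3.0], true_value=2.0)          # -> 1.0
ra = relative_accuracy([1.0, 2.0], [2.0, 4.0])   # -> 0.5
```

The geometric mean is used so that halving the error on one parameter exactly offsets doubling it on another, regardless of the parameters' scales.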

The coverage property of the posterior distributions obtained by different methods is also worth checking. By coverage, we mean the proportion of times a true parameter value is present in a given credible interval. For instance, 80 and 95% credible intervals should include the true parameter with probabilities 0.8 and 0.95, respectively. In other words, the posterior quantiles of the true parameter values should be uniformly distributed in [0, 1] (Cook *et al.* 2006). To check the overall coverage property of the posterior distributions obtained with different approaches, we therefore computed the posterior quantiles of the parameters for 100 test data sets for which true parameter values were known. Their uniformity was then assessed with a classical Kolmogorov–Smirnov test.
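The coverage check above can be sketched as follows: compute the posterior quantile of the true value for each test data set, then measure how far the empirical distribution of these quantiles is from uniform. This is an illustrative implementation of the Kolmogorov–Smirnov statistic, with simulated quantiles standing in for real test runs.

```python
import random

def posterior_quantile(posterior_sample, true_value):
    """Quantile of the true parameter value within the posterior sample;
    over many test data sets these quantiles should look U[0, 1] if the
    credible intervals have correct coverage."""
    return sum(t <= true_value for t in posterior_sample) / len(posterior_sample)

def ks_uniform_stat(quantiles):
    """Kolmogorov-Smirnov statistic against U[0, 1]: the largest gap
    between the empirical CDF of the quantiles and the identity line."""
    qs = sorted(quantiles)
    n = len(qs)
    return max(max((i + 1) / n - q, q - i / n) for i, q in enumerate(qs))

random.seed(3)
# A perfectly calibrated toy case: the quantiles themselves are uniform,
# so the KS statistic stays well below the usual critical values
# (approximately 1.63 / sqrt(n) at the 1% level).
stat = ks_uniform_stat([random.random() for _ in range(1000)])
```

A biased estimator would instead pile the quantiles near 0 or 1, inflating the statistic.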

#### Application of ABC–MCMC to human evolution:

We studied the genetic relationships among three human African populations (the Yoruba and the Mandenka, two populations belonging to the recently diverged Niger–Congo linguistic group, and the Mbuti Pygmy population, hereafter simply called Pygmies), analyzed for ∼800 microsatellite markers (Ramachandran *et al.* 2005). We assumed an evolutionary model where the Mandenka and the Yoruba diverged recently from an ancestral Niger–Congo population, itself having diverged earlier from the Pygmy population (see Figure 1). All current and ancestral population pairs were assumed to exchange migrants, but at different rates. We assumed symmetric migration between the two Niger–Congo populations and between the Pygmies and the ancestral Niger–Congo population, but migration rates between the Pygmies and the two Niger–Congo populations were allowed to be asymmetric. To ensure a good fit with the simulated stepwise mutation model, we considered only a subset of 331 tetranucleotide microsatellites selected to have <5% missing data, with imperfect repeat alleles coded as missing data.

The model parameters were drawn from the following uniform distributions: number of gene copies per population, *N* ∼ *U*[0, 20,000]; number of migrant genes per generation, *Nm* ∼ 10^{U}^{[0,2]}; divergence time of the two Niger–Congo populations, TDIV_{NG} ∼ *U*[0, 1000] generations; and ancestral divergence time, TDIV_{A} ∼ *U*[0, 4000] generations. We imposed an additional constraint on the divergence times, TDIV_{A} > TDIV_{NG}, resulting in a nonuniform prior on TDIV_{A}. All simulations were performed with the program SIMCOAL 2.0, assuming a pure stepwise mutation model with a per-generation average mutation rate drawn in *U*[2 × 10^{−4}, 7 × 10^{−4}]. We implemented locus-specific mutation rates following a Gamma distribution with shape parameter α drawn in *U*[8, 16]. The same set of statistics as defined above was computed and PLS transformed. We retained the first 11 PLS components for the MCMC chain and parameter estimation.
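The prior sampling scheme above, including the rejection step that enforces TDIV_{A} > TDIV_{NG} (and hence a nonuniform marginal prior on TDIV_{A}), can be sketched as follows. The dictionary keys are our own labels, and the actual genetic simulations were run with SIMCOAL 2.0, not with this code.

```python
import random

def draw_priors():
    """One draw from the priors described above; the constraint
    TDIV_A > TDIV_NG is enforced by redrawing the two divergence times,
    which makes the marginal prior on TDIV_A nonuniform. The prior on
    Nm is log-uniform: 10**U[0, 2]."""
    while True:
        t_ng = random.uniform(0, 1000)   # Niger-Congo divergence (generations)
        t_a = random.uniform(0, 4000)    # ancestral divergence (generations)
        if t_a > t_ng:
            break
    return {
        "N": random.uniform(0, 20_000),        # gene copies per population
        "Nm": 10 ** random.uniform(0, 2),      # migrant genes per generation
        "TDIV_NG": t_ng,
        "TDIV_A": t_a,
        "mu": random.uniform(2e-4, 7e-4),      # mean mutation rate
        "alpha": random.uniform(8, 16),        # Gamma shape across loci
    }

random.seed(4)
draws = [draw_priors() for _ in range(500)]
```

Each draw would then parameterize one coalescent simulation whose summary statistics feed the calibration and MCMC steps described above.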

## RESULTS AND DISCUSSION

#### Performance of various ABC approaches:

We have studied the performance of different samplers relative to the ABC approach for a model of population divergence without migration (Table 1). As expected, conventional ABC estimations based on 5 million simulations (ABC–5M) do improve on those based on only 100,000 simulations (ABC–100K), and their inferred posteriors have very good coverage (Kolmogorov–Smirnov test of uniformity of posterior quantile values, *p* = 0.469). As expected, the performance of the ABC–MCMC approach depends strongly on the choice of the proposal range (φ) and tolerance (ε) values used to run the MCMC chains. Overall, the accuracy of ABC–MCMC increases with lower tolerance ε for all φ-values. It is found to be better than that of ABC–100K, except for very small proposal ranges (φ ≤ 0.01). In that case, the relative errors of the parameters can be very large, and the posterior distributions do not pass the test of posterior quantile uniformity, suggesting that the parameter space is not adequately explored. Overall, ABC–MCMC performs best for a proposal range of φ = 1 and tolerance ε = 0.01, where accuracy measures are found to be even better than those obtained with ABC–5M. In that case, the acceptance rate of the chain is ∼30% and the coverage of the posteriors is adequate (*p* = 0.23). We note that the acceptance rate of the ABC–MCMC sampler drops sharply as the tolerance level ε decreases. For instance, with ε = 0.0001 we were able to run ABC–MCMC chains only for very few pseudo-observed data sets (six at most, with φ = 0.001). Additionally, the acceptance rate is also strongly influenced by the proposal range φ, being two to three times higher with φ = 0.001 than with φ = 5. As also shown in Figure 2 for six random data sets, the ABC–MCMC approach generally leads to posterior distributions close to those obtained with ABC–5M. Posteriors obtained with ABC–100K are in most cases wider, in agreement with the accuracy measures reported in Table 1.

We then compared the different ABC samplers under a model of population divergence with migration. The results presented in supporting information, Table S1 are qualitatively very similar to the case without migration. The values of ε = 0.01 and φ = 1 again lead to the best overall accuracy for the ABC–MCMC approach, which shows correct coverage properties (*p* = 0.29). The accuracy of ABC–MCMC is again comparable to that obtained with the ABC–5M approach, confirming the strong reduction in computation time provided by this approach compared to conventional ABC.

#### Usefulness of PLS components:

In Figure 3, we compare the distributions of the posterior quantiles of the true parameters for different models and different sets of statistics. While the posterior quantiles obtained under ABC–5M based on PLS components are uniformly distributed (Kolmogorov–Smirnov test, *p* = 0.469 and *p* = 0.193 for the divergence models without and with migration, respectively), this is not the case when raw statistics are used. The posterior quantiles obtained by considering all statistics in ABC–5M are indeed not uniformly distributed and tend to be too large (*p* = 0.002 and *p* < 10^{−12} for divergence models with and without migration, respectively), which implies that the true parameters are globally underestimated when using all statistics. These results show that the use of PLS generally improves the coverage properties of the credible intervals. This was expected, since many summary statistics may carry very little information about the parameters, which makes it difficult to compute meaningful distances. Of course, a small set of carefully chosen summary statistics, based on their theoretical relation with the parameters or on some scoring procedure (*e.g.*, Joyce and Marjoram 2008), may equally well lead to unbiased posterior distributions. However, the present PLS approach seems to provide an objective way to reduce the dimensionality of the summary statistics space while retaining as much of the information about the parameters as possible.

#### Application of ABC–MCMC to African evolution:

We estimated the parameters of a model of divergence and migration among three African populations (Figure 1) on the basis of data from 331 microsatellites, using our parallelized ABC–MCMC approach with 1000 independent chains of 10,000 simulations (including startup), a proposal range φ = 1, and a tolerance level ε = 0.01. The posterior distributions of the parameters are reported in Figure 4. We find evidence for a very recent divergence between the two Niger–Congo populations (142 generations or ∼3550 years ago, based on a generation time of 25 years), which is in very good agreement with the age of the expansion of farming in western Africa and the diversification of the Niger–Congo language family (Wood *et al.* 2005). The divergence between the Pygmies and the Niger–Congo populations is found to be much more ancient, with *T*_{mode} *=* 2135 generations ago (∼53,400 years), in broad agreement with previous studies (Quintana-Murci *et al.* 2008; Verdu *et al.* 2009), but we note that the 95% highest posterior density credible interval for this time is quite large (1075–3712 generations). While the observed data are compatible with high rates of gene flow between the Yoruba and the Mandenka populations, the Pygmies exchange overall many fewer migrants with the two Niger–Congo populations. We find some evidence for higher levels of gene flow from the Pygmies to the Yoruba than in the other direction, an asymmetry already observed in previous analyses of gene flow between Pygmy and neighboring populations (Quintana-Murci *et al.* 2008; Verdu *et al.* 2009). In contrast, we find no migration asymmetry between the Pygmies and the Mandenka, which might be expected given the large geographic distance between these two populations. Our results further suggest even lower levels of gene flow between the Pygmies and the ancestral Niger–Congo population, suggesting that the African population was more subdivided at that time.
The posterior distributions for the population sizes are all quite wide and point toward relatively large values, even for the Pygmies, with *N*_{mode} = 10,876 gene copies. Interestingly, we find that the average mutation rate for tetranucleotide microsatellites is much lower than that previously estimated (Zhivotovsky *et al.* 2003) and that its variability across loci is relatively low (α ≅ 14).

#### Conclusions:

We have shown here how likelihood-free MCMC approaches can be used to produce approximate posterior distributions. Since the tolerance ε has to be chosen relatively large to ensure adequate mixing, the stationary distribution π(**θ** | δ < δ_{ε}) of an SS–MCMC chain may still be a relatively crude estimation of the posterior distribution π(**θ** | **s**_{obs}), and the ABC regression adjustment aiming at approximating π(**θ** | **s**_{obs}) needs to be performed on the output of the SS–MCMC chain. The transition mechanism of the chain needs to be fine-tuned for it to mix properly and lead to posteriors with adequate coverage. The combination of a tolerance level ε = 0.01 and a proposal range φ = 1 (standard deviation) seems to provide the overall best results under two models of different complexity. Similar values of φ and ε should ensure good mixing in other situations, since the widths of the proposal range φ and the initial tolerance interval ε are expressed in a generic fashion: φ is expressed in units of standard deviations of the retained parameter values, and should therefore scale appropriately for different parameters and adjust to the observed data, and ε is just a proportion of simulations arbitrarily close to the observations, which depends neither on the choice of summary statistics nor on the parameterization of the model. Note finally that we did not perform any thinning on the output of the ABC–MCMC chain prior to the regression adjustment, as the rejection step is expected to remove a large fraction of the autocorrelation, which should not affect the estimates if the chain has converged.
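As a minimal sketch of this post-processing, consider a hypothetical toy model (not the demographic model analyzed here): a single parameter θ with a flat prior generates one summary statistic *s* ∼ N(θ, 1). The loose-tolerance chain, the subsampling of its output, and the linear regression adjustment of Beaumont *et al.* (2002) can then be illustrated as follows; all numerical settings are arbitrary:

```python
import random

random.seed(1)
s_obs = 0.7  # the "observed" summary statistic of the toy model

def simulate(theta):
    # One summary statistic per simulation: s ~ Normal(theta, 1)
    return random.gauss(theta, 1.0)

# Loose-tolerance SS-MCMC.  With a flat prior and a symmetric proposal, the
# Metropolis-Hastings ratio equals 1, so a move is accepted whenever the
# simulated statistic falls within the wide tolerance delta_loose.
delta_loose = 2.0
theta, s_cur = 0.0, simulate(0.0)
pairs = []
for _ in range(20000):
    cand = theta + random.gauss(0.0, 0.5)  # proposal range
    s = simulate(cand)
    if abs(s - s_obs) < delta_loose:
        theta, s_cur = cand, s
    pairs.append((theta, s_cur))

# 1) Subsample the output: retain the fraction eps of draws whose statistic
#    is closest to the observation.
eps = 0.05
pairs.sort(key=lambda p: abs(p[1] - s_obs))
kept = pairs[: int(eps * len(pairs))]

# 2) Regression adjustment: theta_adj = theta - b * (s - s_obs), with b the
#    least-squares slope of theta on s among the retained draws.
n = len(kept)
mean_t = sum(t for t, _ in kept) / n
mean_s = sum(s for _, s in kept) / n
b = (sum((s - mean_s) * (t - mean_t) for t, s in kept)
     / sum((s - mean_s) ** 2 for _, s in kept))
adjusted = [t - b * (s - s_obs) for t, s in kept]
post_mean = sum(adjusted) / n
print(round(post_mean, 2))  # close to s_obs under this toy model
```

Under this toy model the true posterior is N(*s*_{obs}, 1), so the adjusted sample mean should sit near *s*_{obs} even though the chain itself was run at a much looser tolerance.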

Note that our reported posterior distributions generally have very similar modal values but are slightly wider than those obtained under a full-likelihood IMa approach (Hey and Nielsen 2007) (results not shown). While we would not recommend using an ABC approach if a full-likelihood method exists, our results suggest that complex scenarios for which no likelihood-based estimations are available can be studied relatively well with ABC–MCMC, and at a fraction of the computational cost of conventional ABC. This and similar improvements (*e.g.*, M. A. Beaumont, J.-M. Cornuet, J.-M. Marin and C. P. Robert, unpublished results) should be very useful given the growing use of ABC techniques in demo-genetics studies (see, *e.g.*, Tallmon *et al.* 2004; Excoffier *et al.* 2005a; Hamilton *et al.* 2005; Chan *et al.* 2006; Hickerson *et al.* 2006; Shriner *et al.* 2006; Legras *et al.* 2007; Pascual *et al.* 2007; Rosenblum *et al.* 2007; Cornuet *et al.* 2008; Neuenschwander *et al.* 2008). Indeed, setting up simulation files for an arbitrarily complex model can be done in a few hours using existing simulation programs (*e.g.*, Hudson 2002; Laval and Excoffier 2004; Cornuet *et al.* 2008), allowing one to focus on realistic evolutionary models rather than restricting oneself to models for which specific programs have been developed. The PLS transformation should also allow one to extract as much information as possible from a large set of summary statistics, while keeping the dimensionality of the problem relatively low. While parameter estimation under an ABC framework still requires extensive computing times, it should allow evolutionary geneticists to reasonably estimate the parameters they are really interested in, rather than require them to shift their interest to problems for which full-likelihood solutions are available.

## APPENDIX

Marjoram *et al*. (2003) have formally shown that the posterior distribution of the parameters **θ** of a given model can be obtained from the stationary distribution of an MCMC without likelihood, where data *D*′ simulated from the candidate parameter values **θ**′ are accepted with probability *h* = min[1, π(**θ**′)*q*(**θ** | **θ**′)/π(**θ**)*q*(**θ**′ | **θ**)] if *D*′ is equal to the observed data *D* and rejected otherwise. When the data are replaced by a set of summary statistics **S**, Marjoram *et al.* (2003) proposed to accept parameters with the same probability *h* if the summary statistics **S**′ simulated from **θ**′ are arbitrarily close to the observed summary statistics (if δ(**S**′, **S**) < δ_{ε}). In that case they suggest that the stationary distribution of such an SS–MCMC chain is π(**θ** | δ < δ_{ε}). We show hereafter that this is the case.
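The acceptance rule above can be sketched as a single likelihood-free Metropolis–Hastings step. This is an illustrative sketch, not the implementation used in this study: `prior`, `propose`, `q_density`, `simulate`, and the distance `delta` are hypothetical stand-ins for model-specific ingredients, and the toy usage below assumes a flat prior with one Gaussian summary statistic:

```python
import math
import random

def ss_mcmc_step(theta, s_obs, prior, propose, q_density, simulate,
                 delta, delta_eps):
    """One step of a likelihood-free (SS-)MCMC in the spirit of
    Marjoram et al. (2003): a candidate theta' is accepted with probability
    min(1, pi(theta') q(theta|theta') / (pi(theta) q(theta'|theta)))
    only if its simulated statistic falls within the tolerance delta_eps."""
    cand = propose(theta)
    s = simulate(cand)
    if delta(s, s_obs) >= delta_eps:
        return theta                              # simulation too far: stay
    h = min(1.0, prior(cand) * q_density(theta, cand)
                 / (prior(theta) * q_density(cand, theta)))
    return cand if random.random() < h else theta

# Toy usage: flat prior on (-10, 10), symmetric Gaussian proposal, and a
# model producing a single statistic s ~ Normal(theta, 1).
random.seed(2)
s_obs = 1.0
chain, theta = [], 0.0
for _ in range(20000):
    theta = ss_mcmc_step(
        theta, s_obs,
        prior=lambda t: 1.0 if -10.0 < t < 10.0 else 0.0,
        propose=lambda t: t + random.gauss(0.0, 1.0),
        q_density=lambda a, b: math.exp(-0.5 * (a - b) ** 2),  # symmetric
        simulate=lambda t: random.gauss(t, 1.0),
        delta=lambda a, b: abs(a - b),
        delta_eps=1.0,
    )
    chain.append(theta)
chain_mean = sum(chain) / len(chain)
print(round(chain_mean, 2))  # near s_obs: the band (s_obs - 1, s_obs + 1)
                             # is symmetric around the observation
```

Because the proposal is symmetric and the prior is flat on its support, the ratio *h* reduces to 1 inside the support, so the chain moves whenever the simulation lands within tolerance.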

#### Posterior distribution of an SS–MCMC:

Let us first assume that the data are summarized by a single continuous statistic, say *S*. The probability of generating *S* arbitrarily close to the observed statistic *s*_{obs} is

Pr(δ < δ_{ε}) = ∫_{*s*_{obs}−δ_{ε}}^{*s*_{obs}+δ_{ε}} *f*(*s*) d*s*,  (A1)

where *f*(*s*) is the prior distribution of the statistic. For notational convenience, we write Pr(δ < δ_{ε}) as *P*_{ε} hereafter. The distribution π_{ε}(**θ**) = π(**θ** | δ < δ_{ε}) can therefore be defined as

π_{ε}(**θ**) = π(**θ**) ∫_{*s*_{obs}−δ_{ε}}^{*s*_{obs}+δ_{ε}} *f*(*s* | **θ**) d*s* / *P*_{ε},  (A2)

where π(**θ**) is the prior of the parameters and *f*(*s* | **θ**) is the conditional density of the statistic.

Following Marjoram *et al*. (2003), let *q*(**θ**′ | **θ**) be the transition mechanism of the chain to move from state **θ** to state **θ**′, and assume that moves satisfying δ < δ_{ε} are accepted with probability *h*(**θ** → **θ**′) = min[1, π(**θ**′)*q*(**θ** | **θ**′)/π(**θ**)*q*(**θ**′ | **θ**)]. Writing *F*_{ε}(**θ**) = ∫_{*s*_{obs}−δ_{ε}}^{*s*_{obs}+δ_{ε}} *f*(*s* | **θ**) d*s*, so that π_{ε}(**θ**) = π(**θ**)*F*_{ε}(**θ**)/*P*_{ε}, we have

π_{ε}(**θ**) *q*(**θ**′ | **θ**) *F*_{ε}(**θ**′) *h*(**θ** → **θ**′) = [*F*_{ε}(**θ**) *F*_{ε}(**θ**′)/*P*_{ε}] min[π(**θ**)*q*(**θ**′ | **θ**), π(**θ**′)*q*(**θ** | **θ**′)] = π_{ε}(**θ**′) *q*(**θ** | **θ**′) *F*_{ε}(**θ**) *h*(**θ**′ → **θ**),  (A3)

showing that the chain is fully reversible and that π_{ε}(**θ**) is its stationary distribution. Equations A1 and A2 can easily be extended to more than one summary statistic. If the data are summarized by multivariate statistics **S** = (*S*_{1}, … , *S*_{*n*}), then Pr(δ < δ_{ε}) becomes the multiple integral

Pr(δ < δ_{ε}) = ∫_{*B*_{ε}(**s**_{obs})} *f*(**s**) d**s**,

where *B*_{ε}(**s**_{obs}) is a sphere in the Euclidean *n*-dimensional space of radius δ_{ε} around **s**_{obs} with respect to the chosen norm ‖·‖.

Note that π_{ε}(**θ**) is also, by definition, the distribution of the retained parameters under a simple ABC rejection algorithm, where parameters randomly generated from the prior are accepted if they lead to summary statistics **S**′ for which δ(**S**′, **S**) < δ_{ε} and rejected otherwise. It implies that the SS–MCMC approach has the same stationary distribution as an ABC algorithm with the same tolerance level ε. In Figure S1, we empirically show that this is the case by reporting the distribution of the distances δ generated under the conventional ABC rejection algorithm as well as under the SS–MCMC.
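This equivalence can be checked numerically on a hypothetical toy model (not the comparison of Figure S1, which is based on distance distributions): with a flat prior, one statistic *s* ∼ N(θ, 1), and a symmetric proposal, rejection ABC and the SS–MCMC should both sample from the same π_{ε}(θ), so their retained samples should agree. All settings below are arbitrary:

```python
import random

random.seed(3)
s_obs, delta_eps = 0.5, 0.25

def simulate(theta):
    # One summary statistic per simulation: s ~ Normal(theta, 1)
    return random.gauss(theta, 1.0)

def prior_draw():
    # Flat prior on (-5, 5)
    return random.uniform(-5.0, 5.0)

# Rejection ABC: draw theta from the prior, simulate, and keep theta
# whenever the simulated statistic lands within the tolerance.
abc = []
while len(abc) < 2000:
    t = prior_draw()
    if abs(simulate(t) - s_obs) < delta_eps:
        abc.append(t)

# SS-MCMC with the same tolerance.  With a flat prior and a symmetric
# proposal, the Metropolis-Hastings ratio is 1, so a move is accepted iff
# the candidate stays inside the prior support and simulates within
# tolerance.
mcmc, theta = [], 0.5
for _ in range(60000):
    cand = theta + random.gauss(0.0, 1.0)
    if -5.0 < cand < 5.0 and abs(simulate(cand) - s_obs) < delta_eps:
        theta = cand
    mcmc.append(theta)

mean_abc = sum(abc) / len(abc)
mean_mcmc = sum(mcmc) / len(mcmc)
print(round(mean_abc, 2), round(mean_mcmc, 2))  # both centred near s_obs
```

Both sample means should agree up to Monte Carlo error, since both samplers target the same π_{ε}(θ), which here is symmetric around *s*_{obs}.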

## Acknowledgments

We are grateful to Samuel Neuenschwander, Nicolas Ray, Olivier François, and Gerald Heckel for helpful discussions. We further thank Matthieu Foll, David Balding, Jody Hey, and one anonymous reviewer for their comments on the manuscript. We also thank Nelson Fagundes for suggesting the use of the GW* statistics. This work was supported by a grant from the Swiss National Foundation (no. 3100A0-112072) to L.E.

## Footnotes

Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.109.102509/DC1.

Communicating editor: R. Nielsen

- Received March 6, 2009.
- Accepted May 31, 2009.

- Copyright © 2009 by the Genetics Society of America