Abstract
A coalescence-based maximum-likelihood method is presented that aims to (i) detect diversity-reducing events in the recent history of a population and (ii) distinguish between demographic (e.g., bottlenecks) and selective causes (selective sweeps) of a recent reduction of genetic variability. The former goal is achieved by taking account of the distortion in the shape of gene genealogies generated by diversity-reducing events: gene trees tend to be more star-like than under the standard coalescent. The latter issue is addressed by comparing patterns between loci: demographic events apply to the whole genome, whereas selective events affect distinct regions of the genome to a varying extent. The maximum-likelihood approach allows one to estimate the time and strength of diversity-reducing events and to choose among competing hypotheses. An application to sequence data from an African population of Drosophila melanogaster shows that the bottleneck hypothesis is unlikely and that one or several selective sweeps probably occurred in the recent history of this population.
LOW genetic variability in natural populations is not a rare feature: numerous examples have been reported in animals (e.g., O'Brien and Evermann 1988), plants (Liu et al. 1998), and protists (Rich et al. 1998), among others. Low present-day levels of variation may reflect a persistent state maintained by, say, nonpanmictic mating systems (Charlesworth and Charlesworth 1995) or recurrent background selection (Charlesworth et al. 1995) at linked loci. In many cases, however, a recent event in the history of the population is invoked to explain reduced variability.
Such diversity-reducing events essentially fall into two categories: demographic factors and selective factors. Demographic factors include bottlenecks and population founder events; both involve a temporary reduction of population size resulting in an increased rate of genetic drift. Rapid fixation of a new, favorable allele through directional selection (a “selective sweep”) also generates a sudden drop of genetic variability at linked loci by hitchhiking (Maynard Smith and Haigh 1974). In this article, we address two questions: first, how recent diversity-reducing events can be detected, and second, how demographic and selective causes can be distinguished.
The issue of detecting diversity-reducing events is not trivial. Usually, a bottleneck (or a selective sweep) is invoked when the variability at some locus of some species is much lower than that usually observed in related species (or at distinct loci). However, mutation and drift processes have a high variance and may generate highly different patterns in distinct species or distinct loci just by chance. Additionally, some variance between observable patterns of polymorphism in distinct species is introduced by random sampling of individuals. Assessing the statistical significance of an “apparent” discrepancy between data sets is therefore essential.
Given that genetic variability has been reduced recently, the question of distinguishing between demographic and selective causes is a major one: the two kinds of events have different biological meanings. Detecting bottlenecks is relevant to conservation biology since the global reduction of genetic variability they induce may endanger populations. Detecting selective sweeps, on the other hand, is an important goal for the study of evolutionary mechanisms. In particular, a longstanding controversy persists about the relative importance of positive selection vs. neutral or nearly neutral evolution at the genomic level (e.g., Gillespie 1991).
The theory of coalescence (Kingman 1982; Hudson 1991) provides a promising framework to address these questions. Variability-reducing events can be detected because they modify the shape of the genealogy of alleles. Basically, they tend to generate star-like (parts of) genealogies, as a consequence of a sudden increase of coalescence rate (Figure 1). Demographic events apply to the whole genome whereas selective events affect different regions of the genome to various extents thanks to recombination (e.g., Hudson et al. 1987). This gives the possibility of distinguishing the two hypotheses by sampling several loci: a more or less common pattern is expected in the case of a bottleneck, while selective sweeps generate heterogeneity across loci.
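The tendency toward star-like shapes can be seen in the waiting times of the standard coalescent. The following Python sketch (illustrative only, not the authors' program) samples the inter-coalescence intervals for a sample of n genes:

```python
import random

def coalescent_intervals(n, rng=None):
    """Inter-coalescence waiting times for a sample of n genes under the
    standard coalescent: while k ancestral lineages remain, the next
    coalescence is exponentially distributed with rate k(k-1)/2
    (time measured in units of 2N generations)."""
    rng = rng or random.Random()
    return [rng.expovariate(k * (k - 1) / 2.0) for k in range(n, 1, -1)]
```

The early intervals (many lineages, high rate) are short, while the final interval between the last two lineages has expectation 1. A sudden burst of coalescence removes the deep, long intervals, which is what compresses the internal branches and makes the resulting genealogies star-like.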
Griffiths and Tavaré (1994a) devised an efficient method to compute likelihoods under the coalescent model, which can be easily generalized to the case of variable population size (Griffiths and Tavaré 1994b). In this article, we incorporate a model of sudden reduction of genetic variability into Griffiths and Tavaré's scheme and devise likelihood-ratio tests to detect and discriminate between bottlenecks and selective sweeps. The new method is applied to polymorphism sequence data from an African population of Drosophila melanogaster.
METHODS
In this section, we first recall the main ideas of Griffiths and Tavaré's (1994a,b, 1995) approach, then show how it can be used to model a bottleneck at one locus, and finally address the issue of hypothesis testing with multiple loci.
Griffiths and Tavaré's method: Consider a data set D_{0} consisting of DNA sequences (genes) sampled at one locus in n individuals of a panmictic population of effective size 2N. Assume that neutral mutations occur at rate μ. Assume that no recombination occurs within the locus. A fundamental recursion is
For large data sets, one cannot compute (3) exactly: there are too many sets of ancestral states. Griffiths and Tavaré's idea was to randomly sample a reasonable number of sets of ancestral states and to estimate the likelihood from this sample. Let A be a set of ancestral states (D_{1}, D_{2}, … , D_{m}), and let H be the hypothesis of the coalescent model (including parameter value θ). Equation 3 can be rewritten
Obviously, an efficient sampling process X is one that samples probable ancestral states (according to H) with high probability, i.e., one as close as possible to the unknown “H given D_{0}” (the distribution of ancestral states under the coalescent conditional on D_{0}). For instance, a uniform sampling of sets of ancestral states is inefficient since most sets of ancestors have a very low probability, but a few of them have a high probability. Sampling uniformly, one would have to perform many replicates of X to get an accurate estimate of the likelihood. Griffiths and Tavaré's importance sampling scheme X is a Markov chain: (i) start from D_{0}; (ii) for all possible D_{1}, compute Pr(D_{0} | D_{1}, H) according to (2); (iii) randomly sample D_{1} with probability proportional to Pr(D_{0} | D_{1}, H),
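The identity behind this scheme, namely sampling from a convenient chain X and reweighting each draw toward the target law H, can be illustrated on a toy problem where the exact answer is known. In the sketch below (purely illustrative; Poisson distributions stand in for the coalescent transition probabilities) a small tail probability is estimated by drawing from a proposal concentrated where the rare event occurs:

```python
import math
import random

def poisson_pmf(k, lam):
    """Probability mass function of a Poisson(lam) variable."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def sample_poisson(lam, rng):
    """Poisson draw by inversion of the cumulative distribution."""
    u, k, cum = rng.random(), 0, poisson_pmf(0, lam)
    while cum < u:
        k += 1
        cum += poisson_pmf(k, lam)
    return k

def tail_importance_sampling(lam, thresh, proposal_lam, reps, rng):
    """Estimate P(X >= thresh) for X ~ Poisson(lam) by drawing from a
    proposal Poisson(proposal_lam) that visits the relevant states often,
    and averaging indicator * Pr(draw | target) / Pr(draw | proposal),
    the same reweighting idea as in Griffiths and Tavare's scheme."""
    total = 0.0
    for _ in range(reps):
        k = sample_poisson(proposal_lam, rng)
        if k >= thresh:
            total += poisson_pmf(k, lam) / poisson_pmf(k, proposal_lam)
    return total / reps
```

Sampling directly from Poisson(2) would rarely produce values of 8 or more, so most replicates would be wasted; the proposal centered at 8 hits the event often, and the weights correct for the change of distribution. This mirrors why a proposal close to “H given D_{0}” keeps the variance of the likelihood estimate low.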
This algorithm is presumably optimal if the “Bayesian” probabilities Pr(D_{i+1} | D_{i}, X) equal the unknown Pr(D_{i+1} | D_{i}, H): in this case, X becomes identical to H given D_{0}. This requirement is met if, for any given state D_{i}, all the possible ancestral states D_{i+1} are equally probable under H.
The above equations hold whatever the assumed mutation model. In this article, we used the infinite-site model: it is assumed that no more than one mutation arises at any one site in the genealogy. A consequence is that no more than two distinct states should be observed at any site. Under this model, m equals s + n − 1 (where s is the number of segregating sites in the data set), and coefficients in (2) are k_{c} = n_{c}/(n − 1) and k_{m} = n_{m}/n, where n_{c} is the number of genes in D_{1} leading to D_{0} by splitting (in the case of a coalescent event), and n_{m} is the number of genes in D_{1} leading to D_{0} by mutating (in the case of a mutation event).
A bottleneck model: The above scheme can be used with models more complex than the standard coalescent (e.g., Nielsen 1998). Griffiths and Tavaré (1994b) showed how it can be generalized to account for variable population size. In this case, the relative probabilities of coalescence and mutation given in (2) depend on the time of the current state. This means that one has to keep track of the times of successive events when sampling ancestral states backward through the genealogy.
The bottleneck model we used has three parameters: population mutation rate θ, time T of occurrence of the bottleneck, and “strength” S of the bottleneck; all are scaled relative to a timescale set by 2N, which is the current number of genes. Looking backward in time, it is assumed that the population undergoes a drop of effective size at T (measured in units of 2N generations) during a short period of time and then recovers its initial size. If the duration of the bottleneck is short enough that one can neglect the occurrence of mutations during that period, the effect of the bottleneck depends only on the amount of coalescence it generates. Parameter S measures this coalescence pressure: it is the time that would be required for an equal expected amount of coalescence if the population size had not changed. Under these assumptions, the bottleneck model can be implemented under Griffiths and Tavaré's scheme by keeping N constant, but changing the timescale during the bottleneck. The new Markov chain X′ has three distinct phases: (i) starting from t = 0, recurse until t reaches T allowing coalescences and mutations, as in X; (ii) while T < t < T + S, recurse allowing coalescences only; and (iii) when t > T + S, switch on mutations again. In case of a severe bottleneck (high S), phase (iii) may not be reached for most realizations of X′: only one lineage survives the bottleneck (backward), as in genealogy B_{2} of Figure 1. This model reduces to the constant-size model by either setting S = 0 or T = ∞.
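Under the stated assumptions, the three phases of X′ amount to a competing-risks simulation in which the mutation clock is switched off during the bottleneck. The following Python sketch (an illustrative reimplementation, not the authors' program) records coalescence and mutation events per phase:

```python
import random

def bottleneck_chain_events(n, theta, T, S, rng=None):
    """Simulate, backward in time, the event sequence of the chain X':
    coalescence (rate k(k-1)/2) competes with mutation (rate k*theta/2),
    except that mutation is switched off while T < t < T + S.
    Returns per-phase counts of [coalescences, mutations]."""
    rng = rng or random.Random(0)
    t, k = 0.0, n
    events = {1: [0, 0], 2: [0, 0], 3: [0, 0]}
    while k > 1:
        phase = 1 if t < T else (2 if t < T + S else 3)
        boundary = T if phase == 1 else (T + S if phase == 2 else float("inf"))
        coal_rate = k * (k - 1) / 2.0
        mut_rate = 0.0 if phase == 2 else k * theta / 2.0
        w = rng.expovariate(coal_rate + mut_rate)
        if t + w >= boundary:
            # no event before the phase change: move to the boundary and
            # redraw (valid because exponential clocks are memoryless)
            t = boundary
            continue
        t += w
        if rng.random() < coal_rate / (coal_rate + mut_rate):
            k -= 1
            events[phase][0] += 1
        else:
            events[phase][1] += 1
    return events
```

With T = 0 and very large S (a severe bottleneck starting now), every lineage coalesces during phase (ii) and phase (iii) is never reached, which is the situation described for genealogy B_{2}; with S = 0, the chain reduces to the constant-size case, as stated in the text.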
Maximizing the likelihood: The problem here is to find the values of θ, T, and S that maximize the likelihood for a given data set. Griffiths and Tavaré (1994a) provide an efficient method for generating likelihood surfaces with respect to θ. The basic idea is to calculate the likelihoods of many θ's using a single sample of sets of ancestral states. This sample is obtained by performing X with transition probabilities computed from one particular value of θ called θ_{0}. Theoretically, this procedure may be used for all the parameters of any model. In practice, however, it does not work properly for T and S in the bottleneck model. The reason is that sets of ancestral states that have a high probability for some value T_{0} (S_{0}) of the bottleneck time (strength) may have very low probability for other T's (S's). Using a common sample for all T's (S's) would therefore lead to variable accuracy in the estimation of the likelihood across parameter values. Therefore, a single sample of sets of ancestral states was used to generate a likelihood curve with respect to θ given T and S, but different samples were used for different (T, S) pairs.
We used a numerical technique to maximize the likelihood with respect to T and S. The problem with standard algorithms in this particular case is that the function to be maximized is “unstable”: because of the stochastic process, several evaluations of the likelihood for a given (T, S) would return distinct numbers. The heuristic we used is a modification of the downhill simplex (Press et al. 1992). Details can be found in the help file of the program; both are available on request.
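One standard way to tame such a stochastic objective is to average replicate evaluations before handing the surface to a hill-climbing heuristic. The sketch below illustrates this; the quadratic surface, its peak at (T, S) = (1, 2), and the noise level are invented for illustration, and a crude grid search stands in for the modified downhill simplex of the text:

```python
import random

def noisy_loglik(T, S, rng):
    """Stand-in for a Monte Carlo likelihood estimate: a smooth surface
    (peak at T = 1, S = 2, purely illustrative values) plus sampling
    noise, mimicking the instability described in the text."""
    return -((T - 1.0) ** 2 + (S - 2.0) ** 2) + rng.gauss(0.0, 0.05)

def averaged(f, T, S, rng, reps=50):
    """Average replicate evaluations to reduce the noise by sqrt(reps)."""
    return sum(f(T, S, rng) for _ in range(reps)) / reps

def grid_maximize(f, rng, Ts, Ss):
    """Maximize the averaged surface over a (T, S) grid; returns the
    best (value, T, S) triple."""
    return max((averaged(f, T, S, rng), T, S) for T in Ts for S in Ss)
```

Averaging 50 replicates shrinks the noise standard deviation by a factor of about 7, enough here to rank neighboring grid points reliably; a simplex search on the averaged surface would follow the same logic.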
The multilocus case and likelihoodratio tests: We now turn to the problem of distinguishing selective sweeps from bottlenecks. We approximate the expected effect of a selective sweep at one neutral locus linked to the locus under selection by that of a population bottleneck: T is the time of fixation of the favorable mutation, and S depends on the ratio between the selection coefficient associated with this mutation and the recombination rate between the selected locus and the neutral locus. The discrepancy between the two hypotheses appears when several loci are considered: under the bottleneck model, all loci share a common T and S, while distinct loci have distinct T's and S's under the selective sweep hypothesis. In both cases, a specific mutation rate θ is assigned to each locus. Three nested models are therefore to be compared. Suppose that p loci are examined:
M_{1} (no founder event), p parameters: θ_{1}, θ_{2}, … , θ_{p}
M_{2} (bottleneck), p + 2 parameters: θ_{1}, θ_{2}, … , θ_{p}, S, T
M_{3} (selective sweep), 3p parameters: θ_{1}, S_{1}, T_{1}, θ_{2}, S_{2}, T_{2}, … , θ_{p}, S_{p}, T_{p}.
The likelihood for a data set of several independent loci is the product of the likelihoods for individual loci. Likelihood-ratio tests can be performed to detect a diversity-reducing event (M_{2} vs. M_{1} and M_{3} vs. M_{1}) and to distinguish sweeps from bottlenecks (M_{3} vs. M_{2}): twice the logarithm of the ratio of likelihoods of two competing models asymptotically follows a χ^{2} distribution with k d.f., where k is the difference between the numbers of parameters of the two models.
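For the model hierarchy above, the degrees of freedom are 2 (M_{2} vs. M_{1}), 2p (M_{3} vs. M_{1}), and 2p − 2 (M_{3} vs. M_{2}), all even, so the chi-square tail probability has a simple closed form. A small Python sketch (the log-likelihood values in the example are hypothetical, not from the paper):

```python
import math

def chi2_sf_even(x, df):
    """Chi-square survival function for EVEN df, via the closed form
    P(X >= x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!."""
    assert df > 0 and df % 2 == 0
    h = x / 2.0
    return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))

def lr_test(loglik_null, loglik_alt, k_null, k_alt):
    """Twice the log-likelihood ratio, its degrees of freedom, and the
    asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    df = k_alt - k_null
    return stat, df, chi2_sf_even(stat, df)

# Hypothetical M3-vs-M2 comparison for p = 3 loci:
# M2 has p + 2 = 5 parameters, M3 has 3p = 9, so df = 4.
stat, df, pval = lr_test(-250.0, -242.0, 5, 9)
```

With these invented log-likelihoods the statistic is 16 on 4 d.f., which would reject the bottleneck model in favor of the sweep model at the 5% level.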
SIMULATIONS
The reliability and efficiency of our method for detecting bottlenecks were assessed using simulated data sets, although an exhaustive power analysis was impossible because of the extensive running time. We first simulated 100 one-locus data sets under the null hypothesis of constant population size (eight genes, θ = 10). The null hypothesis was rejected in 6 cases out of 100, suggesting that the test is reliable. Bottlenecked data sets were simulated under two conditions: old, strong vs. recent, weak bottlenecks (see Table 1). The number of rejections of the null hypothesis (power) and the mean and standard errors of estimates of parameters T and S are shown (Table 1). The power of the test was ~25%, and the parameter estimators were quite imprecise. The power, however, was significantly higher than that of Tajima's (1989) D statistic, which is sometimes used to detect bottlenecks. When four-locus data sets (simulated under the same conditions) were analyzed, the power of the test and the accuracies of parameter estimates significantly increased (Table 2), suggesting that standard multilocus DNA sequence data sets are large enough to allow a reliable reconstruction of population history.
The above analyses address the M_{2}/M_{1} test, i.e., detecting bottlenecks. The power to detect selective sweeps is more difficult to assess since it depends strongly on how many loci depart from the null hypothesis. For example, when two-locus data sets including one “neutral” locus plus one bottlenecked locus were analyzed, the rejection rate (M_{3}/M_{1}) was 24 out of 100. This power would of course be increased by adding loci with their own T and S, but decreased by adding “neutral” loci. The M_{3}/M_{1} test is conservative: the maximum rejection rate under the null hypothesis is 5%.
DATA ANALYSIS
The above method has been applied to DNA sequence data obtained from an African population of D. melanogaster (Lamto, Ivory Coast). Three loci were used: Fat Body Protein 2 (Fbp2, 2.15 kb, 10 individuals; Benassi et al. 1999), Suppressor of Hairless [Su(H), 1 kb, 20 individuals; Depaulis et al. 1999], and Vacuolar H^{+} ATPase 68-1 (Vha, 1 kb, 20 individuals; Depaulis 1998). These genes are located near a region polymorphic for a chromosomal inversion on chromosome 2. This proximity increases the chances of detecting a selective sweep, if any (see Depaulis et al. 1999). Loci were sequenced in distinct but overlapping samples of a single population of D. melanogaster.
For each data set, sequences were truncated to fit the assumptions of infinite number of sites and no recombination: the largest segment of each locus showing no homoplasy was sought. Sites in such segments are phylogenetically compatible: one can find a genealogy for which the mutants at any site are a monophyletic group. Sites showing more than two distinct states were removed. This data-paring strategy reduced the number of variable sites from 64 to 19 (Fbp2), 44 to 40 [Su(H)], and 11 to 11 (Vha), respectively. Sites were oriented: the ancestral/derived status of character states was determined. Orienting sites allows one to sample rooted rather than unrooted genealogies during the likelihood estimation (Griffiths and Tavaré 1995), greatly decreasing the running time. To orient sites, we first reconstructed a neighbor-joining phylogenetic tree (Saitou and Nei 1987; observed divergence) and located the root thanks to an outgroup sequence (D. simulans). Sites were oriented according to this tree: the monophyletic character state was taken to be derived. When both character states defined a monophyletic group (i.e., when the mutation occurred in the branch connected to the root), the state shared by the outgroup was taken to be ancestral.
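The homoplasy screen described above rests on the four-gamete test. A minimal sketch (sites encoded as 0/1 tuples over the sampled genes; an illustration, not the authors' program) finds the longest contiguous block of pairwise-compatible sites:

```python
def compatible(site_a, site_b):
    """Four-gamete test: two biallelic sites are phylogenetically
    compatible (consistent with a single non-recombining genealogy under
    infinite sites) unless all four gametes 00, 01, 10, 11 occur."""
    return len(set(zip(site_a, site_b))) < 4

def longest_compatible_segment(sites):
    """Longest contiguous block of sites that are pairwise compatible,
    found with a sliding window: when a new site conflicts with the
    current window, shrink the window from the left."""
    best = (0, 0)  # (length, start)
    start = 0
    for end in range(len(sites)):
        while any(not compatible(sites[i], sites[end]) for i in range(start, end)):
            start += 1
        if end - start + 1 > best[0]:
            best = (end - start + 1, start)
    length, start = best
    return sites[start:start + length]
```

The sliding window is valid because adding one site can only create conflicts involving that site, so earlier pairs in the window remain compatible.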
The maximum likelihood of the data under three competing models is given in Table 3, together with the parameter estimates. Likelihood-ratio tests favored the hypothesis of a selective sweep (M_{3}) vs. either the no-founder-event model (M_{1}) or the bottleneck model (M_{2}). A demographic event seems unlikely to explain the observed pattern, as indicated by the M_{2} vs. M_{1} comparison. The optimal times of occurrence and strengths of variability-reducing events were quite different among loci (model M_{3}): a very recent, weak sweep was detected for locus Su(H), a strong, recent one for locus Vha, while no significant sweep was found at locus Fbp2. As a consequence, the optimal T value under model M_{2} is high: no recent bottleneck scenario was found that fits the data better than the simple no-founder-event model. By excluding demographic hypotheses, this result reinforces the hypothesis that one or several selective sweeps may have occurred recently in this region of chromosome 2 for this African population of D. melanogaster (Depaulis et al. 1999).
DISCUSSION
The new method presented in this article aims to reconstruct the recent history of a population. It allows diversity-reducing events to be detected at one or several loci and, when more than one locus is available, bottlenecks to be distinguished from selective sweeps. The former goal draws on the distortion in gene genealogies generated by diversity-reducing events. The latter issue, distinguishing demographic from selective causes, is addressed by measuring the heterogeneity in time and strength of diversity-reducing events across loci. The maximum-likelihood approach allows one to test hypotheses and to estimate the times and strengths of diversity-reducing events. It is more efficient than methods based on pairwise differences (e.g., Rogers and Harpending 1992) or test statistics (Tajima 1989), which do not make use of all the information contained in the data. Note that this method applies only to panmictic populations. Population structure is likely to introduce bias, especially if samples for distinct loci belong to distinct demes.
Theoretically, it should be possible to distinguish bottlenecks from selective sweeps using a single locus, because what happens during the course of the event differs between the two situations. Basically, a bottleneck applies identically to all the lineages that enter it. In the case of a partial selective sweep (where recombination occurred, so that more than one lineage escapes the sweep, e.g., tree B_{1}, Figure 1), the lineage originally associated with the favorable mutation has a particular status: it is older than the other lineages, which emerged thanks to recombination at various times during the sweep. Barton (1998) gives a detailed description of the properties of genealogies under a selective sweep. He shows that the discrepancy mentioned above results in distinct expected distributions of the size of “gene families” (see Figure 1 legend) under the two hypotheses. These can readily be distinguished statistically from a sample of 100 genes, with pairwise identity 0.1, provided that the genealogy is known with certainty (compare Figures 8 and 9 of Barton 1998). However, it is not known how far errors in estimating the genealogy from (say) infinite-sites mutation reduce the power of this method. We decided here to neglect this difference and to approximate the effect of a sweep at one locus by that of a bottleneck. We suspect that for many data sets the major part of the information lies in the heterogeneity between loci, rather than in the pattern at individual loci. This intuition would be worth verifying formally.
The method we present does not make use of data from an outgroup, in contrast with, say, the Hudson, Kreitman and Aguadé (HKA) test (Hudson et al. 1987). If it has some power in its current form—and our data analysis suggests it actually has some—then this property should be considered a strength. Using outgroup sequence data to estimate some neutral mutation rate involves making disputable assumptions. Any selective force having applied to some of the surveyed loci since the ingroup and the outgroup diverged may bias the estimation of mutation rates. Departure from the molecular clock has been observed in many genes and many taxonomic groups (e.g., Li 1993). It may lead to significant HKA tests even if all the loci under consideration are currently neutrally evolving. If the user believes he has a reliable outgroup, information about it can be incorporated into our method. First, characters can be oriented by deciding that the state observed in the outgroup is the ancestral one. Second, the relative mutation rate of loci can be estimated from the ingroup/outgroup divergence. This would add valuable information and reduce the number of parameters of each model by p − 1, where p is the number of loci. Incidentally, this would significantly reduce the running time.
Two assumptions of the present method deserve discussion: the infinite-site mutation model and the no-recombination assumption. Both are clearly violated by some data sets. One has to worry about them before using the method—Griffiths and Tavaré's algorithm can be applied only if the data are consistent with these assumptions, i.e., if distinct sites support phylogenetically compatible bipartitions of the individuals.
The assumption of an infinite number of sites can be avoided. Kuhner et al. (1995) compute the likelihood using a finite-site mutation model in the constant-population-size case and use the Metropolis-Hastings algorithm (Hastings 1970) to find the value of θ that maximizes it. This, however, involves exploring a larger space of sets of ancestral states, e.g., including genealogies where identical genes are not monophyletic. The Metropolis-Hastings Markov chain Monte Carlo algorithm is an interesting alternative to Griffiths and Tavaré's method for computing likelihoods under the coalescent (e.g., see Wilson and Balding 1998). In the case of the bottleneck model, it may provide an efficient way to maximize the likelihood with respect to T and S.
When its assumptions are more or less met, the infinite-site model (and DNA sequence data) is presumably preferable to the infinite-allele model (and, say, microsatellite data) for the purpose of detecting diversity-reducing events. The latter model is one where each mutation creates a new allele, but where successive mutations in the same lineage “hide” each other. The infinite-site model is better because, in addition to allele frequencies, the number of differences between alleles carries much information. Suppose that a moderate bottleneck occurred very recently in the history of a population, so that no mutations have arisen since T. The expected pattern of allele frequencies is identical to that expected under constant population size and low θ, since the shape of the observable genealogy is a standard one (Figure 2). Sequence data, however, would reveal a large number of segregating sites (i.e., highly divergent alleles), incompatible with the hypothesis of low θ, and therefore would have some power to detect the bottleneck. Microsatellite data, however, are often more variable than sequence data and are more easily collected from a high number of loci. Cornuet and Luikart (1996) devised tests for detecting bottlenecks from allele-frequency data, using the sampling distribution of relevant statistics. Their power analysis indicates that very recent bottlenecks cannot be detected from allele frequencies, consistent with the above argument and with Maruyama and Fuerst (1985).
The assumption of no recombination within loci during the genealogy is a major one. Meeting it involves reducing sequences to blocks whose sites share a unique genealogy and therefore losing information. Furthermore, one can hardly be sure that the length of sequence used is actually nonrecombined, even when no incompatibility between sites is found. Whether undetected recombination events do or do not bias the method is an open question that goes beyond the scope of this article. We doubt, however, that this issue has major practical consequences. This is because the bias, if any, must be higher when data strongly depart from the model assumptions, i.e., when distinct fragments of the surveyed sequence have highly different actual genealogies. But important departures are likely to be detected by the four-gamete rule (see data analysis). Undetected recombination events are more likely to occur when distinct fragments have closely related genealogies, i.e., when the bias is low.
For data sets showing a high number of recombination events, the present method is inapplicable. Indeed, such data sets contain hardly any genealogical information. Rather, the data can be recoded by pooling together sites of equal “size” (i.e., the number of individuals carrying the mutation), irrespective of which individuals carry the mutation. Coalescence theory allows predictions about the frequency distribution of these classes of sites under various models of population history (Wakeley and Hey 1997). This approach may be applied to the three models we develop in this article, making it possible to detect diversity-reducing events from highly recombining sequence data.
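The recoding by site “size” is simply the (derived) site-frequency spectrum. A minimal sketch, assuming sites are already oriented and encoded as 0/1 tuples over the n sampled genes:

```python
from collections import Counter

def site_frequency_spectrum(sites, n):
    """Recode oriented biallelic sites by their 'size', the number of the
    n sampled genes carrying the derived state, discarding which genes
    carry it (cf. Wakeley and Hey 1997).  Returns the counts of sites of
    size 1, 2, ..., n - 1."""
    counts = Counter(sum(site) for site in sites)
    return [counts.get(i, 0) for i in range(1, n)]
```

Because only the frequency of each mutation is retained, this summary is insensitive to the phase information destroyed by recombination, which is why it remains usable when the genealogical methods above are not.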
Acknowledgments
Many thanks to Jody Hey and Kevin Dawson for helpful discussions. This work was supported by grant NERC R34365 and by a postdoctoral fellowship from the Federation of European Biochemical Societies.
Communicating editor: A. G. Clark
 Received July 24, 1999.
 Accepted February 29, 2000.
 Copyright © 2000 by the Genetics Society of America