Testing Models of Selection and Demography in Drosophila simulans
Jeffrey D. Wall (a), Peter Andolfatto (b), and Molly Przeworski (2, c)
a Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
b Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
c Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom
Corresponding author: Jeffrey D. Wall, Harvard University, 16 Divinity Ave., Cambridge, MA 02138; jwall{at}fas.harvard.edu (E-mail)
Communicating editor: H. OCHMAN
| ABSTRACT |
|---|
We analyze patterns of nucleotide variability at 15 X-linked loci and 14 autosomal loci from a North American population of Drosophila simulans. We show that there is significantly more linkage disequilibrium on the X chromosome than on chromosome arm 3R and much more linkage disequilibrium on both chromosomes than expected from estimates of recombination rates, mutation rates, and levels of diversity. To explore what types of evolutionary models might explain this observation, we examine a model of recurrent, nonoverlapping selective sweeps and a model of a recent drastic bottleneck (e.g., founder event) in the demographic history of North American populations of D. simulans. The simple sweep model is consistent neither with the observed patterns of linkage disequilibrium nor with the observed frequencies of segregating mutations. Under a restricted range of parameter values, a simple bottleneck model is consistent with multiple facets of the data. While our results do not exclude some influence of selection on X vs. autosome variability levels, they suggest that demography alone may account for patterns of linkage disequilibrium and the frequency spectrum of segregating mutations in this population of D. simulans.
A fundamental question in population genetics is the relative importance of natural selection vs. neutral and/or demographic factors in shaping genome-wide patterns of sequence variability (…).
In one such study, …
Considering what we know about the demographic history of D. simulans populations, it may also be relevant to consider nonselective explanations for Begun and Whitley's observation. D. simulans is a human commensal; it is thought to have originated in tropical Africa and may have colonized the Americas as recently as a few hundred years ago (…).
In this article, we revisit Begun and Whitley's data by explicitly considering both a model of recurrent selective sweeps and a recent bottleneck model; we also examine additional aspects of the data besides levels of diversity. … $\rho$ (= 4Nr, where N is the effective population size and r is the sex-averaged recombination rate per base pair per generation) from the sequence data at each locus (CHRM, cf. …) … $\hat{N} = \hat{\rho}/(4r)$. Low estimated values of $\hat{N}$ indicate high levels of linkage disequilibrium and vice versa. A similar approach was used by …; there, $\hat{N}$ was much smaller than the standard estimate of N based on levels of variability and an estimate of the mutation rate, which was interpreted as a genome-wide excess of linkage disequilibrium in the two species. Our study presents the advantages of consistently sampled data and more accurate estimates of r.
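Since ρ = 4Nr and r is treated as known, the linkage-disequilibrium-based estimate of N is simply ρ̂/(4r). The Python sketch below shows that inversion together with the unit conversion from a map rate in cM/Mb to a per-base-pair rate. The function names and example inputs are hypothetical, the sketch assumes the map rate supplied is already the sex-averaged rate that r denotes, and it does not reproduce the CHRM estimator itself.

```python
def cm_per_mb_to_per_bp(rate_cm_per_mb: float) -> float:
    """Convert a map rate in cM/Mb to crossovers per bp per generation.

    1 cM corresponds to 0.01 crossovers per meiosis and 1 Mb to 1e6 bp.
    Assumption: the input is already the sex-averaged rate that r denotes.
    """
    return rate_cm_per_mb * 0.01 / 1e6


def n_hat_from_rho(rho_hat_per_bp: float, r_per_bp: float) -> float:
    """LD-based effective size: invert rho = 4 N r, i.e. N-hat = rho-hat / (4 r)."""
    if r_per_bp <= 0:
        raise ValueError("r must be positive")
    return rho_hat_per_bp / (4.0 * r_per_bp)


# Hypothetical inputs: a 2.5 cM/Mb region and a per-bp rho-hat of 0.01.
r_example = cm_per_mb_to_per_bp(2.5)
n_hat_example = n_hat_from_rho(0.01, r_example)   # roughly 1e5
```

Because N̂ is inversely proportional to r, any error in the map-based r carries over proportionally into N̂.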
| METHODS |
|---|
A list of definitions for the symbols used in this paper (in approximate order of introduction) can be found in Table 1.
Loci and samples:
We consider 29 loci collected from a single D. simulans population (Wolfskill Orchard, California), reported by …. The alternative, i.e., considering multiple substitutions as multiple mutations with missing information, would lead to underestimates of H and RM due to the missing information and thus to underestimates of $\hat{N}$ (see below).
Estimating r:
Previous methods for estimating r have fit high-order polynomial curves to the available genetic and physical map data for D. simulans (…).
X chromosome:
Genetic map positions for 14 marker loci are taken from ….
Chromosome 3R:
Genetic map positions for 13 marker loci were obtained from ….
If we assume that genetic distance increases with the square of physical distance from the centromere (0.0 Mb) to delta (5.4 Mb), we estimate a rate of 2.2 cM/Mb for the miranda locus (4.9 Mb). We localized the pitchoune locus (93F16-94A1) outside of the distal breakpoint (93F6-7) of the fixed 3R inversion difference between D. melanogaster and D. simulans by in situ hybridization on polytene chromosomes (using a method modified from …) … 15.5 Mb from the centromere (i.e., region II). The remaining 12 loci on chromosome 3R considered for polymorphism analyses are also in region II.
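The quadratic assumption for proximal 3R can be written as G(x) = G_δ (x/x_δ)², where x is the physical distance from the centromere and G_δ is the genetic position of delta, so the local rate is dG/dx = 2G_δx/x_δ². The genetic position of delta is not stated in this section, so the sketch below takes it as an input and also shows the G_δ that the reported 2.2 cM/Mb at 4.9 Mb would imply under this assumption. G_δ, the function names, and that back-calculation are illustrative, not values from the source.

```python
def quadratic_map_rate(x_mb: float, g_delta_cm: float, x_delta_mb: float = 5.4) -> float:
    """Local rate (cM/Mb) at x_mb when G(x) = g_delta * (x / x_delta)**2.

    x is the physical distance from the centromere; the model is only
    assumed to hold between the centromere (0 Mb) and delta (x_delta Mb).
    """
    if not 0.0 <= x_mb <= x_delta_mb:
        raise ValueError("quadratic model applies only between centromere and delta")
    return 2.0 * g_delta_cm * x_mb / x_delta_mb ** 2


def implied_g_delta(rate_cm_per_mb: float, x_mb: float, x_delta_mb: float = 5.4) -> float:
    """Genetic position of delta (cM) that a given local rate at x_mb would imply."""
    return rate_cm_per_mb * x_delta_mb ** 2 / (2.0 * x_mb)


# Back-calculation from the reported 2.2 cM/Mb at miranda (4.9 Mb); illustrative only.
g_delta_implied = implied_g_delta(2.2, 4.9)       # about 6.55 cM
```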
Estimating N:
We proceed assuming that the true r is known for each locus. Two summaries of linkage disequilibrium are H, the observed number of distinct haplotypes, and RM, the minimum number of inferred recombination events (…). We compute the likelihood of these summaries, $\text{lik}(\rho \mid H, R_M)$ (…), or equivalently $\text{lik}(N \mid H, R_M)$, assuming the null model. Likelihoods are calculated for each locus separately, as well as for all X-linked loci and all autosomal loci (see below). Here "lik" refers to the likelihood, and the joint likelihood at a collection of loci is found by multiplying the likelihoods at the individual loci. This is reasonable because none of the loci are closely linked to each other, so they can be considered evolutionarily independent. Define $N_x$ and $N_a$ as the effective population sizes of the X and the autosomes, respectively. Define $\hat{N}_x$ and $\hat{N}_a$ as the maximum-likelihood estimates for the X-linked and autosomal loci; $\hat{N}_x$ is the value of $N_x$ that maximizes $\prod_{\text{all X-linked loci}} \text{lik}(N_x \mid H, R_M)$, while $\hat{N}_a$ is the value of $N_a$ that maximizes $\prod_{\text{all autosomal loci}} \text{lik}(N_a \mid H, R_M)$. We choose to summarize the data before performing maximum likelihood because of computational constraints; maximum likelihood on the full data is computationally infeasible even for recombination rates one-tenth of those considered here. As it is, the likelihoods for this article took several months of computing time on a pair of 600 MHz Pentium III processors.
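Because the loci are treated as independent, the joint likelihood is a product over loci, which is easiest to handle as a sum of log-likelihoods on a shared grid of N values. The sketch below assumes that per-locus likelihoods lik(N | H, RM) have already been tabulated (for example, from simulations) on a common grid; the grid, the numbers, and the function names are hypothetical. It returns the grid maximum-likelihood estimate and a curve shifted so that its maximum is 0, the normalization used for the likelihood curves in RESULTS.

```python
import math
from typing import List, Sequence


def joint_log_likelihood(per_locus_lik: Sequence[Sequence[float]]) -> List[float]:
    """Sum per-locus log-likelihoods over a shared grid of N values.

    per_locus_lik[j][k] holds lik(N_k | H_j, RM_j) for locus j; loci are
    treated as independent, so logs add (the likelihoods multiply).
    """
    n_grid = len(per_locus_lik[0])
    total = [0.0] * n_grid
    for lik in per_locus_lik:
        if len(lik) != n_grid:
            raise ValueError("all loci must use the same N grid")
        for k, value in enumerate(lik):
            total[k] += math.log(value) if value > 0 else -math.inf
    return total


def grid_mle(n_grid: Sequence[float], log_lik: Sequence[float]) -> float:
    """Grid value of N with the highest joint log-likelihood."""
    best = max(range(len(n_grid)), key=lambda k: log_lik[k])
    return n_grid[best]


def relative_log_likelihood(log_lik: Sequence[float]) -> List[float]:
    """Shift a log-likelihood curve so that its maximum is 0."""
    peak = max(log_lik)
    return [v - peak for v in log_lik]


# Hypothetical example: two loci tabulated on a three-point grid.
grid = [1.0e4, 5.0e4, 1.0e5]
joint = joint_log_likelihood([[0.1, 0.3, 0.2], [0.2, 0.25, 0.3]])
n_hat = grid_mle(grid, joint)                  # 5.0e4 for these made-up numbers
```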
The likelihoods were estimated from simulations that assume a neutral infinite-sites model and use the protocol of … $\theta$/2. ($\theta = 4N\mu$, where N is the effective population size and $\mu$ is the mutation rate per base pair per generation.) Instead, we generate random genealogies and then place the observed number of mutations, S, on the tree. One motivation for this procedure is that S is observed, while $\theta$ must be estimated (…).
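The fixed-S idea (simulate a genealogy, then place exactly the observed number of mutations S on it) can be sketched as follows. This is a simplified illustration rather than the simulation protocol used in the article: it ignores recombination, which the actual likelihood calculations require, draws one Kingman coalescent genealogy, drops the S mutations on branches with probability proportional to branch length under the infinite-sites assumption, and then counts H. All function names are hypothetical.

```python
import random
from typing import FrozenSet, List, Tuple

Branch = Tuple[FrozenSet[int], float]


def coalescent_branches(n: int, rng: random.Random) -> List[Branch]:
    """Kingman coalescent for n sequences without recombination.

    Returns (leaves below the branch, branch length) pairs, in coalescent units.
    """
    lineages = [frozenset([i]) for i in range(n)]
    birth = [0.0] * n                      # time at which each lineage began
    t = 0.0
    branches: List[Branch] = []
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2.0)      # wait for next coalescence
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        for idx in (i, j):
            branches.append((lineages[idx], t - birth[idx]))
        merged = lineages[i] | lineages[j]
        for idx in (i, j):                 # i > j, so deleting i first keeps j valid
            del lineages[idx]
            del birth[idx]
        lineages.append(merged)
        birth.append(t)
    return branches


def place_fixed_s(branches: List[Branch], s: int, rng: random.Random) -> List[FrozenSet[int]]:
    """Drop exactly s mutations, each on a branch chosen proportionally to length."""
    weights = [length for _, length in branches]
    chosen = rng.choices(range(len(branches)), weights=weights, k=s)
    return [branches[b][0] for b in chosen]


def count_haplotypes(mutations: List[FrozenSet[int]], n: int) -> int:
    """H: number of distinct haplotypes among the n sampled sequences."""
    return len({tuple(i in carriers for carriers in mutations) for i in range(n)})


rng = random.Random(42)
tree = coalescent_branches(12, rng)
h_example = count_haplotypes(place_fixed_s(tree, 8, rng), 12)
```

Conditioning on S in this way avoids having to plug in an estimate of θ when generating mutations.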
Null simulations:
We compare the selective sweep and bottleneck models (described below) with a constant-sized, panmictic, neutral coalescent model (…). … $\theta$, the population mutation rate, we split the data into three classes of sites (introns, synonymous sites, and nonsynonymous sites) and assume a fixed neutral (population) mutation rate for each class. $\theta$ is then estimated for each class using ….
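Watterson's estimator divides the number of segregating sites S by a_n = Σ_{i=1}^{n−1} 1/i (and by the number of sites L to express θ per site). The citation for the per-class estimation step is lost from this section, so the pooling rule in the sketch below, ΣS_j / Σ(a_{n_j} L_j) over the loci of one site class, is a method-of-moments assumption that lets sample sizes differ among loci; the names and example numbers are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def harmonic(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(segregating_sites: int, n: int, sites: float) -> float:
    """Watterson's per-site estimate for one locus: S / (a_n * L)."""
    return segregating_sites / (harmonic(n) * sites)


def pooled_watterson(loci: Iterable[Tuple[int, int, float]]) -> float:
    """Per-site theta for one site class pooled over loci given as (S, n, L).

    Assumption: a shared per-site theta, so E[sum S] = theta * sum(a_n * L),
    which lets the sample size n differ among loci.
    """
    loci = list(loci)
    total_s = sum(s for s, _, _ in loci)
    denominator = sum(harmonic(n) * length for _, n, length in loci)
    return total_s / denominator


def theta_by_class(data: Dict[str, Iterable[Tuple[int, int, float]]]) -> Dict[str, float]:
    """Apply the pooled estimator separately to each site class."""
    return {site_class: pooled_watterson(loci) for site_class, loci in data.items()}
```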
Selective sweep simulations:
We consider a model of recurrent, nonoverlapping selective sweeps, in which favorable alleles arise at sites linked to a neutral locus. The model assumes that beneficial mutations are selected immediately upon introduction into the population. Our methods follow those of … to $1 - \epsilon$. We implement our simulations with $\epsilon = 1/2N$ (as in …).
Denote the decrease in diversity due to hitchhiking by one minus the diversity ratio, i.e., the ratio of diversity with hitchhiking to diversity without it. We choose three plausible values for this ratio: 0.85, 0.75, and 0.65 for the autosomes and 0.65, 0.55, and 0.45 for the X. The autosomal values were chosen to be close to the observed ratio of non-African to African diversity levels (…) …. Note also that data from other loci suggest that the ratio of X to autosomal levels of diversity may be higher than was observed by …. $\theta$ is estimated for each locus as in the null simulations and then divided by the diversity ratio. The rate of selective sweeps (… in …) …. The actual values used are listed in Table 2. In almost all of the 3R simulations, and for larger values of $N_x$, the sweep rate is small enough that the probability of a second beneficial mutation arising while a sweep is still ongoing is <0.05. See the DISCUSSION for more on the applicability of the model.
Bottleneck simulations:
It is relatively straightforward to incorporate changes in population size into the coalescent framework (e.g., …). … (see above). As before, we assume a fixed mutation rate for introns, synonymous sites, and nonsynonymous sites, and we estimate these rates from the data (cf. …) …. Since one of our goals is to examine whether bottlenecks have a stronger effect on the X, we use the estimated autosomal mutation rates for the X chromosome as well, after multiplying by nr to correct for the chromosomal differences in effective population size. The scaled time T0 (in coalescent time units) is similarly divided by nr for the X relative to 3R, but the population sizes estimated from the patterns of linkage disequilibrium (described below) are allowed to vary freely. This is necessary because $\theta$ values for X-linked loci must be close to WATTERSON's (1975) estimate of $\theta$ for the simulations to be comparable (i.e., levels of diversity in the simulations should be close to what is observed in the data). However, it is better to allow $N_x$ and $N_a$ (as estimates of linkage disequilibrium) to vary freely, so that we can see what effect bottlenecks have on linkage disequilibrium. We present results for the following parameter combinations: (a) T0 = 2000 generations ago, a diversity ratio of 0.85, and nr = 0.6; (b) T0 = 2000 generations ago, a diversity ratio of 0.75, and nr = 0.7; (c) T0 = 1.2 × 10^5 generations ago, a diversity ratio of 0.85, and nr = 0.7. T0 values were chosen to correspond to recent (≈200 years ago) or ancient (≈12,000 years ago) colonization of the Americas, and the diversity ratio was chosen to be close to the ratio of non-African to African autosomal diversity levels in D. simulans (0.76, cf. Table 3 in …).
Frequency spectrum of segregating mutations:
We use D (…) … $\bar{D}$, the average D value for the X-linked loci and the 3R loci, separately, and tabulate both the average simulated $\bar{D}$ value and the proportion of simulations that have $\bar{D}$ greater than or equal to what is observed (see RESULTS). A total of 5000 replicates were run for each model and parameter combination. We present results for only the most conservative values of $N_x$ and $N_a$.
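The citation for D is lost in this section; assuming D is Tajima's statistic, the sketch below computes it for one locus from the sample size n, the number of segregating sites S, and the mean number of pairwise differences π, averages it over loci (D̄), and reports the fraction of simulated D̄ values at least as large as the observed one. Function names are hypothetical, and returning 0 when S = 0 (where D is undefined) is an arbitrary convention chosen for the sketch.

```python
import math
from statistics import mean
from typing import Iterable, Sequence, Tuple


def tajimas_d(n: int, s: int, pi: float) -> float:
    """Tajima's D for one locus (requires n >= 3); returns 0.0 if s == 0."""
    if n < 3:
        raise ValueError("Tajima's D needs at least three sequences")
    if s == 0:
        return 0.0
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def mean_pairwise_differences(seqs: Sequence[str]) -> float:
    """pi: average number of differing sites over all pairs of sequences."""
    n = len(seqs)
    diffs = sum(sum(a != b for a, b in zip(seqs[i], seqs[j]))
                for i in range(n) for j in range(i + 1, n))
    return diffs / (n * (n - 1) / 2)


def mean_d(loci: Iterable[Tuple[int, int, float]]) -> float:
    """D-bar: average of per-locus D over (n, S, pi) tuples."""
    return mean(tajimas_d(n, s, pi) for n, s, pi in loci)


def upper_tail_proportion(simulated_dbar: Sequence[float], observed_dbar: float) -> float:
    """Fraction of simulated D-bar values >= the observed D-bar."""
    return sum(d >= observed_dbar for d in simulated_dbar) / len(simulated_dbar)
```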
Estimating µ:
We take a value of 1.5 × 10^-9 per site per generation for the neutral mutation rate at silent sites. This estimate is based on average per-year divergence at synonymous sites in various Drosophila species comparisons (…).
Credibility intervals for $N_x/N_a$:
To assess what range of X to autosomal diversity levels is consistent with the data of …, … $N_x/N_a$ from the observed numbers of segregating sites. We take a neutral mutation rate of $\mu$ = 1.5 × 10^-9 per site per generation (see above). As with the bottleneck simulations, we assume fixed population mutation rates for synonymous sites, nonsynonymous sites, and introns and estimate these (cf. …) … (… $N_x/N_a$) apply to the X-linked loci as well, and calculate …. Here S refers to the total number of inferred segregating sites summed over all X-linked loci. We employ the standard $\chi^2$ approximation for $-2\ln(L_1/L_0)$ to obtain approximate 95% credibility intervals, where $L_0$ is the maximum likelihood and $L_1$ is the likelihood at an alternative parameter value.
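The exact likelihood used for N_x/N_a does not survive in this section. As a stand-in, the sketch below assumes that the total X-linked S is Poisson with mean equal to N_x/N_a times the S expected from the autosomal per-class rates (a simplification: under the coalescent, S is more variable than Poisson). It then scans a grid of ratios and keeps those with −2 ln(L1/L0) at or below 3.841, the 95% point of a χ² distribution with 1 degree of freedom. The names and inputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

CHI2_1DF_95 = 3.841458820694124  # 95th percentile of chi-square with 1 df


def poisson_log_lik(ratio: float, observed_s: int, expected_s_at_ratio_1: float) -> float:
    """Poisson log-likelihood (dropping the log S! constant) for total X-linked S
    when E[S] = ratio * expected_s_at_ratio_1.  Poisson is a simplifying assumption."""
    lam = ratio * expected_s_at_ratio_1
    return observed_s * math.log(lam) - lam


def credibility_interval(grid: Sequence[float],
                         log_liks: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MLE, lower, upper): grid points with -2 ln(L1/L0) <= chi-square cutoff."""
    peak = max(log_liks)
    best = grid[list(log_liks).index(peak)]
    inside: List[float] = [g for g, ll in zip(grid, log_liks)
                           if -2.0 * (ll - peak) <= CHI2_1DF_95]
    return best, min(inside), max(inside)


# Made-up inputs: 50 X-linked segregating sites, and 100 expected if N_x/N_a = 1.
ratio_grid = [i / 1000 for i in range(1, 2001)]
log_liks = [poisson_log_lik(q, 50, 100.0) for q in ratio_grid]
mle, lower, upper = credibility_interval(ratio_grid, log_liks)
```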
Likelihood-based statistics:
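To quantify how consistent the actual data are with the null model, we employ a likelihood-ratio test. We calculate $\hat{N}_x$ and $\hat{N}_a$ from the actual data as described earlier. Then, for $N_x = N_0$ and $N_a = N_1$, we calculate

$$R(N_0, N_1) = \frac{\prod_{\text{X-linked loci}} \text{lik}(\hat{N}_x \mid H, R_M) \prod_{\text{autosomal loci}} \text{lik}(\hat{N}_a \mid H, R_M)}{\prod_{\text{X-linked loci}} \text{lik}(N_0 \mid H, R_M) \prod_{\text{autosomal loci}} \text{lik}(N_1 \mid H, R_M)}.$$

The significance levels for R are determined by simulation for a range of $N_0$ and $N_1$ values. We simulate 10^4 replicates of the 29 loci with $N_x = N_0$ and $N_a = N_1$. Then, we calculate $\hat{N}_{x0}$ and $\hat{N}_{a1}$ for each replicate, where $\hat{N}_{x0}$ is the value of $N_x$ that maximizes

$$\prod_{\text{X-linked loci}} \text{lik}(N_x \mid H, R_M)$$

and $\hat{N}_{a1}$ is the value of $N_a$ that maximizes

$$\prod_{\text{autosomal loci}} \text{lik}(N_a \mid H, R_M),$$

with H and RM now taken from the simulated replicate. The collection of

$$R = \frac{\prod_{\text{X-linked loci}} \text{lik}(\hat{N}_{x0} \mid H, R_M) \prod_{\text{autosomal loci}} \text{lik}(\hat{N}_{a1} \mid H, R_M)}{\prod_{\text{X-linked loci}} \text{lik}(N_0 \mid H, R_M) \prod_{\text{autosomal loci}} \text{lik}(N_1 \mid H, R_M)}$$

values provides a simulated distribution of $R(N_0, N_1)$ values, from which we tabulate how often the simulated R values are greater than or equal to the actual R value. For each parameter combination, we also calculate what proportion of trials have estimated population sizes $\hat{N}_{x0}$ and $\hat{N}_{a1}$ equal to the values estimated from the actual data. Define

$$R^*(N_0, N_1) = \Pr\left(\hat{N}_{x0} = \hat{N}_x \text{ and } \hat{N}_{a1} = \hat{N}_a \mid N_x = N_0, N_a = N_1\right).$$

$R^*(N_0, N_1)$ is the likelihood of the actual effective population size estimates (using H and RM) when data are generated under the null model (with $N_x = N_0$ and $N_a = N_1$). From our simulations, we plot the value of $R^*$ as a function of $N_0$ and $N_1$.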
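Given tabulated log-likelihood curves and a set of simulated replicates, the two Monte Carlo summaries (the P value of R and R*) can be computed as in the sketch below. It assumes R is the ratio of the joint likelihood at the grid maximum-likelihood estimates to the joint likelihood at (N0, N1), handled on the log scale, and that each replicate has already been reduced to its grid estimates and its ln R value; the names and the toy replicates are hypothetical.

```python
from typing import Sequence, Tuple


def log_r(log_lik_x: Sequence[float], log_lik_a: Sequence[float],
          grid: Sequence[float], n0: float, n1: float) -> float:
    """ln R(N0, N1): joint log-likelihood at the grid MLEs minus that at (N0, N1).

    log_lik_x / log_lik_a are log-likelihoods summed over X-linked / autosomal
    loci on a shared grid; the two parameters separate, so the maxima add.
    """
    k0, k1 = grid.index(n0), grid.index(n1)
    return (max(log_lik_x) + max(log_lik_a)) - (log_lik_x[k0] + log_lik_a[k1])


def monte_carlo_summaries(simulated: Sequence[Tuple[float, float, float]],
                          observed_estimates: Tuple[float, float],
                          observed_log_r: float) -> Tuple[float, float]:
    """simulated: (nx_hat, na_hat, ln R) per replicate generated with Nx=N0, Na=N1.

    Returns (P value, R*): the fraction of replicates with ln R >= the observed
    value, and the fraction whose grid estimates equal the observed estimates.
    """
    reps = len(simulated)
    p_value = sum(lr >= observed_log_r for _, _, lr in simulated) / reps
    nx_obs, na_obs = observed_estimates
    r_star = sum(nx == nx_obs and na == na_obs for nx, na, _ in simulated) / reps
    return p_value, r_star
```

Comparing ln R instead of R gives the same P value, since the logarithm is monotone.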
Ideally, we would like to perform the same analyses under the selective sweep and bottleneck models, but calculating the relevant likelihoods is computationally prohibitive. Instead, we use R* again, with all likelihoods calculated assuming the null model, even though the simulated data are generated under a different model. As before, R* is a measure of how likely it is for the simulated data to produce the actual estimated population sizes. A total of 10^4 replicates are run for each parameter combination. Other ad hoc statistics were considered; they all produced similar results (results not shown).
| RESULTS |
|---|
Excess linkage disequilibrium on all chromosomes:
Table 4 shows the estimates of $\hat{N}$ on the basis of H, RM, and $\hat{\rho}$ for each locus. One observation that is immediately apparent is that these values are systematically less than estimates of N based on estimates of the neutral mutation rate and observed levels of polymorphism. For example, if we take $\mu$ = 1.5 × 10^-9/bp/generation for silent sites (see METHODS) and $\hat{\theta}_W$ = 0.030 per synonymous base pair (estimated from all of the autosomal loci considered in this article, cf. …), then $N_a = \hat{\theta}_W/4\mu$ = 5.0 × 10^6. The corresponding estimate from the X-linked loci is $N_x$ = 2.5 × 10^6. In contrast, 26 out of 29 loci have $\hat{N}$ estimates at least an order of magnitude less than the corresponding diversity-based estimate. This discrepancy between $\hat{N}$ (estimated from linkage disequilibrium) and N (estimated from levels of diversity) has been noted before with different Drosophila data and slightly different methodology (…). Note that the low values in Table 4 are not the result of the particular properties of $\hat{\rho}$ = CHRM. In fact, simulations (under the standard equilibrium neutral model) show that for the small sample sizes considered here, CHRM is biased upward, suggesting that the $\hat{N}$ values in Table 4 might on average be overestimates (J. D. WALL, unpublished results).
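The diversity-based figure above is simple arithmetic on θ = 4Nµ; the sketch below reproduces it with the values quoted in the text (µ = 1.5 × 10^-9 and an autosomal synonymous θ_W of 0.030). The X-linked θ it computes is only what the quoted 2.5 × 10^6 would imply under the same µ, and the fold difference uses the pooled linkage-disequilibrium-based autosomal estimate of 3.2 × 10^5 reported in the next subsection; neither is an additional result from the data.

```python
MU_SILENT = 1.5e-9        # per site per generation (value quoted in the text)
THETA_W_AUTO = 0.030      # autosomal synonymous theta_W per bp (quoted in the text)


def n_from_theta(theta_per_bp: float, mu_per_bp: float) -> float:
    """Diversity-based effective size from theta = 4 N mu."""
    return theta_per_bp / (4.0 * mu_per_bp)


n_a_diversity = n_from_theta(THETA_W_AUTO, MU_SILENT)    # about 5.0e6
# X-linked theta implied by the quoted N_x = 2.5e6 under the same mu
# (derived here for illustration, not reported in the text):
theta_x_implied = 4.0 * MU_SILENT * 2.5e6                 # about 0.015
# Fold difference from the pooled LD-based autosomal estimate (3.2e5):
fold_difference = n_a_diversity / 3.2e5                   # about 15.6
```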
Contrasting patterns between X and autosomes:
Because patterns of variation differ greatly from locus to locus even when the underlying parameters are the same, the precision of the estimate of N can be greatly increased by combining information from multiple loci. Fig 2 shows the relative log-likelihoods of N for all of the X-linked loci (the curve on the left) and all of the autosomal loci (the curve on the right). For ease of comparison, the curves have been normalized so that their maxima are at 0. The two likelihood curves are strikingly distinct: $\hat{N}_a$ (= 3.2 × 10^5) is more than six times $\hat{N}_x$ (= 0.5 × 10^5). The horizontal line in Fig 2 shows the approximate 95% credibility intervals (using the standard asymptotic approximations for maximum likelihood) for the chromosome-specific estimates of N; the two intervals do not overlap. A nonparametric rank-order test shows that the locus-specific $\hat{N}$ estimates for the X-linked loci are indeed less than the autosomal estimates (Table 4, Mann-Whitney U-test; P < 0.002).
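A one-sided Mann-Whitney comparison of locus-specific estimates can be sketched with the standard library alone. This version uses the normal approximation with a continuity correction and no tie correction, which is rough for samples as small as 15 and 14 loci (an exact or permutation test would be preferable); the text does not say which version was used, and the inputs are hypothetical. The null hypothesis 2N_x = N_a considered later in this section corresponds to doubling the X-linked estimates before calling the same function.

```python
import math
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U for x: number of pairs (x_i, y_j) with x_i > y_j, ties counted as 1/2."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
               for xi in x for yj in y)


def one_sided_p_less(x: Sequence[float], y: Sequence[float]) -> float:
    """Normal-approximation P value for H1: values in x tend to be smaller than in y.

    Uses a continuity correction and no tie correction.
    """
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu + 0.5) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Testing 2 N_x = N_a instead: compare doubled X-linked estimates with autosomal ones.
def p_after_doubling_x(x_estimates: Sequence[float], a_estimates: Sequence[float]) -> float:
    return one_sided_p_less([2.0 * v for v in x_estimates], a_estimates)
```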
Note that since males carry only one X chromosome, we do not necessarily expect $N_x$ to equal $N_a$. If male and female effective population sizes are equal, then $4N_x = 3N_a$. However, there are many possible factors that may cause the two effective population sizes to be unequal (…). … $\theta_W$ values (cf. …) … $N_x/N_a \approx 0.50$. When all sites are considered (with different rates for synonymous sites, nonsynonymous sites, and introns), then $N_x/N_a \approx 0.59$. The approximate 95% credibility interval for $N_x/N_a$ based on all sites (see METHODS) is 0.43–0.69. Under neutrality, $N_x/N_a = 0.50$ is unexpected, regardless of how biased the gender-specific population sizes are (…). … ($\hat{N}_x/\hat{N}_a = 0.16$) is substantially greater than the difference in their diversity levels.
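Under neutrality with unequal numbers of breeding males (N_m) and females (N_f), the classical Wright-Fisher results are N_a = 4N_mN_f/(N_m + N_f) and N_x = 9N_mN_f/(4N_m + 2N_f). The sketch below evaluates their ratio, which equals 0.75 when N_m = N_f and stays strictly between 9/16 = 0.5625 and 9/8 = 1.125 however unequal the sexes are, so a biased sex ratio alone cannot bring N_x/N_a down to 0.50. It considers only unequal numbers of the two sexes (not other sources of variance in offspring number), and the function names are hypothetical.

```python
def ne_autosomal(n_males: float, n_females: float) -> float:
    """Autosomal effective size with separate sexes: 4 Nm Nf / (Nm + Nf)."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_x(n_males: float, n_females: float) -> float:
    """X-linked effective size with separate sexes: 9 Nm Nf / (4 Nm + 2 Nf)."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)


def x_to_autosome_ratio(n_males: float, n_females: float) -> float:
    """N_x / N_a; algebraically 9 / (8 (1 + u)) with u = Nm / (Nm + Nf)."""
    return ne_x(n_males, n_females) / ne_autosomal(n_males, n_females)
```

Because the ratio decreases as the male share u grows, values below 0.75 require more breeding males than females, which is why the bottleneck simulations with nr < 0.75 imply a male-biased effective size.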
Fig 3A shows the P value of R (see METHODS) as a function of $N_x$ and $N_a$. For all population sizes where $2N_x \geq N_a$, the actual value of R is significantly too large. This suggests that there is significantly more linkage disequilibrium on the X than on 3R, even after correcting for the differences in effective population sizes suggested by diversity levels. For the same population sizes, Fig 3B shows the proportion of trials for which $\hat{N}_{x0}$ = 0.5 × 10^5 and $\hat{N}_{a1}$ = 3.2 × 10^5 (see METHODS). The different shading categories were chosen so that panels A and B of Fig 3 were as similar in appearance as possible; the lightest areas on both graphs represent areas of parameter space that are compatible with the data. For all population sizes where $2N_x \geq N_a$, the value of R* is quite small (i.e., R* < 8.0 × 10^-4). If instead we repeat the rank-order test with the null hypothesis that $2N_x = N_a$, then the two chromosomes are still significantly different (Table 4, Mann-Whitney U-test; P < 0.01).
The chromosomal difference in diversity levels (…) …
Sweep model:
We considered all combinations of diversity ratios (diversity with hitchhiking relative to diversity without it) of 0.85, 0.75, and 0.65 for 3R and 0.65, 0.55, and 0.45 for the X. All nine sets of simulations produced very similar results, and we display only a representative pair of them here. Fig 4 shows the value of R* as a function of $N_x$ and $N_a$. Fig 4A uses ratios of 0.85 for 3R and 0.45 for the X, while the corresponding values in Fig 4B are 0.75 and 0.55, respectively. We find that recurrent selective sweeps do not lead to striking increases in levels of linkage disequilibrium, as measured (see also …) … $\hat{N}_{x0}$ and $\hat{N}_{a1}$ in the sweep simulations is no more than what is expected from the decrease in levels of diversity. In other words, the estimated ratio of the number of recombination events to the number of mutation events, $\hat{\rho}/\hat{\theta}_W$, does not vary much when data are generated under either the null model or the recurrent selective sweep model. Exploratory simulations suggest that this observation might hold under a wider range of sample sizes and relative recombination rates than considered for the D. simulans data (results not shown). Thus, this simple model for repeated episodes of positive selection seems to explain neither the overall high levels of linkage disequilibrium nor the chromosomal difference in levels of linkage disequilibrium.
Previous work has shown that recurrent selective sweeps lead to a strong skew in the frequency spectrum toward an excess of rare variants (…). … greater than or equal to what is actually observed. Under recurrent hitchhiking, the average simulated $\bar{D}$ is negative, as expected. The observed $\bar{D}$ for the X-linked loci is significantly too high (P < 0.004, one-tailed test), while the observed $\bar{D}$ for the autosomal loci is not unusual.
Bottleneck model:
Due to computational constraints, we consider only a few parameter combinations. Fig 5 shows R* as a function of $N_x$ and $N_a$ for three different examples. As can be seen, recent bottlenecks are consistent with much higher effective population sizes. Equivalently, recent bottlenecks cause an increase in observed levels of linkage disequilibrium. For example, two of the three examples (Fig 5, A and B) are consistent with $N_a$ = 5.0 × 10^6, while one (Fig 5B) is consistent with $N_x$ = 2.5 × 10^6. In addition, at least for some parameter values (e.g., those of Fig 5B), the average ratio $\hat{N}_{x0}/\hat{N}_{a1}$ is much larger than what was estimated under the null model (0.16) and closer to the expectation from levels of diversity (i.e., 0.59). Since there are fewer X chromosomes than autosomes, a bottleneck is more severe for the X (i.e., the minimal population size is smaller). This leads to both a greater increase in linkage disequilibrium and a greater reduction in levels of variability on the X relative to the autosomes. In principle, a recent bottleneck might explain both the chromosomal differences in levels of linkage disequilibrium and the overall high levels of linkage disequilibrium, but it remains to be seen whether the parameter values required are plausible (see DISCUSSION).
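The argument that a bottleneck bites harder on the X can be illustrated with the classical drift recursion, in which a fraction 1/(2N_t) of heterozygosity is lost each generation. The sketch below ignores mutation and recombination, uses a hypothetical founder size followed by exponential growth, and scales X-linked sizes by nr as in the simulations described in METHODS. It is a qualitative illustration, not the coalescent simulations behind Fig 5, and all names and numbers are hypothetical.

```python
from typing import List, Sequence


def bottleneck_path(n_min: float, n_final: float, generations: int) -> List[float]:
    """Hypothetical founder event: size n_min, then exponential growth to n_final."""
    if generations < 2:
        raise ValueError("need at least two generations")
    growth = (n_final / n_min) ** (1.0 / (generations - 1))
    return [n_min * growth ** t for t in range(generations)]


def heterozygosity_retained(sizes: Sequence[float]) -> float:
    """Fraction of ancestral heterozygosity left after drift through the given
    per-generation effective sizes, ignoring new mutation: prod(1 - 1/(2 N_t))."""
    retained = 1.0
    for n_t in sizes:
        retained *= 1.0 - 1.0 / (2.0 * n_t)
    return retained


path = bottleneck_path(50.0, 1.0e6, 100)      # autosomal sizes (hypothetical)
nr = 0.7                                      # X-to-autosome size ratio, as in (b) and (c)
h_auto = heterozygosity_retained(path)
h_x = heterozygosity_retained([nr * n for n in path])
```

With the same demographic history, the smaller X-linked sizes lose a larger share of their ancestral variation, which is the asymmetry invoked in the text.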
The effect of a bottleneck on the frequency spectrum is complex. For results from a similar model, see …. … $\bar{D}$ values, as well as the proportion of simulated $\bar{D}$ values greater than or equal to the actual values. As expected under a recent bottleneck, the observed $\bar{D}$ is higher for the X-linked loci than for the autosomal loci. In all cases, the observed $\bar{D}$ values for both the X and 3R are within the middle 95% of the simulated distribution, though $\bar{D}$ for the 3R loci is close to being significantly too low.
| DISCUSSION |
|---|
This study analyzes sequence data from a North American population of D. simulans and documents that the high observed levels of linkage disequilibrium and the chromosomal differences in levels of linkage disequilibrium are not expected under the standard null model. Both demographic and selective departures from the null model are possible explanations, and we considered two of these alternatives to the null model. We describe below some of the difficulties associated with assessing whether these models are appropriate.
The bottleneck model:
Not much is known about the demographic history of North American populations of D. simulans, but as a human commensal, D. simulans is unlikely to have arrived in North America before humans did. The first people in the Americas are thought to have crossed via the Bering Strait ≈14,000–15,000 years ago (see, e.g., …) … ≈500 years ago. No one knows when D. simulans first started crossing the Atlantic as stowaways on ships, but it seems plausible that at first the number of migrants was limited. Both the volume of traffic and the cargo composition changed slowly over time; at some point in the past, successful migration to the Americas must have been possible but difficult. So, independent of genetic data, a recent bottleneck in the history of American populations of D. simulans seems to be a reasonable demographic model. We chose to model a single founder event, followed by rapid population growth. Perhaps a more realistic model would have many founder events, spread out over time (continuing to the present day). However, the earliest migrants might have contributed a disproportionately large amount to the gene pool of the new population; the newly founded population may have had ample opportunity to grow, since 500 years ago there were many settled human communities in the Americas. If so, later migrants would be less important, since they would contribute proportionally very little to the genetic makeup of the population.
In summary, our simple bottleneck model probably captures some fundamental element of the population history of North American D. simulans. Assuming that ancestral populations were close to mutation-drift equilibrium, a simple bottleneck model can, at least qualitatively, account for three essential features of the Californian D. simulans data: (1) a genome-wide increase in levels of linkage disequilibrium; (2) more linkage disequilibrium on the X than on the autosomes; and (3) a skew in the frequency spectrum toward more common variants on the X relative to the autosomes.
However, this does not necessarily mean that a bottleneck is a sufficient explanation for the patterns of variation in the data analyzed in this article. The effect that a bottleneck has on levels of diversity, linkage disequilibrium, and the frequency spectrum is quite sensitive to many unknown parameters. Exploratory simulations suggest that decreasing (1 − …) or T0 (while keeping the other parameters constant) leads to a greater increase in linkage disequilibrium, while decreasing nr leads to more of an effect on the X relative to the autosomes. Also, if the current effective population size and T0 are larger, there is little effect on levels of linkage disequilibrium. For example, if the current N is 1 × 10^9, then T0 must be quite small (e.g., T0 ≤ 4 × 10^3 generations) for a bottleneck to have an appreciable effect on estimates of linkage disequilibrium (results not shown).
Perhaps more worrisome is the fact that the ratio of effective sizes for the X and the autosomes in the ancestral population (nr) must be low (i.e., ≤0.75) to be consistent with the observed ratio of diversities in the Californian population (i.e., ≤0.69, the approximate upper bound for $N_x/N_a$). In the bottleneck simulations we present (Table 2, Fig 5), we assume 0.6 ≤ nr ≤ 0.7. In other words, we assume that the male effective population size is greater than or equal to the female effective population size. This situation may be unlikely for Drosophila, where sexual selection is expected to reduce the effective population size of males relative to females (…). … = 0.75, then $N_x/N_a$ = 0.684.
Positive selection models:
An alternative to a purely demographic explanation is that natural selection for adaptation has influenced the observed patterns of variation. D. simulans originated in Africa (…).
We chose the simple recurrent sweep model partly because it has been carefully studied before (e.g., …).
Another concern is the frequency of selective sweeps. We have chosen simulation parameters that allow few overlapping sweeps. We calculate [similar to (6) in BRAVERMAN et al. 1995] that the probability that a second selective sweep starts before a given one has finished is >0.05 for values of $N_x$ ≤ 2.4 × 10^5 and $N_a$ ≤ 1.5 × 10^5. Most of these overlaps consist either of new beneficial alleles arising after an older beneficial allele has already swept to high frequency (but not fixed) and/or of two beneficial alleles that are not tightly linked to each other; in both cases, the two sweeps are essentially independent. In general, if s and the diversity ratio (diversity with hitchhiking relative to diversity without it) are fixed, then multiple sweeps are more likely to overlap as N decreases. This happens because a selective sweep with a given value of s has an effect on standing levels of linked neutral diversity that is only weakly dependent on N, while sweeps take longer (in units of scaled time) in smaller populations and so are more likely to overlap. Note that we fixed the diversity ratio so that the effect of selection would be comparable across different values of N. If instead we were to fix the rate of introduction of advantageous alleles, then there would be more sweeps as N increases, and the diversity ratio would decrease with increasing N; because we have no prior knowledge regarding the rate of selective sweeps, this implementation does not seem to be appropriate. Since our goal is to determine whether recurrent selective sweeps can produce the excess of linkage disequilibrium that is observed (given the proposed reduction in X-linked vs. autosomal diversity), the relevant question is whether larger values of $N_x$ and $N_a$ are compatible with the data. The answer to this question is still no; Fig 4 shows that R* is very small when both $N_x$ and $N_a$ are large (i.e., when the nonoverlapping sweep assumption is met).
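The scaling argument can be checked with a deterministic (logistic) sweep, whose duration from ε = 1/2N to 1 − ε is (2/s) ln[(1 − ε)/ε] generations, roughly (2/s) ln(2N). Dividing by 2N generations gives the duration in coalescent units, which shrinks as N grows for fixed s. If sweep initiations are Poisson with a rate fixed per 2N generations, the chance that another sweep starts before a given one ends is 1 − exp(−rate × duration). This is an illustrative approximation with hypothetical names and inputs, not necessarily the calculation based on BRAVERMAN et al. (1995) used in the text, and the logistic form dx/dt = s x(1 − x) is only one of several conventions for the selection coefficient.

```python
import math


def sweep_duration_generations(s: float, n: float) -> float:
    """Logistic sweep time from epsilon = 1/(2N) to 1 - epsilon, in generations,
    assuming dx/dt = s x (1 - x)."""
    eps = 1.0 / (2.0 * n)
    return (2.0 / s) * math.log((1.0 - eps) / eps)


def sweep_duration_coalescent_units(s: float, n: float) -> float:
    """The same duration rescaled to units of 2N generations."""
    return sweep_duration_generations(s, n) / (2.0 * n)


def overlap_probability(rate_per_2n_generations: float, s: float, n: float) -> float:
    """P(at least one other sweep begins while a given sweep is in progress),
    treating sweep starts as Poisson with a constant rate per 2N generations."""
    duration = sweep_duration_coalescent_units(s, n)
    return 1.0 - math.exp(-rate_per_2n_generations * duration)


# Fixed s and fixed scaled sweep rate: overlap is likelier in the smaller population.
p_small_n = overlap_probability(10.0, 0.01, 5.0e4)
p_large_n = overlap_probability(10.0, 0.01, 5.0e5)
```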
The problem of overlapping sweeps might be exacerbated if the rate of selective events over time is not constant or the strength of selection is weaker. The general effects of a recurrent selective sweep model on the frequency spectrum and levels of linkage disequilibrium are not very sensitive to s, as long as s ≥ 0.002 (results not shown). However, for smaller selection coefficients (e.g., s < 0.002) and the small population sizes considered here, the simple selective sweep model becomes inappropriate because of the large number of overlapping selective events. Also, if natural selection is being driven by adaptation to new environments, then the rate of introduction of favorable alleles might depend heavily on the location and movement of populations and would be much higher at some times than at others. Without any independent source of information on the relevant parameters, we have no idea how often selective sweeps may have overlapped and interfered with each other over time. We also have no idea how multiple competing sweeps (perhaps in a subdivided population) affect levels of variation, the frequency spectrum, or patterns of linkage disequilibrium, or for that matter how sweeps in a subdivided population behave. For any of these models to be viable explanations of the data, they would need to increase levels of linkage disequilibrium on both chromosomes (though much more on the X than the autosomes). They would also need to be able to cause a decrease in levels of variability (on the X) without causing a skew in the frequency spectrum toward rare variants. This seems unlikely unless many of the sweeps are ongoing. Further work will explore how such models affect patterns of sequence polymorphism.
Another possibility is that adaptive evolution operated on standing variation rather than on newly arising mutations. If so, the rate of adaptation on the X might actually be slower than the rate on the autosomes (…).
Finally, natural selection might operate in a way that is fundamentally different from the simple directional selection models discussed above. However, …
Conclusions:
Any evolutionary model that seeks to be a sufficient explanation for the North American D. simulans data must simultaneously be consistent with the observed levels of diversity, frequency spectra, and levels of linkage disequilibrium on the X and autosomes. A simple bottleneck model can do so, but only if nr ≤ 0.75 and the population size reduction was severe and recent. It is not clear how reasonable these conditions are. On the other hand, a simple hitchhiking model can be rejected because it is inconsistent with both the observed frequency spectra and levels of linkage disequilibrium.
The relative role of natural selection in shaping patterns of D. simulans genetic variation remains unknown. More work needs to be done to explore how other models of natural selection affect patterns of variability. These models might examine, e.g., adaptation in structured populations, natural selection in variable environments (cf. …) …
It will be much easier to test D. simulans evolutionary models once sequence polymorphism data from other (predominantly African) populations are gathered. These data might allow one to infer whether migration to the Americas occurred primarily from Europe or from Africa and would help us construct a reasonable demographic null model. Only by explicitly considering demography will we be able to start deciphering the contribution of natural selection for adaptation to different populations of D. simulans.
| FOOTNOTES |
|---|
2 Present address: Max Planck Institute for Evolutionary Anthropology, D-04103 Leipzig, Germany.
| ACKNOWLEDGMENTS |
|---|
We thank B. Charlesworth and two anonymous reviewers for helpful suggestions on an earlier version of this manuscript. J.D.W. and M.P. were supported by National Science Foundation Postdoctoral Fellowships in Bioinformatics. P.A. was supported by a European Molecular Biology Organization Postdoctoral Fellowship.
Manuscript received December 10, 2001; Accepted for publication June 12, 2002.
| LITERATURE CITED |
|---|



Figure 3 legend (partial): shading categories correspond to P > 0.05, 0.01 < P < 0.05, and P < 0.01. See METHODS for details. The cutoffs for the different shading categories in b were chosen so that the appearances of the two figures were as similar as possible.

