Abstract
A correlation between diversity levels and rates of recombination is predicted both by models of positive selection, such as hitchhiking associated with the rapid fixation of advantageous mutations, and by models of purifying selection against strongly deleterious mutations (commonly referred to as “background selection”). With parameter values appropriate for Drosophila populations, only the first class of models predicts a marked skew in the frequency spectrum of linked neutral variants, relative to a neutral model. Here, we consider 29 loci scattered throughout the Drosophila melanogaster genome. We show that, in African populations, a summary of the frequency spectrum of polymorphic mutations is positively correlated with the meiotic rate of crossing over. This pattern is demonstrated to be unlikely under a model of background selection. Models of weakly deleterious selection are not expected to produce both the observed correlation and the extent to which nucleotide diversity is reduced in regions of low (but nonzero) recombination. Thus, of existing models, hitchhiking due to the recurrent fixation of advantageous variants is the most plausible explanation for the data.
It has been known for a decade that levels of nucleotide diversity in Drosophila melanogaster, but not levels of divergence with its sibling species, are correlated with crossing over rates, a pattern inconsistent with a strictly neutral model of molecular evolution (Aguadé et al. 1989; Berry et al. 1991; Begun and Aquadro 1992; Aquadro et al. 1994). A similar pattern has emerged in a variety of organisms (Stephan 1994; Nachman 1997; Nachman et al. 1998; Stephan and Langley 1998; Przeworski et al. 2000). Explanations for the correlation include positive selection (Maynard Smith and Haigh 1974; Kaplan et al. 1989; Gillespie 1997, 2000) as well as selection against strongly deleterious mutations (Charlesworth et al. 1993, 1995; Hudson and Kaplan 1995). These explanations assume that the sampled variation is neutral and reflects the action of selection at linked sites.
Particular theoretical attention has been paid to a model of recurrent but nonoverlapping episodes of positive selection, in which neutral variants linked to a strongly favored mutation are swept to fixation in the population. This so-called “hitchhiking” model predicts that loci in regions of low recombination will harbor lower levels of variation and more low frequency polymorphisms than loci in regions of normal recombination (Kaplan et al. 1989; Braverman et al. 1995; Gillespie 2000). This occurs because the physical length of the hitchhiking region depends on the strength of selection relative to the recombination rate (Kaplan et al. 1989). If selection coefficients are similar across the genome, loci in regions of low recombination will be affected by more selective sweeps per unit time, and hence are more likely to be sampled shortly after a sweep. In the recovery phase, most polymorphisms will be young (i.e., postsweep) and at low frequency.
An alternative to positive selection models is purifying selection against strongly deleterious mutations, hereafter referred to as “background selection” (Charlesworth et al. 1993, 1995; Hudson and Kaplan 1994, 1995). In this model, a neutral allele will persist in the population only if it finds itself on a deleterious mutation-free chromosome (or segment of chromosome), either when it first arises in the population or by recombination. If selection coefficients and deleterious mutation rates are the same in different regions, the rate of recombination will determine the extent of the reduction in neutral diversity (i.e., the extent to which neutral alleles can escape from background selection).
While purifying selection undoubtedly occurs, uncertainty about key parameters, such as the distribution of selection coefficients and the deleterious mutation rate, renders unclear its importance in reducing levels of variability. Similar uncertainty exists for positive selection models. As a result, considerable debate has revolved about the relative importance of background selection and hitchhiking in shaping patterns of variability.
While positive selection models predict an excess of rare alleles at linked neutral sites relative to a model of no selection (Braverman et al. 1995; Gillespie 2000), the background selection model does not, so long as the population is large and the deleterious mutation rate not extremely high (Hudson and Kaplan 1994; Charlesworth et al. 1995). In D. melanogaster, these two conditions are likely to be met (Li et al. 1999; McVean and Vieira 2001). Thus, polymorphism data from D. melanogaster provide a means to distinguish between models. Two previous surveys of loci in D. melanogaster did not detect a skew toward rare variants in regions with low rates of crossing over (Begun and Aquadro 1993; Charlesworth et al. 1995). These observations suggested that background selection might be a sufficient explanation for the correlation between diversity levels and recombination rates, i.e., that there exists no unequivocal evidence for positive selection.
The effects of background selection and hitchhiking on the frequency spectrum of linked neutral sites are known for a random-mating population of constant size. Demographic departures from these model assumptions can alter the signature of selection (e.g., see Nordborg 1997; Slatkin and Wiehe 1998; Hamblin and Di Rienzo 2000). Recent multilocus surveys suggest that patterns of polymorphism in D. melanogaster do not conform to the expectations of a panmictic population of constant size (Andolfatto and Przeworski 2000; Andolfatto 2001; Przeworski et al. 2001). In particular, the species probably has an African origin and has only recently become cosmopolitan (David and Capy 1988; Lachaise et al. 1988). There is also evidence that populations are geographically structured (e.g., Hale and Singh 1991; Begun and Aquadro 1993). Given that African populations are thought to be ancestral, they are likely to have a more stable evolutionary history than non-African populations. As a result, they may be closer to the demographic assumptions of population genetic models; i.e., the effective population size is likely to be large and may not have changed much in size over time. With these considerations in mind, we revisit the relationship between frequency spectra and crossing over rates by focusing, where possible, on samples from single African populations. Our goal is to establish whether background selection alone can account for patterns of nucleotide variation.
MATERIALS AND METHODS
Data collection and previously published data: We collected sequence polymorphism data for 10 loci distributed over a range of crossing over rates on the X chromosome. We concentrate on loci in regions with lower than average crossing over rates, which are underrepresented in published data. Primers used to PCR amplify and sequence these loci are listed in Table 1. Genomic DNA was prepared (Gentra Systems, Research Triangle Park, NC) from one male fly for each of 13 isofemale lines kindly provided by C.-I Wu. These lines are sampled from a Zimbabwe, Africa population (cf. Begun and Aquadro 1993). All 10 loci (Table 1) encode 600-700 bp of intron sequence (except zeste, which is 3′ untranslated DNA) and were sampled from the same 13 individuals. PCR products were sequenced directly on both strands using a Big-Dye sequencing kit and run on an ABI377XL automated sequencer (Perkin-Elmer-Applied Biosystems, Norwalk, CT). Sequence contigs were managed with Sequencher 3.0 software (Gene Codes, Ann Arbor, MI) and aligned manually. Aligned sequences were deposited into GenBank under accession nos. AF345663-AF345791.
From published data, we include the 19 loci available for African populations with sample sizes greater than three. For the majority of these loci (12), population samples were drawn from a Zimbabwe population (as above) with the exceptions of Fbp2, Su(H), and Vha68, where samples are from an Ivory Coast population; Acp29AB, from a Malawi population; and Dras1, Dras2, and R, from a number of African populations. CecC was chosen as a representative from a cluster of closely related genes (cf. Clark and Wang 1997).
For each locus, we include all polymorphisms at synonymous sites and in noncoding DNA (including both insertion-deletions and single nucleotide polymorphisms). Single nucleotide polymorphisms within deletions and overlapping deletions were excluded from analyses. Excluding insertion-deletion variation has no effect on the conclusions (results not shown). In situations where a nucleotide polymorphism occurs immediately adjacent to a deletion, and the two are in complete coupling, the two polymorphisms are treated as a single event. For these reasons, the numbers reported in tables may differ slightly from previously published values. DnaSP3.0 (Rozas and Rozas 1999) was used for a subset of polymorphism analyses.
Estimating rates of crossing over: For each locus, we estimate r, the sex-averaged local rate of meiotic crossing over per kilobase per generation, following the approach of Charlesworth (1996). In this method, each chromosomal arm is divided into several regions. Each region was assigned either a linear or a quadratic function relating genetic distance to physical distance. We modified this method to incorporate estimates of the DNA content of each cytological band (Heino et al. 1994). Thus, the rate of crossing over is expressed in units of centimorgans per million bases (cM/Mb) rather than centimorgans per band (cM/band). Since there is no crossing over in D. melanogaster males (Ashburner 1989), female rates of crossing over are multiplied by ½ for autosomes and ⅔ for X-linked loci.
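The sex-averaging step described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the unit conversion assumes the standard definition that 1 cM corresponds to a crossover probability of 0.01 per generation.

```python
def sex_averaged_rate(female_rate_cm_per_mb, x_linked):
    """Sex-averaged crossing over rate (cM/Mb) from the female rate.

    Males do not recombine. With equal sex ratios, an autosome spends
    half of its time in females and an X chromosome two-thirds.
    """
    factor = 2.0 / 3.0 if x_linked else 0.5
    return female_rate_cm_per_mb * factor


def cm_per_mb_to_per_kb(rate_cm_per_mb):
    """Convert cM/Mb to crossovers per kilobase per generation.

    1 cM = 0.01 crossovers per generation and 1 Mb = 1000 kb.
    """
    return rate_cm_per_mb * 0.01 / 1000.0


# Hypothetical female rate of 3 cM/Mb at an X-linked locus.
r_x = cm_per_mb_to_per_kb(sex_averaged_rate(3.0, x_linked=True))
```

The same female rate therefore yields a larger sex-averaged r on the X chromosome than on an autosome.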
Previous approaches have fit high-order polynomial curves to whole chromosome arms using standard map distances available in the databases (e.g., Ashburner 1989, chapter 11; Comeron et al. 1999). These approaches produce estimates of r that are highly correlated with our estimates (e.g., for our 29 loci, the correlation with estimates from Comeron et al. 1999 yields a Pearson correlation coefficient of R = 0.91). The main difference between approaches is the estimates of r in regions of very low crossing over, i.e., telomeric and centromeric regions of D. melanogaster. Using Charlesworth’s method, we recover yellow-su(s), yellow-su(wa), and yellow-white map distance estimates (∼10⁻⁴, 10⁻³, and 0.015 M, respectively) that are in good agreement with published measurements (cf. Takano-Shimizu 1999; Langley et al. 2000).
For the white locus (cytological map 3C2, genetic map 1.5), the appropriate r estimate is less clear. Charlesworth (1996) chooses this locus to mark the transition from a region with a quadratic relation of genetic to physical distance (from the telomere) to one with a linear relation in the centromere-proximal direction. We take the r predicted by the telomere-proximal quadratic function (0.6 cM/Mb). Note, however, that genetic distance in the centromere-proximal direction from white appears to increase faster than quadratically. If, instead, we assume an exponential increase in genetic distance from yellow (cytological map 1B1, genetic map 0) to Notch (cytological map 3C7-9, genetic map 3.0), we estimate an r for white that is ∼2.5-fold larger. The use of this r estimate for white has no effect on our conclusions (results not shown).
Table 1. PCR and sequencing primers for new X-linked loci used in this study
Summary of the frequency spectrum of mutations: To summarize the frequency spectrum, we use the statistic D (Tajima 1989c). D considers the (approximately normalized) difference between two estimates of the population mutation rate, θ: π, the mean number of pairwise differences in the sample, and θW, an estimate based on the number of segregating mutations in the sample (Watterson 1975). Under the standard neutral model, the expectation of D is roughly zero (Tajima 1989c). D is positive when there are too many intermediate frequency variants, such as in the presence of a balanced polymorphism, or under certain population subdivision models (Tajima 1989a). D is negative when there is an excess of rare mutations, as would be expected under recent population growth (Tajima 1989b), or under hitchhiking models (Braverman et al. 1995; Gillespie 2000). For each locus, Tajima’s D was calculated on the basis of the total number of mutations rather than on the number of segregating sites. This approach is conservative since it tends to make the D values of loci in regions of high recombination (and greater diversity) more negative. This said, the number of multiply hit sites is very small.
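For concreteness, the statistic can be computed as in the following sketch, which uses the standard normalizing constants of Tajima's D. The function names and the input format (one derived-allele count per mutation) are assumptions made for illustration and are not taken from the original analysis.

```python
import math


def pairwise_diversity(n, counts):
    """pi: mean number of pairwise differences, from per-mutation allele counts."""
    return sum(2.0 * k * (n - k) for k in counts) / (n * (n - 1))


def tajimas_d(n, num_mutations, pi):
    """Tajima's D for a sample of n chromosomes.

    num_mutations is the total number of mutations in the sample and pi is
    the mean number of pairwise differences (both for the whole locus).
    """
    if n < 4 or num_mutations < 1:
        return float("nan")  # the variance term is zero for n < 4
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = num_mutations / a1  # Watterson's estimate of theta
    s = num_mutations
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Five singletons in a sample of 10: an excess of rare variants, so D < 0.
d_rare = tajimas_d(10, 5, pairwise_diversity(10, [1] * 5))
```

Replacing the five singletons with five mutations at count 5 (intermediate frequency) makes π exceed θW and gives a positive D.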
We also report a second summary of the frequency spectrum, D*, proposed by Fu and Li (1993). Similarly to D, D* compares two estimates of θ: θW and an estimate based on the number of mutations found only once in a sample of chromosomes (referred to as singletons). The expectation of D* is approximately zero under the neutral model; it is negative when there is an excess of singletons in the sample (Fu and Li 1993).
Coalescent simulations: To perform weighted regressions (see results), we estimated the variance of D under the null model for each locus. To do so, we ran 10⁴ coalescent simulations for a random-mating population of constant size with no selection (cf. Hudson 1990). Every new mutation occurs at a previously unmutated site. We implemented the coalescent for a fixed number of segregating sites (Hudson 1993); i.e., we generated a genealogy, then placed the observed number of segregating sites on the tree. For each locus, we conditioned on the sample size, the total number of base pairs sequenced, and an estimate of the population recombination rate C = 4Nr (or 3Nr if X-linked, assuming no sex differences and equal sex ratios), where r is the sex-averaged rate of crossing over per base pair per generation. N, the diploid effective population size of the species, is taken to be 10⁶ (Li et al. 1999; Andolfatto and Przeworski 2000). The exact value of N is not important, since we are mainly interested in the relative variance of D for loci in regions with different crossing over rates. Note finally that weighted regression ignores error in the estimation of r.
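The fixed-segregating-sites procedure can be illustrated with a deliberately simplified sketch. It assumes no recombination (C = 0), whereas the simulations described above condition on C for each locus; it is therefore not the program used in the study, and all function names are hypothetical.

```python
import math
import random
import statistics


def simulate_site_counts(n, num_sites, rng):
    """Build one neutral genealogy for n chromosomes (no recombination) and
    place num_sites mutations on it. Returns one allele count per mutation."""
    lineages = [1] * n  # number of sampled descendants of each lineage
    lengths, descendants = [], []
    k = n
    while k > 1:
        # Waiting time while k lineages remain (coalescent time units).
        t = rng.expovariate(k * (k - 1) / 2.0)
        for d in lineages:
            lengths.append(t)
            descendants.append(d)
        i, j = rng.sample(range(k), 2)  # the pair that coalesces
        merged = lineages[i] + lineages[j]
        lineages = [d for idx, d in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
        k -= 1
    # Each mutation falls on a branch segment with probability
    # proportional to the length of that segment.
    return rng.choices(descendants, weights=lengths, k=num_sites)


def tajimas_d(n, counts):
    """Tajima's D from per-mutation allele counts (needs n >= 4, >= 1 mutation)."""
    s = len(counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    c1 = (n + 1) / (3.0 * (n - 1)) - 1.0 / a1
    c2 = (2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
          - (n + 2) / (a1 * n) + a2 / a1 ** 2)
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    pi = sum(2.0 * k * (n - k) for k in counts) / (n * (n - 1))
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def null_variance_of_d(n, num_sites, reps=1000, seed=0):
    """Variance of D across replicate genealogies, conditional on num_sites."""
    rng = random.Random(seed)
    ds = [tajimas_d(n, simulate_site_counts(n, num_sites, rng))
          for _ in range(reps)]
    return statistics.variance(ds)
```

Because recombination reduces the variance of D, this no-recombination sketch corresponds to the extreme low-crossing-over case; reproducing the per-locus variances would require a coalescent with recombination.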
To estimate the variance of D under a model of background selection, we ran simulations as before, but conditional on C = f4Nr (or f 3Nr for X-linked loci), where the f value for each locus is the expected local reduction in effective population size due to background selection, as estimated according to Charlesworth (1996). We use the values of f that provided the best fit to the available restriction fragment length polymorphism data, when both transposable elements and point mutations are considered (Charlesworth 1996).
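As a small worked example, the population recombination rate used to condition the simulations can be written as follows. The function name and the example value of r are hypothetical; N = 10⁶ follows the value used for the null model, and f = 1 corresponds to no background selection.

```python
def population_recombination_rate(r_per_bp, n_eff=1e6, x_linked=False, f=1.0):
    """C = f * 4Nr per base pair (f * 3Nr for X-linked loci).

    r_per_bp: sex-averaged crossing over rate per base pair per generation.
    n_eff:    diploid effective population size N.
    f:        expected local reduction in effective population size
              due to background selection (1.0 means no reduction).
    """
    ploidy_factor = 3.0 if x_linked else 4.0
    return f * ploidy_factor * n_eff * r_per_bp


# Hypothetical autosomal locus with r = 1e-8 per bp per generation.
c_neutral = population_recombination_rate(1e-8)
# The same rate at an X-linked locus, with diversity halved by background selection.
c_bgs = population_recombination_rate(1e-8, x_linked=True, f=0.5)
```

Lower values of f reduce C and therefore increase the simulated variance of D at that locus.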
We also ran coalescent simulations to estimate the distribution of the Pearson correlation coefficient R. For a model of a single population of constant size, simulations are implemented as above. We generated D values for the 29 loci 10⁴ times, and calculated R each time. We then tabulated the number of runs with simulated R greater than or equal to the observed one. We also estimate the distribution of R for a model of population growth and a model of population structure. We model population growth as a constant population size of 10⁵ until 10⁵ generations ago, followed by an exponential increase to a present size of 10⁷ (cf. Marjoram and Donnelly 1994). Other aspects of the simulation are as above. Our model of geographic subdivision is a symmetric two-island model (e.g., Tajima 1989a), where samples are drawn entirely from one deme. The number of individuals in each subpopulation, Npop, is taken to be N/2. We considered a fairly extreme model of structure, where the number of migrants per island per generation is one. This value is lower than suggested by most FST values observed in D. melanogaster (Irvin et al. 1998). Simulations are run using modifications of programs kindly provided by R. R. Hudson.
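The tabulation step amounts to an empirical one-tailed probability, which might be sketched as below. The function names are hypothetical, and the list of simulated R values is a toy input rather than output from the actual simulations.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def empirical_p(simulated_rs, observed_r):
    """Fraction of simulated runs with R greater than or equal to the observed R."""
    hits = sum(1 for r in simulated_rs if r >= observed_r)
    return hits / len(simulated_rs)


# Toy illustration: one of four simulated R values reaches the observed 0.56.
p = empirical_p([0.10, 0.60, 0.20, -0.30], 0.56)
```

In the actual procedure, each simulated R would come from correlating 29 simulated D values (one per locus) with the estimated crossing over rates.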
RESULTS
Summaries of diversity and of the frequency spectrum of mutations are positively correlated with rates of crossing over: In Table 2, we summarize patterns of nucleotide variation at synonymous sites and noncoding DNA for 10 newly sequenced loci and 19 previously published data sets. To allow for comparisons between X and autosome diversity levels, a point of departure is to multiply X-linked diversities by 4/3 (e.g., Begun and Aquadro 1992); this approach assumes equal sex ratios and no sexual selection. In Figure 1, we plot the relationship between the corrected π, the average number of pairwise differences per site, and the estimated rates of crossing over per physical distance. As expected (Begun and Aquadro 1992; Aquadro et al. 1994), nucleotide diversity is positively correlated with crossing over rates (Pearson’s R = 0.61, P = 0.001; Spearman’s R = 0.66, P < 0.001; all reported P values are two-tailed). It is unclear, however, whether this scaling is appropriate. Sexual selection in natural populations can cause the scaling of X to autosome diversities to be closer to unity (Caballero 1995). If diversities in Table 2 are left uncorrected, the correlation is stronger (Pearson’s R = 0.65, P < 0.001; Spearman’s R = 0.71, P < 0.001).
In Figure 2, we plot the values of Tajima’s D and Fu and Li’s D* against rates of crossing over for the 29 loci in Table 2. Values of D are sharply negative in areas of low crossing over and increase with increasing rates of crossing over (Pearson’s R = 0.56, P = 0.002; Spearman’s R = 0.54, P = 0.002). The correlation between crossing over and D* is even less likely to occur by chance (Pearson’s R = 0.67, P < 0.001; Spearman’s R = 0.64, P < 0.001). These patterns are expected under a model of hitchhiking associated with the rapid fixation of advantageous mutations (Braverman et al. 1995; Gillespie 2000) and under some random-environment-selection models (Gillespie 1997), but not under the background selection model (Hudson and Kaplan 1994; Charlesworth et al. 1995). A significant correlation is also obtained when the analysis is restricted to the 22 loci sampled only in Zimbabwe (for D, Pearson’s R = 0.53, P = 0.011; Spearman’s R = 0.51, P = 0.016; while for D*, Pearson’s R = 0.66, P = 0.001; Spearman’s R = 0.58, P = 0.004).
One concern is that loci in areas of very low crossing over may not represent independent data points. However, considering these loci as a single data point has no effect on our conclusions. As an illustration, if we replace the D values for the nine X-linked loci with a crossing over rate of ≤5 × 10⁻⁶/kb/generation by one value (either the average D or the total D), the correlation is still highly significant (P < 0.01).
Also relevant is the presence of inversion polymorphisms in African populations of D. melanogaster (Lemeunier and Aulard 1992). When heterozygous, inversions suppress recombination locally and can increase crossing over rates on other chromosomes (Schultz and Redfield 1951; Sniegowski et al. 1994). These features make our estimates of the crossing over rate less certain. In particular, because inversion polymorphisms are more common on autosomes than on the X chromosome, estimates of crossing over rates for autosomes could be systematically biased upward relative to those on the X. However, the correlation of D (or D*) and crossing over rates is still significant if autosomal crossing over rates are reduced by one-half (for D, P < 0.02; and for D*, P < 0.002). The main effect of inverted chromosomes should be to introduce additional noise (Andolfatto et al. 2001). In fact, the correlation is improved if inverted chromosomes are excluded where possible [In(2L)t, Fbp2, Su(H), and Vha, results not shown]. In this light, it is worth noting that the correlation between diversity and crossing over is stronger for X-linked loci (Pearson’s R = 0.90, 15 loci), where there are no high frequency inversions, than for autosomal loci (Pearson’s R = 0.39, 14 loci).
Sensitivity to demographic assumptions: Our null model assumes a random-mating population of constant size. Population structure will alter the distribution of D relative to panmixia, even if samples are drawn from a single deme (see Wall 1999). When samples are drawn from one island in a symmetric island model, the mean D is slightly increased, and there is more weight on both tails of the distribution of D (Wall 1999; M. Przeworski, unpublished results). To test the significance of the correlation between D and r in the presence of structure, we ran simulations of a symmetric two-island model for 4Npopm = 1, where Npop is the number of individuals in each population and m is the fraction of migrants per generation. Sampling was entirely from one deme. We ran 10⁴ trials to estimate the distribution of R under this null model (see materials and methods). Only 9 of the 10⁴ trials had R ≥ 0.56 (the observed value). This model of population subdivision is a highly simplified one; nonetheless, the results suggest that our correlation is no less unusual in the presence of population structure.
Inequality of variance among samples and the background selection hypothesis: Standard correlation tests are not entirely appropriate, since under the null model the expected variance in D values increases with decreasing recombination (Hudson 1983; Wall 1999; Przeworski et al. 2001). In addition, samples from regions of low recombination tend to contain fewer segregating sites (see Table 2). With this in mind, we performed a weighted regression, weighting each observed D value by the reciprocal of the estimated variance of D for each locus (see materials and methods). When the inequality of variances among samples is taken into account in this way, P = 0.01 (R = 0.47).
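A weighted regression of this kind can be sketched as follows, with each locus weighted by the reciprocal of its simulated variance of D. The text does not specify how the reported R was computed, so the weighted correlation below is one reasonable choice and an assumption; the function name and inputs are hypothetical.

```python
import math


def weighted_fit(xs, ys, variances):
    """Weighted least-squares fit of y on x with weights 1/variance.

    Returns (slope, intercept, weighted correlation coefficient).
    """
    ws = [1.0 / v for v in variances]
    wsum = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / wsum  # weighted mean of x
    my = sum(w * y for w, y in zip(ws, ys)) / wsum  # weighted mean of y
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    syy = sum(w * (y - my) ** 2 for w, y in zip(ws, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, sxy / math.sqrt(sxx * syy)


# Toy data lying exactly on y = 1 + 2x, with unequal variances.
slope, intercept, r_w = weighted_fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0],
                                     [1.0, 0.5, 2.0])
```

Here x would be the crossing over rate at each locus, y the observed D, and the variances would come from the locus-specific coalescent simulations, so that noisy low-recombination loci contribute less to the fit.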
Background selection against strongly deleterious alleles in large populations can be thought of as a simple reduction in effective population size (Hudson and Kaplan 1994, 1995), so should not result in a distortion of the frequency spectrum. However, by reducing C, background selection will further increase the variance of D. We therefore performed a second weighted regression. This time, the variance of D was estimated from simulations that condition on C = f4Nr (or f 3Nr for X-linked loci), where f for each locus is the expected local reduction in effective population size due to background selection, as estimated by Charlesworth (1996). Under this model, the probability of observing such a strong positive correlation between D and r (R = 0.48) by chance is estimated to be P < 0.01. Thus, background selection alone is an unlikely explanation for the correlation.
Table 2. Patterns of variability and local rates of crossing over for 29 loci in African populations of D. melanogaster
Figure 1. Relationship between silent nucleotide diversity (π%) and the rate of crossing over for 29 loci sampled from African populations of D. melanogaster. Shaded circles represent X-linked loci. X-linked diversities (Table 2) have been multiplied by 4/3 to make them comparable to autosomal diversities, assuming equal sex ratios and no sexual selection. There is a significant, positive correlation (Pearson’s R = 0.61, P = 0.001, Spearman’s P < 0.001, two-tailed).
Effects of varying sample size: We have shown that if we assume a random-mating population of constant size [where E(D) ≈ 0], the correlation between D and crossing over rates is highly unlikely. Next, we consider whether the correlation between D and r might reflect negative D values at all loci. This is a concern because when the population D is not zero, the sample estimate of D is no longer unbiased. In particular, small samples will yield estimates closer to zero than will large ones (cf. Gillespie 2000). In our data, there is a marginally significant negative correlation between sample size (n) and crossing over rates (Pearson’s R = -0.344, two-tailed P = 0.07). Suppose that population D values are all nonzero, for example, due to population expansion. Might the unequal sample sizes result in a correlation between diversity and crossing over rates? To test this possibility, we randomly subsampled 10 alleles for each of the two data sets with n = 50. For this version of the data, D values are -1.35 for su(s) and 0.55 for su(wa) and all sample sizes are <20. The correlation between n and estimated crossing over rates has all but disappeared (Pearson’s R = -0.13, two-tailed P = 0.51), yet there is still a significant correlation between D and crossing over rates (Pearson’s R = 0.48, two-tailed P < 0.01). Thus, the correlation between D and r observed in these samples is likely to reflect an underlying correlation in population D values (rather than nonzero population D values at all loci).
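The subsampling step can be illustrated with the sketch below, which draws a fixed number of chromosomes without replacement and keeps only the sites that remain polymorphic. The input format (haplotypes as strings of "0" and "1") and the function name are assumptions made for this illustration; the resulting allele counts could then be used to recompute D for the reduced sample.

```python
import random


def subsample_site_counts(haplotypes, k, rng):
    """Randomly subsample k haplotypes and return, for each site that is
    still polymorphic in the subsample, the count of the "1" allele."""
    chosen = rng.sample(haplotypes, k)
    counts = []
    for site in range(len(chosen[0])):
        derived = sum(1 for h in chosen if h[site] == "1")
        if 0 < derived < k:  # drop sites that became monomorphic
            counts.append(derived)
    return counts


# Toy data set of 50 chromosomes and two sites, reduced to 10 alleles.
toy = ["10"] * 25 + ["01"] * 25
counts = subsample_site_counts(toy, 10, random.Random(0))
```

Rare variants are the most likely to be lost in the subsample, which is why D estimated from a smaller sample tends to lie closer to zero when the population D is negative.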
Whether we even expect the negative correlation between n and r to produce a spurious correlation between D and r is unclear. While loci in areas of high crossing over tend to have smaller sample sizes, they also tend to have many more polymorphic sites. This follows from the strong correlation observed between diversity and rates of crossing over in our data (Figure 1). Also, the variance of D under the null model is much smaller for loci in regions of high recombination than it is for loci in regions of low recombination (Przeworski et al. 2001). To explore this further, we investigated whether a positive correlation is expected for a model of population growth. Specifically, we estimated the distribution of Pearson’s R for the 29 loci for a model of constant population size (N = 10⁵) followed by 10⁵ generations of exponential growth to a present size of 10⁷. This model is not intended as a realistic depiction of the population history of D. melanogaster but as an illustration. With these parameters, for n = 10 and 10 segregating sites, the mean D is ∼ -0.80, while for n = 50 and 50 segregating sites, the mean D is ∼ -1.76 (as estimated from 10,000 simulations). The average R value in 10⁴ simulations of the 29 loci was +0.08 (i.e., significantly greater than 0). However, the Pr(R ≥ 0.56) was estimated to be 0.001; in other words, the correlation that we observe is still highly unlikely. While this is only one model among many, it suggests that the effect of varying sample sizes on R is quite weak when other facets of the data are also taken into account.
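The growth scenario can be written as a population size function of time measured backward from the present, as in this sketch. The function name is hypothetical, and the exponential rate is simply the value implied by the stated start and end sizes.

```python
import math


def population_size(t_generations_ago, n_ancestral=1e5, n_present=1e7,
                    t_growth=1e5):
    """Population size t generations before the present.

    Constant at n_ancestral until t_growth generations ago, then
    exponential growth up to n_present today.
    """
    if t_generations_ago >= t_growth:
        return n_ancestral
    # Growth rate per generation implied by the two sizes and the duration.
    alpha = math.log(n_present / n_ancestral) / t_growth
    return n_present * math.exp(-alpha * t_generations_ago)


# Halfway through the growth phase the size is the geometric mean, 10^6.
n_mid = population_size(5e4)
```

With these parameters the implied growth rate is ln(100)/10⁵, or roughly 4.6 × 10⁻⁵ per generation.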
Figure 2. Scatterplot of (a) Tajima’s D and (b) Fu and Li’s D* vs. the rate of crossing over for 29 loci sampled from African populations of D. melanogaster. Shaded circles represent X-linked loci. D vs. crossing over: Pearson’s R = 0.56, P = 0.002, Spearman’s P = 0.002, two-tailed. D* vs. crossing over: Pearson’s R = 0.67, P < 0.001, Spearman’s P < 0.001, two-tailed. The correlation with D is improved by excluding insertion-deletion variation (Pearson’s R = 0.59).
DISCUSSION
Models of positive selection: The positive correlation between D and crossing over rates is unlikely under a simple model of background selection but is expected under simple recurrent hitchhiking models (Braverman et al. 1995; Gillespie 2000). As one illustration, the expected D value in Gillespie’s (2000) pseudohitchhiking model is ∼ -1.5 for regions of no recombination and samples of 20 drawn from “large” populations (N ≥ 10⁴). Thus, hitchhiking due to the rapid fixation of advantageous variants appears to be the best explanation for the data. Patterns of polymorphism may also be consistent with random-environment-selection models (Gillespie 1997), but their sampling properties have yet to be investigated explicitly.
Also of interest are models that incorporate both deleterious and favorable mutations, as both processes are likely to be occurring simultaneously (Hudson 1994). Such models are in their infancy. The sole study of the joint effects of hitchhiking and background selection (Kim and Stephan 2000) concludes that hitchhiking is likely to be the main force shaping patterns of variation in regions of low recombination, while background selection may dominate when recombination rates are higher. Thus, the correlation between D and r reported here may be consistent with the joint action of these forces.
Alternative deleterious evolution models: Next, we consider whether alternative models of deleterious evolution might also be consistent with the patterns reported in Figures 1 and 2. A large class of deleterious mutations may have selection coefficients smaller than values typically used in models of background selection (Fry et al. 1999; Keightley and Eyre-Walker 1999). Such selection intensities may be particularly relevant to transposable elements (cf. Charlesworth 1996) as well as to a subset of synonymous sites and amino acid variants (Akashi 1999). While selection against strongly deleterious mutations does not result in a skewed frequency spectrum at linked neutral sites (Hudson and Kaplan 1994), selection against weakly deleterious mutations can lead to an excess of rare neutral alleles in large samples (e.g., for a sample size of 100 chromosomes, Charlesworth et al. 1995). Whether a skew would also be detectable in small samples (our median sample size is 13) is unknown. In addition, the parameters (selection coefficient and deleterious mutation rate) required to produce a skew in the frequency spectrum of linked neutral variants are unlikely to cause the drastic reduction of nucleotide diversity observed in regions of low recombination in D. melanogaster (Charlesworth et al. 1995; Charlesworth 1996; Nordborg et al. 1996).
When purifying selection is weak, linked deleterious alleles may persist long enough in a population to cosegregate. Selection at one site can then reduce the efficacy of selection at linked sites (Hill and Robertson 1966; Felsenstein 1974), a phenomenon termed Hill-Robertson interference (HRI). Weak selection HRI models (e.g., Li 1987; Comeron et al. 1999; McVean and Charlesworth 2000; Tachida 2000) can produce a negative skew in the frequency spectrum of linked neutral variants under some parameters (Tachida 2000). However, while HRI models produce a reduction in diversity levels, they cannot explain diversity reductions on the scale of those observed in areas of low (but nonzero) crossing over in D. melanogaster (Tachida 2000; G. McVean, personal communication).
In addition, the effect of weak selection HRI is probably weaker than has been estimated in simulations, as published models (e.g., McVean and Charlesworth 2000) include crossing over but not gene conversion. Gene conversion may be the prevalent mechanism determining the strength of associations among segregating mutations in regions of very low crossing over in Drosophila (Langley et al. 2000), and thus the strength of HRI. Preliminary simulations suggest that the effect of HRI on the frequency spectrum and diversity levels at linked neutral variants is considerably weakened by the presence of gene conversion in regions of nonzero crossing over (J. Comeron, personal communication; G. McVean, personal communication).
The models described thus far have considered the effect of selection on linked neutral variation. However, sampled variation may not be neutral, but may itself be under very weak selection (i.e., on the order of 1/N). In this case, a negative D value is expected for all rates of recombination (McVean and Charlesworth 2000). In addition, only a minor reduction in variability is expected for regions of low recombination relative to regions of high recombination (e.g., see Figure 3 of McVean and Charlesworth 2000).
In summary, existing models of deleterious evolution, weak or strong, are unlikely to account for both the reductions in diversity and the skew in D observed in small samples from regions of low crossing over. However, models with a distribution of selection coefficients have yet to be investigated. Recent work on the selective effects of deleterious mutations suggests a large class of deleterious mutations with minor effects in both Drosophila and Caenorhabditis elegans (Davies et al. 1999; Keightley and Eyre-Walker 1999). The challenge is to construct a more realistic model of deleterious evolution (including a distribution of selection coefficients, crossing over, and gene conversion) that is constrained by plausible estimates of the deleterious mutation rate (McVean and Vieira 2001), yet can account for the pattern reported here.
The potential importance of demographic factors: In this study, we focused on African samples of D. melanogaster. We did so because non-African populations are likely to have experienced a more complicated demographic history (David and Capy 1988; Lachaise et al. 1988). In this light, it is noteworthy that no significant correlation has been found between D and crossing over rates in the available data for non-African populations (cf. Aquadro et al. 1994; Charlesworth et al. 1995). The reasons for the discrepancy between African and non-African populations are not understood. One possibility is that a recent demographic perturbation (e.g., founder effects) in the history of non-African populations (David and Capy 1988; Begun and Aquadro 1993, 1995) obscures the pattern of natural selection. Long-term population structure (e.g., Slatkin and Wiehe 1998; Hamblin and Di Rienzo 2000) could also alter the signature of selection relative to the predictions of existing models.
Species-specific demographic considerations are likely to be of relevance for evolutionary inferences from other species as well. A large quantity of polymorphism data will soon be available for humans. Like Drosophila, anatomically modern humans may have an African origin and have only recently become cosmopolitan (Cann et al. 1987). Preliminary data suggest a correlation between diversity levels and crossing over rates (Nachman et al. 1998; Przeworski et al. 2000). By carefully considering the demographic history of populations, it may be possible to recover a similar pattern in the human data.
CONCLUSIONS
A decade of empirical work has been devoted to distinguishing between a simple model of background selection (Charlesworth et al. 1993; Hudson and Kaplan 1995) and a simple model of positive selection (Maynard Smith and Haigh 1974; Kaplan et al. 1989). Here we have shown that both levels of diversity and a summary of the frequency spectrum of mutations, D, are positively correlated with rates of crossing over in African populations of D. melanogaster. This multilocus pattern is unlikely under the first model and is expected under the second. Together with an increasing number of examples of selective sweeps at individual loci in Drosophila and other organisms (e.g., Schlötterer et al. 1997; Chen et al. 2000; Hamblin and Di Rienzo 2000; Nachman and Crowell 2000; Yi and Charlesworth 2000), our results suggest that positive Darwinian selection is a common force shaping patterns of variability in natural populations.
Acknowledgments
We thank J. Comeron, R. Hudson, T. Johnson, M. S. McPeek, and M. Stephens for helpful discussions and B. Charlesworth, D. Charlesworth, P. Donnelly, S. Otto, J. Pritchard, and J. Wall for comments on the manuscript. R. Hudson pointed out that standard correlation tests might not be appropriate in this context. F. Depaulis, C. Langley, and S.-C. Tsaur kindly shared data prior to its publication. P.A. is supported by a European Molecular Biology Organization postdoctoral fellowship. M.P. is supported by a National Science Foundation postdoctoral fellowship in Bioinformatics.
Footnotes
- Communicating editor: N. Takahata
- Received November 7, 2000.
- Accepted February 3, 2001.
- Copyright © 2001 by the Genetics Society of America