Genetics, Vol. 158, 657-665, June 2001, Copyright © 2001

Regions of Lower Crossing Over Harbor More Rare Variants in African Populations of Drosophila melanogaster

Peter Andolfatto^a and Molly Przeworski^b
^a Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh, EH9 3JT, United Kingdom
^b Department of Statistics, Oxford University, Oxford, OX1 3TG, United Kingdom

Corresponding author: Peter Andolfatto, Institute for Cell and Animal Population Biology, Ashworth Labs, Kings Bldgs., University of Edinburgh, Edinburgh, EH9 3JT, United Kingdom. E-mail: peter.andolfatto@ed.ac.uk

Communicating editor: N. TAKAHATA


*  ABSTRACT

A correlation between diversity levels and rates of recombination is predicted both by models of positive selection, such as hitchhiking associated with the rapid fixation of advantageous mutations, and by models of purifying selection against strongly deleterious mutations (commonly referred to as "background selection"). With parameter values appropriate for Drosophila populations, only the first class of models predicts a marked skew in the frequency spectrum of linked neutral variants, relative to a neutral model. Here, we consider 29 loci scattered throughout the Drosophila melanogaster genome. We show that, in African populations, a summary of the frequency spectrum of polymorphic mutations is positively correlated with the meiotic rate of crossing over. This pattern is demonstrated to be unlikely under a model of background selection. Models of weakly deleterious selection are not expected to produce both the observed correlation and the extent to which nucleotide diversity is reduced in regions of low (but nonzero) recombination. Thus, of existing models, hitchhiking due to the recurrent fixation of advantageous variants is the most plausible explanation for the data.


IT has been known for a decade that levels of nucleotide diversity in Drosophila melanogaster, but not levels of divergence with its sibling species, are correlated with crossing over rates, a pattern inconsistent with a strictly neutral model of molecular evolution (AGUADE et al. 1989; BERRY et al. 1991; BEGUN and AQUADRO 1992; AQUADRO et al. 1994). A similar pattern has emerged in a variety of organisms (STEPHAN 1994; NACHMAN 1997; NACHMAN et al. 1998; STEPHAN and LANGLEY 1998; PRZEWORSKI et al. 2000). Explanations for the correlation include positive selection (MAYNARD SMITH and HAIGH 1974; KAPLAN et al. 1989; GILLESPIE 1997, 2000) as well as selection against strongly deleterious mutations (CHARLESWORTH et al. 1993, 1995; HUDSON and KAPLAN 1995). These explanations assume that the sampled variation is neutral and reflects the action of selection at linked sites.

Particular theoretical attention has been paid to a model of recurrent but nonoverlapping episodes of positive selection, in which neutral variants linked to a strongly favored mutation are swept to fixation in the population. This so-called "hitchhiking" model predicts that loci in regions of low recombination will harbor lower levels of variation and more low frequency polymorphisms than loci in regions of normal recombination (KAPLAN et al. 1989; BRAVERMAN et al. 1995; GILLESPIE 2000). This occurs because the physical length of the hitchhiking region depends on the strength of selection relative to the recombination rate (KAPLAN et al. 1989). If selection coefficients are similar across the genome, loci in regions of low recombination will be affected by more selective sweeps per unit time, and hence are more likely to be sampled shortly after a sweep. In the recovery phase, most polymorphisms will be young (i.e., postsweep) and at low frequency.

An alternative to positive selection models is purifying selection against strongly deleterious mutations, hereafter referred to as "background selection" (CHARLESWORTH et al. 1993, 1995; HUDSON and KAPLAN 1994, 1995). In this model, a neutral allele will persist in the population only if it finds itself on a deleterious mutation-free chromosome (or segment of chromosome), either when it first arises in the population or by recombination. If selection coefficients and deleterious mutation rates are the same in different regions, the rate of recombination will determine the extent of the reduction in neutral diversity (i.e., the extent to which neutral alleles can escape from background selection).

While purifying selection undoubtedly occurs, uncertainty about key parameters, such as the distribution of selection coefficients and the deleterious mutation rate, renders unclear its importance in reducing levels of variability. Similar uncertainty exists for positive selection models. As a result, considerable debate has revolved about the relative importance of background selection and hitchhiking in shaping patterns of variability.

While positive selection models predict an excess of rare alleles at linked neutral sites relative to a model of no selection (BRAVERMAN et al. 1995; GILLESPIE 2000), the background selection model does not, so long as the population is large and the deleterious mutation rate not extremely high (HUDSON and KAPLAN 1994; CHARLESWORTH et al. 1995). In D. melanogaster, these two conditions are likely to be met (LI et al. 1999; MCVEAN and VIEIRA 2001). Thus, polymorphism data from D. melanogaster provide a means to distinguish between models. Two previous surveys of loci in D. melanogaster did not detect a skew toward rare variants in regions with low rates of crossing over (BEGUN and AQUADRO 1993; CHARLESWORTH et al. 1995). These observations suggested that background selection might be a sufficient explanation for the correlation between diversity levels and recombination rates, i.e., that there exists no unequivocal evidence for positive selection.

The effects of background selection and hitchhiking on the frequency spectrum of linked neutral sites are known for a random-mating population of constant size. Demographic departures from these model assumptions can alter the signature of selection (e.g., see NORDBORG 1997; SLATKIN and WIEHE 1998; HAMBLIN and DI RIENZO 2000). Recent multilocus surveys suggest that patterns of polymorphism in D. melanogaster do not conform to the expectations of a panmictic population of constant size (ANDOLFATTO and PRZEWORSKI 2000; ANDOLFATTO 2001; PRZEWORSKI et al. 2001). In particular, the species probably has an African origin and has only recently become cosmopolitan (DAVID and CAPY 1988; LACHAISE et al. 1988). There is also evidence that populations are geographically structured (e.g., HALE and SINGH 1991; BEGUN and AQUADRO 1993). Given that African populations are thought to be ancestral, they are likely to have a more stable evolutionary history than non-African populations. As a result, they may be closer to the demographic assumptions of population genetic models; i.e., the effective population size is likely to be large and may not have changed much over time. With these considerations in mind, we revisit the relationship between frequency spectra and crossing over rates by focusing, where possible, on samples from single African populations. Our goal is to establish whether background selection alone can account for patterns of nucleotide variation.


*  MATERIALS AND METHODS

Data collection and previously published data:
We collected sequence polymorphism data for 10 loci distributed over a range of crossing over rates on the X chromosome. We concentrate on loci in regions with lower than average crossing over rates, which are underrepresented in published data. Primers used to PCR amplify and sequence these loci are listed in Table 1. Genomic DNA was prepared (Gentra Systems, Research Triangle Park, NC) from one male fly for each of 13 isofemale lines kindly provided by C.-I. Wu. These lines are sampled from a Zimbabwe, Africa population (cf. BEGUN and AQUADRO 1993). All 10 loci (Table 1) encode 600–700 bp of intron sequence (except zeste, which is 3' untranslated DNA) and were sampled from the same 13 individuals. PCR products were sequenced directly on both strands using a Big-Dye sequencing kit and run on an ABI377XL automated sequencer (Perkin-Elmer-Applied Biosystems, Norwalk, CT). Sequence contigs were managed with Sequencher 3.0 software (Gene Codes, Ann Arbor, MI) and aligned manually. Aligned sequences were deposited into GenBank under accession nos. AF345663, AF345791.


 

 
Table 1. PCR and sequencing primers for new X-linked loci used in this study

From published data, we include the 19 loci available for African populations with sample sizes greater than three. For the majority of these loci (12), population samples were drawn from a Zimbabwe population (as above) with the exceptions of Fbp2, Su(H), and Vha68, where samples are from an Ivory Coast population; Acp29AB, from a Malawi population; and Dras1, Dras2, and R, from a number of African populations. CecC was chosen as a representative from a cluster of closely related genes (cf. CLARK and WANG 1997).

For each locus, we include all polymorphisms at synonymous sites and in noncoding DNA (including both insertion-deletions and single nucleotide polymorphisms). Single nucleotide polymorphisms within deletions and overlapping deletions were excluded from analyses. Excluding insertion-deletion variation has no effect on the conclusions (results not shown). In situations where a nucleotide polymorphism occurs immediately adjacent to a deletion, and the two are in complete coupling, the two polymorphisms are treated as a single event. For these reasons, the numbers reported in tables may differ slightly from previously published values. DnaSP 3.0 (ROZAS and ROZAS 1999) was used for a subset of polymorphism analyses.

Estimating rates of crossing over:
For each locus, we estimate r, the sex-averaged local rate of meiotic crossing over per kilobase per generation, following the approach of CHARLESWORTH (1996). In this method, each chromosomal arm is divided into several regions, and each region is assigned either a linear or a quadratic function relating genetic distance to physical distance. We modified this method to incorporate estimates of the DNA content of each cytological band (HEINO et al. 1994). Thus, the rate of crossing over is expressed in units of centimorgans per million bases (cM/Mb) rather than centimorgans per band (cM/band). Since there is no crossing over in D. melanogaster males (ASHBURNER 1989), female rates of crossing over are multiplied by 1/2 for autosomes and 2/3 for X-linked loci.
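
The sex-averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the unit conversion simply uses 1 cM = 0.01 crossovers per generation and 1 Mb = 1000 kb.

```python
def sex_averaged_rate(female_rate_cm_per_mb, x_linked):
    """Sex-average a female crossing-over rate given in cM/Mb.

    Males do not recombine, so an autosome experiences the female
    rate half of the time and an X chromosome two-thirds of the time.
    """
    factor = 2.0 / 3.0 if x_linked else 0.5
    return female_rate_cm_per_mb * factor


def cm_per_mb_to_per_kb(rate_cm_per_mb):
    """Convert cM/Mb to crossovers per kilobase per generation."""
    return rate_cm_per_mb * 0.01 / 1000.0


# Hypothetical female rate of 3 cM/Mb at an X-linked locus.
r = cm_per_mb_to_per_kb(sex_averaged_rate(3.0, x_linked=True))
```

The female rate of 3 cM/Mb in the usage line is an invented input chosen only to show the call sequence.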

Previous approaches have fit high-order polynomial curves to whole chromosome arms using standard map distances available in the databases (e.g., ASHBURNER 1989, chapter 11; COMERON et al. 1999). These approaches produce estimates of r that are highly correlated with our estimates (e.g., for our 29 loci, the correlation with estimates from COMERON et al. 1999 yields a Pearson correlation coefficient of R = 0.91). The main difference between approaches lies in the estimates of r in regions of very low crossing over, i.e., telomeric and centromeric regions of D. melanogaster. Using Charlesworth's method, we recover yellow-su(s), yellow-su(w^a), and yellow-white map distance estimates (~10^-4, 10^-3, and 0.015 M, respectively) that are in good agreement with published measurements (cf. TAKANO-SHIMIZU 1999; LANGLEY et al. 2000).

For the white locus (cytological map 3C2, genetic map 1.5), the appropriate r estimate is less clear. CHARLESWORTH (1996) chooses this locus to mark the transition from a region with a quadratic relation of genetic to physical distance (from the telomere) to one with a linear relation in the centromere-proximal direction. We take the r predicted by the telomere-proximal quadratic function (0.6 cM/Mb). Note, however, that genetic distance in the centromere-proximal direction from white appears to increase faster than quadratically. If, instead, we assume an exponential increase in genetic distance from yellow (cytological map 1B1, genetic map 0) to Notch (cytological map 3C7–9, genetic map 3.0), we estimate an r for white that is ~2.5-fold larger. The use of this r estimate for white has no effect on our conclusions (results not shown).

Summary of the frequency spectrum of mutations:
To summarize the frequency spectrum, we use the statistic D (TAJIMA 1989C). D considers the (approximately normalized) difference between two estimates of the population mutation rate, θ: π, the mean number of pairwise differences in the sample, and θ_W, an estimate based on the number of segregating mutations in the sample (WATTERSON 1975). Under the standard neutral model, the expectation of D is roughly zero (TAJIMA 1989C). D is positive when there are too many intermediate frequency variants, such as in the presence of a balanced polymorphism, or under certain population subdivision models (TAJIMA 1989A). D is negative when there is an excess of rare mutations, as would be expected under recent population growth (TAJIMA 1989B), or under hitchhiking models (BRAVERMAN et al. 1995; GILLESPIE 2000). For each locus, Tajima's D was calculated on the basis of the total number of mutations rather than on the number of segregating sites. This approach is conservative since it tends to make the D values of loci in regions of high recombination (and greater diversity) more negative. This said, the number of multiply hit sites is very small.
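
As an illustration of how D contrasts π with θ_W, the sketch below computes Tajima's D from per-site allele counts using the standard constants from Tajima's 1989 formulation. It assumes every polymorphic site is biallelic (so mutations and segregating sites coincide); the function name and input layout are hypothetical, not taken from the paper.

```python
import math


def tajimas_d(allele_counts, n):
    """Tajima's D for a sample of n chromosomes.

    allele_counts holds, for each segregating site, the number of
    chromosomes (1..n-1) carrying one of the two alleles.
    """
    s = len(allele_counts)
    if n < 4 or s == 0:
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    # pi: mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    # Watterson's estimator based on the number of segregating sites.
    theta_w = s / a1
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


d_rare = tajimas_d([1, 1, 1, 1, 1], n=10)    # only singletons
d_common = tajimas_d([5, 5, 5, 5, 5], n=10)  # intermediate frequencies
```

With only singletons the statistic is negative, and with only intermediate-frequency variants it is positive, matching the interpretation given in the text.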

We also report a second summary of the frequency spectrum, D*, proposed by FU and LI (1993). Similarly to D, D* compares two estimates of θ: θ_W and an estimate based on the number of mutations found only once in a sample of chromosomes (referred to as singletons). The expectation of D* is approximately zero under the neutral model; it is negative when there is an excess of singletons in the sample (FU and LI 1993).

Coalescent simulations:
To perform weighted regressions (see RESULTS), we estimated the variance of D under the null model for each locus. To do so, we ran 10^4 coalescent simulations for a random-mating population of constant size with no selection (cf. HUDSON 1990). Every new mutation occurs at a previously unmutated site. We implemented the coalescent for a fixed number of segregating sites (HUDSON 1993); i.e., we generated a genealogy, then placed the observed number of segregating sites on the tree. For each locus, we conditioned on the sample size, the total number of base pairs sequenced, and an estimate of the population recombination rate C = 4Nr (or 3Nr if X-linked, assuming no sex differences and equal sex ratios), where r is the sex-averaged rate of crossing over per base pair per generation. N, the diploid effective population size of the species, is taken to be 10^6 (LI et al. 1999; ANDOLFATTO and PRZEWORSKI 2000). The exact value of N is not important, since we are mainly interested in the relative variance of D for loci in regions with different crossing over rates. Note finally that weighted regression ignores error in the estimation of r.
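
The fixed-segregating-sites procedure can be sketched as below. This is a deliberately simplified stand-in, not the program used in the study: it ignores recombination entirely (C = 0), so it illustrates only the "generate a genealogy, then drop S mutations on it" step. All function names, the replicate count, and the seed are assumptions made for the example.

```python
import math
import random
import statistics


def tajimas_d(counts, n):
    """Tajima's D from per-site allele counts (biallelic sites)."""
    s = len(counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)
    c1 = (n + 1) / (3.0 * (n - 1)) - 1.0 / a1
    c2 = (2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
          - (n + 2) / (a1 * n) + a2 / a1 ** 2)
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def simulate_counts(n, s, rng):
    """One neutral genealogy without recombination, with s mutations
    placed on branches in proportion to branch length."""
    lineages = [[1, 0.0] for _ in range(n)]  # [descendants, length]
    branches = []
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2.0)  # time to next coalescence
        for lin in lineages:
            lin[1] += t
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        a, b = lineages.pop(i), lineages.pop(j)
        branches.extend([tuple(a), tuple(b)])
        lineages.append([a[0] + b[0], 0.0])
    weights = [length for _, length in branches]
    picked = rng.choices(branches, weights=weights, k=s)
    return [desc for desc, _ in picked]


def d_mean_and_variance(n, s, reps=1000, seed=1):
    """Mean and variance of D over reps simulated genealogies."""
    rng = random.Random(seed)
    ds = [tajimas_d(simulate_counts(n, s, rng), n) for _ in range(reps)]
    return statistics.mean(ds), statistics.variance(ds)


mean_d, var_d = d_mean_and_variance(n=13, s=10, reps=500)
```

The reciprocal of the estimated variance is what would serve as a per-locus regression weight. Adding recombination requires tracking ancestral segments along the sequence, which this sketch does not attempt.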

To estimate the variance of D under a model of background selection, we ran simulations as before, but conditional on C = f4Nr (or f3Nr for X-linked loci), where the f value for each locus is the expected local reduction in effective population size due to background selection, as estimated according to CHARLESWORTH (1996). We use the values of f that provided the best fit to the available restriction fragment length polymorphism data, when both transposable elements and point mutations are considered (CHARLESWORTH 1996).
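
The scaling of C under the two null models can be written out as a small helper. It is only a restatement of C = f4Nr (or f3Nr for X-linked loci) in code, with f = 1 recovering the model without background selection; the function name and the example values of r and f are hypothetical.

```python
def population_recombination_rate(r_per_bp, n_e=1e6, x_linked=False, f=1.0):
    """Per-site population recombination rate C.

    r_per_bp: sex-averaged crossing-over rate per base pair per generation
    n_e:      diploid effective population size (10^6 in the text)
    f:        proportional reduction in effective size attributed to
              background selection (1.0 means no reduction)
    """
    scale = 3.0 if x_linked else 4.0
    return f * scale * n_e * r_per_bp


# Hypothetical locus with r = 1e-8 per bp per generation.
c_neutral = population_recombination_rate(1e-8)
c_background = population_recombination_rate(1e-8, x_linked=True, f=0.5)
```

Multiplying the per-site value by the number of base pairs sequenced gives a locus-wide rate, which is one natural way to condition a simulation on sequence length.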

We also ran coalescent simulations to estimate the distribution of the Pearson correlation coefficient R. For a model of a single population of constant size, simulations are implemented as above. We generated D values for the 29 loci 10^4 times, and calculated R each time. We then tabulated the number of runs with simulated R greater than or equal to the observed one. We also estimate the distribution of R for a model of population growth and a model of population structure. We model population growth as a constant population size of 10^5 until 10^5 generations ago, followed by an exponential increase to a present size of 10^7 (cf. MARJORAM and DONNELLY 1994). Other aspects of the simulation are as above. Our model of geographic subdivision is a symmetric two-island model (e.g., TAJIMA 1989A), where samples are drawn entirely from one deme. The number of individuals in each subpopulation, N_pop, is taken to be N/2. We considered a fairly extreme model of structure, where the number of migrants per island per generation is one. This value is lower than suggested by most F_ST values observed in D. melanogaster (IRVIN et al. 1998). Simulations are run using modifications of programs kindly provided by R. R. Hudson.
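
The tabulation step, counting simulated R values at least as large as the observed one, amounts to an empirical one-sided P value. A minimal sketch follows; the helper names are hypothetical and the simulated values in the usage lines are made-up placeholders, not results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def empirical_p(observed_r, simulated_rs):
    """Fraction of simulated R values >= the observed R."""
    hits = sum(1 for r in simulated_rs if r >= observed_r)
    return hits / len(simulated_rs)


# Placeholder null distribution of four simulated correlations.
p = empirical_p(0.56, [0.10, 0.60, -0.20, 0.56])
```

In a full analysis each entry of `simulated_rs` would come from one simulated set of D values for the 29 loci correlated against their crossing-over rates.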


*  RESULTS

Summaries of diversity and of the frequency spectrum of mutations are positively correlated with rates of crossing over:
In Table 2, we summarize patterns of nucleotide variation at synonymous sites and noncoding DNA for 10 newly sequenced loci and 19 previously published data sets. To allow for comparisons between X and autosome diversity levels, a point of departure is to multiply X-linked diversities by 4/3 (e.g., BEGUN and AQUADRO 1992); this approach assumes equal sex ratios and no sexual selection. In Figure 1, we plot the relationship between the corrected π, the average number of pairwise differences per site, and the estimated rates of crossing over per physical distance. As expected (BEGUN and AQUADRO 1992; AQUADRO et al. 1994), nucleotide diversity is positively correlated with crossing over rates (Pearson's R = 0.61, P = 0.001; Spearman's R = 0.66, P < 0.001; all reported P values are two-tailed). It is unclear, however, whether this scaling is appropriate. Sexual selection in natural populations can cause the scaling of X to autosome diversities to be closer to unity (CABALLERO 1995). If diversities in Table 2 are left uncorrected, the correlation is stronger (Pearson's R = 0.65, P < 0.001; Spearman's R = 0.71, P < 0.001).



 
Figure 1. Relationship between silent nucleotide diversity (π%) and the rate of crossing over for 29 loci sampled from African populations of D. melanogaster. Shaded circles represent X-linked loci. X-linked diversities (Table 2) have been multiplied by 4/3 to make them comparable to autosomal diversities, assuming equal sex ratios and no sexual selection. There is a significant, positive correlation (Pearson's R = 0.61, P = 0.001, Spearman's P < 0.001, two-tailed).


 

 
Table 2. Patterns of variability and local rates of crossing over for 29 loci in African populations of D. melanogaster

In Figure 2, we plot the values of Tajima's D and Fu and Li's D* against rates of crossing over for the 29 loci in Table 2. Values of D are sharply negative in areas of low crossing over and increase with increasing rates of crossing over (Pearson's R = 0.56, P = 0.002; Spearman's R = 0.54, P = 0.002). The correlation between crossing over and D* is even less likely to occur by chance (Pearson's R = 0.67, P < 0.001; Spearman's R = 0.64, P < 0.001). These patterns are expected under a model of hitchhiking associated with the rapid fixation of advantageous mutations (BRAVERMAN et al. 1995; GILLESPIE 2000) and under some random-environment-selection models (GILLESPIE 1997), but not under the background selection model (HUDSON and KAPLAN 1994; CHARLESWORTH et al. 1995). A significant correlation is also obtained when the analysis is restricted to the 22 loci sampled only in Zimbabwe (for D, Pearson's R = 0.53, P = 0.011; Spearman's R = 0.51, P = 0.016; while for D*, Pearson's R = 0.66, P = 0.001; Spearman's R = 0.58, P = 0.004).



 
Figure 2. Scatterplot of (a) Tajima's D and (b) Fu and Li's D* vs. the rate of crossing over for 29 loci sampled from African populations of D. melanogaster. Shaded circles represent X-linked loci. D vs. crossing over: Pearson's R = 0.56, P = 0.002, Spearman's P = 0.002, two-tailed. D* vs. crossing over: Pearson's R = 0.67, P < 0.001, Spearman's P < 0.001, two-tailed. The correlation with D is improved by excluding insertion-deletion variation (Pearson's R = 0.59).

One concern is that loci in areas of very low crossing over may not represent independent data points. However, considering these loci as a single data point has no effect on our conclusions. As an illustration, if we replace the D values for the nine X-linked loci with a crossing over rate of ≤5 × 10^-6/kb/generation by one value (either the average D or the total D), the correlation is still highly significant (P < 0.01).

Also relevant is the presence of inversion polymorphisms in African populations of D. melanogaster (LEMEUNIER and AULARD 1992). When heterozygous, inversions suppress recombination locally and can increase crossing over rates on other chromosomes (SCHULTZ and REDFIELD 1951; SNIEGOWSKI et al. 1994). These features make our estimates of the crossing over rate less certain. In particular, because inversion polymorphisms are more common on autosomes than on the X chromosome, estimates of crossing over rates for autosomes could be systematically biased upward relative to those on the X. However, the correlation of D (or D*) and crossing over rates is still significant if autosomal crossing over rates are reduced by one-half (for D, P < 0.02; and for D*, P < 0.002). The main effect of inverted chromosomes should be to introduce additional noise (ANDOLFATTO et al. 2001). In fact, the correlation is improved if inverted chromosomes are excluded where possible [In(2L)t, Fbp2, Su(H), and Vha, results not shown]. In this light, it is worth noting that the correlation between diversity and crossing over is stronger for X-linked loci (Pearson's R = 0.90, 15 loci), where there are no high frequency inversions, than for autosomal loci (Pearson's R = 0.39, 14 loci).

Sensitivity to demographic assumptions:
Our null model assumes a random-mating population of constant size. Population structure will alter the distribution of D relative to panmixia, even if samples are drawn from a single deme (see WALL 1999). When samples are drawn from one island in a symmetric island model, the mean D is slightly increased, and there is more weight on both tails of the distribution of D (WALL 1999; M. PRZEWORSKI, unpublished results). To test the significance of the correlation between D and r in the presence of structure, we ran simulations of a symmetric two-island model for 4N_pop m = 1, where N_pop is the number of individuals in each population and m is the fraction of migrants per generation. Sampling was entirely from one deme. We ran 10^4 trials to estimate the distribution of R under this null model (see MATERIALS AND METHODS). Only 9 of the 10^4 trials had R ≥ 0.56 (the observed value). This model of population subdivision is a highly simplified one; nonetheless, the results suggest that our correlation is no less unusual in the presence of population structure.

Inequality of variance among samples and the background selection hypothesis:
Standard correlation tests are not entirely appropriate, since under the null model the expected variance in D values increases with decreasing recombination (HUDSON 1983; WALL 1999; PRZEWORSKI et al. 2001). In addition, samples from regions of low recombination tend to contain fewer segregating sites (see Table 2). With this in mind, we performed a weighted regression, weighting each observed D value by the reciprocal of the estimated variance of D for each locus (see MATERIALS AND METHODS). When the inequality of variances among samples is taken into account in this way, P = 0.01 (R = 0.47).
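
Inverse-variance weighting of this kind can be illustrated with ordinary weighted least squares. The sketch below is one common formulation and is not claimed to be the exact procedure used for the reported values; the function name and the toy inputs are hypothetical.

```python
import math


def weighted_fit(x, y, variances):
    """Weighted least-squares line and weighted correlation.

    Each point is weighted by the reciprocal of its variance, so
    noisier D values (e.g., from low-recombination loci) count less.
    Returns (slope, intercept, weighted_r).
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, sxy / math.sqrt(sxx * syy)


# Toy data lying exactly on y = 2x + 1, with unequal variances.
slope, intercept, r_w = weighted_fit(
    [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], [1.0, 0.5, 0.25, 0.25]
)
```

Here `x` would hold crossing-over rates, `y` the observed D values, and `variances` the simulation-based variance of D at each locus.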

Background selection against strongly deleterious alleles in large populations can be thought of as a simple reduction in effective population size (HUDSON and KAPLAN 1994, 1995), and so should not result in a distortion of the frequency spectrum. However, by reducing C, background selection will further increase the variance of D. We therefore performed a second weighted regression. This time, the variance of D was estimated from simulations that condition on C = f4Nr (or f3Nr for X-linked loci), where f for each locus is the expected local reduction in effective population size due to background selection, as estimated by CHARLESWORTH (1996). Under this model, the probability of observing such a strong positive correlation between D and r (R = 0.48) by chance is estimated to be P < 0.01. Thus, background selection alone is an unlikely explanation for the correlation.

Effects of varying sample size:
We have shown that if we assume a random-mating population of constant size [where E(D) ≅ 0], the correlation between D and crossing over rates is highly unlikely. Next, we consider whether the correlation between D and r might reflect negative D values at all loci. This is a concern because when the population D is not zero, the sample estimate of D is no longer unbiased. In particular, small samples will yield estimates closer to zero than will large ones (cf. GILLESPIE 2000). In our data, there is a marginally significant negative correlation between sample size (n) and crossing over rates (Pearson's R = -0.344, two-tailed P = 0.07). Suppose that population D values are all nonzero, for example, due to population expansion. Might the unequal sample sizes result in a correlation between diversity and crossing over rates? To test this possibility, we randomly subsampled 10 alleles for each of the two data sets with n = 50. For this version of the data, D values are -1.35 for su(s) and 0.55 for su(w^a) and all sample sizes are <20. The correlation between n and estimated crossing over rates has all but disappeared (Pearson's R = -0.13, two-tailed P = 0.51), yet there is still a significant correlation between D and crossing over rates (Pearson's R = 0.48, two-tailed P < 0.01). Thus, the correlation between D and r observed in these samples is likely to reflect an underlying correlation in population D values (rather than nonzero population D values at all loci).

Whether we even expect the negative correlation between n and r to produce a spurious correlation between D and r is unclear. While loci in areas of high crossing over tend to have smaller sample sizes, they also tend to have many more polymorphic sites. This follows from the strong correlation observed between diversity and rates of crossing over in our data (Figure 1). Also, the variance of D under the null model is much smaller for loci in regions of high recombination than it is for loci in regions of low recombination (PRZEWORSKI et al. 2001). To explore this further, we investigated whether a positive correlation is expected for a model of population growth. Specifically, we estimated the distribution of Pearson's R for the 29 loci for a model of constant population size (N = 10^5) followed by 10^5 generations of exponential growth to a present size of 10^7. This model is not intended as a realistic depiction of the population history of D. melanogaster but as an illustration. With these parameters, for n = 10 and 10 segregating sites, the mean D is ~ -0.80, while for n = 50 and 50 segregating sites, the mean D is ~ -1.76 (as estimated from 10,000 simulations). The average R value in 10^4 simulations of the 29 loci was +0.08 (i.e., significantly greater than 0). However, the Pr(R ≥ 0.56) was estimated to be 0.001; in other words, the correlation that we observe is still highly unlikely. While this is only one model among many, it suggests that the effect of varying sample sizes on R is quite weak when other facets of the data are also taken into account.
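
The growth scenario used in this illustration implies a specific size trajectory, which can be written down directly. The sketch below is a hypothetical helper (not part of the original simulation programs) that returns the population size t generations before the present, assuming smooth exponential growth between the stated endpoints.

```python
import math


def population_size(t_ago, n_ancestral=1e5, n_present=1e7, t_onset=1e5):
    """Population size t_ago generations before the present.

    Size is constant at n_ancestral until t_onset generations ago,
    then grows exponentially to n_present at time zero.
    """
    if t_ago >= t_onset:
        return n_ancestral
    rate = math.log(n_present / n_ancestral) / t_onset  # per generation
    return n_present * math.exp(-rate * t_ago)


growth_rate = math.log(1e7 / 1e5) / 1e5
midpoint_size = population_size(5e4)
```

Under these parameters the per-generation growth rate is ln(100)/10^5, roughly 4.6 × 10^-5, and the size halfway through the growth phase is the geometric mean of the two endpoints.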


*  DISCUSSION

Models of positive selection:
The positive correlation between D and crossing over rates is unlikely under a simple model of background selection but is expected under simple recurrent hitchhiking models (BRAVERMAN et al. 1995; GILLESPIE 2000). As one illustration, the expected D value in GILLESPIE's (2000) pseudo-hitchhiking model is ~ -1.5 for regions of no recombination and samples of 20 drawn from "large" populations (N ≥ 10^4). Thus, hitchhiking due to the rapid fixation of advantageous variants appears to be the best explanation for the data. Patterns of polymorphism may also be consistent with random-environment-selection models (GILLESPIE 1997), but their sampling properties have yet to be investigated explicitly.

Also of interest are models that incorporate both deleterious and favorable mutations, as both processes are likely to be occurring simultaneously (HUDSON 1994). Such models are in their infancy. The sole study of the joint effects of hitchhiking and background selection (KIM and STEPHAN 2000) concludes that hitchhiking is likely to be the main force shaping patterns of variation in regions of low recombination, while background selection may dominate when recombination rates are higher. Thus, the correlation between D and r reported here may be consistent with the joint action of these forces.

Alternative deleterious evolution models:
Next, we consider whether alternative models of deleterious evolution might also be consistent with the patterns reported in Figure 1 and Figure 2. A large class of deleterious mutations may have selection coefficients smaller than values typically used in models of background selection (FRY et al. 1999; KEIGHTLEY and EYRE-WALKER 1999). Such selection intensities may be particularly relevant to transposable elements (cf. CHARLESWORTH 1996) as well as to a subset of synonymous sites and amino acid variants (AKASHI 1999). While selection against strongly deleterious mutations does not result in a skewed frequency spectrum at linked neutral sites (HUDSON and KAPLAN 1994), selection against weakly deleterious mutations can lead to an excess of rare neutral alleles in large samples (e.g., for a sample size of 100 chromosomes, CHARLESWORTH et al. 1995). Whether a skew would also be detectable in small samples (our median sample size is 13) is unknown. In addition, the parameters (selection coefficient and deleterious mutation rate) required to produce a skew in the frequency spectrum of linked neutral variants are unlikely to cause the drastic reduction of nucleotide diversity observed in regions of low recombination in D. melanogaster (CHARLESWORTH et al. 1995; CHARLESWORTH 1996; NORDBORG et al. 1996).

When purifying selection is weak, linked deleterious alleles may persist long enough in a population to cosegregate. Selection at one site can then reduce the efficacy of selection at linked sites (HILL and ROBERTSON 1966; FELSENSTEIN 1974), a phenomenon termed Hill-Robertson interference (HRI). Weak selection HRI models (e.g., LI 1987; COMERON et al. 1999; MCVEAN and CHARLESWORTH 2000; TACHIDA 2000) can produce a negative skew in the frequency spectrum of linked neutral variants under some parameters (TACHIDA 2000). However, while HRI models produce a reduction in diversity levels, they cannot explain diversity reductions on the scale of those observed in areas of low (but nonzero) crossing over in D. melanogaster (TACHIDA 2000; G. MCVEAN, personal communication).

In addition, the effect of weak selection HRI is probably weaker than has been estimated in simulations, as published models (e.g., MCVEAN and CHARLESWORTH 2000) include crossing over but not gene conversion. Gene conversion may be the prevalent mechanism determining the strength of associations among segregating mutations in regions of very low crossing over in Drosophila (LANGLEY et al. 2000), and thus the strength of HRI. Preliminary simulations suggest that the effect of HRI on the frequency spectrum and diversity levels at linked neutral variants is considerably weakened by the presence of gene conversion in regions of nonzero crossing over (J. COMERON, personal communication; G. MCVEAN, personal communication).

The models described thus far have considered the effect of selection on linked neutral variation. However, sampled variation may not be neutral, but may itself be under very weak selection (i.e., on the order of 1/N). In this case, a negative D value is expected for all rates of recombination (MCVEAN and CHARLESWORTH 2000). In addition, only a minor reduction in variability is expected for regions of low recombination relative to regions of high recombination (e.g., see Figure 3 of MCVEAN and CHARLESWORTH 2000).
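To make the sign of D concrete, the following is a minimal sketch of how the statistic could be computed for a single locus, assuming that D here denotes Tajima's D as defined in TAJIMA (1989c). The function name `tajimas_d` and its input format (one allele count per segregating site) are illustrative choices, not part of the original analysis. An excess of rare variants lowers the mean pairwise difference relative to the number of segregating sites and so yields a negative value.

```python
import math


def tajimas_d(allele_counts, n):
    """Tajima's D for one locus (illustrative sketch, hypothetical helper).

    allele_counts: for each segregating site, the number of sampled
        chromosomes (1..n-1) carrying one of the two alleles.
    n: number of sampled chromosomes.
    """
    if n < 4:
        # For n < 4 the variance term is zero and D is undefined.
        raise ValueError("need at least 4 chromosomes")
    s = len(allele_counts)
    if s == 0:
        raise ValueError("D is undefined without segregating sites")
    if any(k < 1 or k > n - 1 for k in allele_counts):
        raise ValueError("each count must lie between 1 and n - 1")

    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two diversity estimators, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    n = 13  # the median sample size reported in the text
    print(tajimas_d([1] * 10, n))  # ten singletons: negative D
    print(tajimas_d([6] * 10, n))  # ten intermediate-frequency sites: positive D
```

The counts in the usage lines are made-up inputs chosen only to show the two extremes; they are not data from this study.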

In summary, existing models of deleterious evolution, weak or strong, are unlikely to account for both the reductions in diversity and the skew in D observed in small samples from regions of low crossing over. However, models with a distribution of selection coefficients have yet to be investigated. Recent work on the selective effects of deleterious mutations suggests a large class of deleterious mutations with minor effects in both Drosophila and Caenorhabditis elegans (DAVIES et al. 1999; KEIGHTLEY and EYRE-WALKER 1999). The challenge is to construct a more realistic model of deleterious evolution (including a distribution of selection coefficients, crossing over, and gene conversion) that is constrained by plausible estimates of the deleterious mutation rate (MCVEAN and VIEIRA 2001), yet can account for the pattern reported here.

The potential importance of demographic factors:
In this study, we focused on African samples of D. melanogaster. We did so because non-African populations are likely to have experienced a more complicated demographic history (DAVID and CAPY 1988; LACHAISE et al. 1988). In this light, it is noteworthy that no significant correlation has been found between D and crossing over rates in the available data for non-African populations (cf. AQUADRO et al. 1994; CHARLESWORTH et al. 1995). The reasons for the discrepancy between African and non-African populations are not understood. One possibility is that a recent demographic perturbation (e.g., founder effects) in the history of non-African populations (DAVID and CAPY 1988; BEGUN and AQUADRO 1993, 1995) obscures the pattern of natural selection. Long-term population structure (e.g., SLATKIN and WIEHE 1998; HAMBLIN and DI RIENZO 2000) could also alter the signature of selection relative to the predictions of existing models.

Species-specific demographic considerations are likely to be of relevance for evolutionary inferences from other species as well. A large quantity of polymorphism data will soon be available for humans. Like Drosophila, anatomically modern humans may have an African origin and have only recently become cosmopolitan (CANN et al. 1987). Preliminary data suggest a correlation between diversity levels and crossing over rates (NACHMAN et al. 1998; PRZEWORSKI et al. 2000). By carefully considering the demographic history of populations, it may be possible to recover a similar pattern in the human data.


*  CONCLUSIONS

A decade of empirical work has been devoted to distinguishing between a simple model of background selection (CHARLESWORTH et al. 1993; HUDSON and KAPLAN 1995) and a simple model of positive selection (MAYNARD SMITH and HAIGH 1974; KAPLAN et al. 1989). Here we have shown that both levels of diversity and a summary of the frequency spectrum of mutations, D, are positively correlated with rates of crossing over in African populations of D. melanogaster. This multilocus pattern is unlikely under the first model and is expected under the second. Together with an increasing number of examples of selective sweeps at individual loci in Drosophila and other organisms (e.g., SCHLÖTTERER et al. 1997; CHEN et al. 2000; HAMBLIN and DI RIENZO 2000; NACHMAN and CROWELL 2000; YI and CHARLESWORTH 2000), our results suggest that positive Darwinian selection is a common force shaping patterns of variability in natural populations.


*  ACKNOWLEDGMENTS

We thank J. Comeron, R. Hudson, T. Johnson, M. S. McPeek, and M. Stephens for helpful discussions and B. Charlesworth, D. Charlesworth, P. Donnelly, S. Otto, J. Pritchard, and J. Wall for comments on the manuscript. R. Hudson pointed out that standard correlation tests might not be appropriate in this context. F. Depaulis, C. Langley, and S.-C. Tsaur kindly shared data prior to its publication. P.A. is supported by a European Molecular Biology Organization postdoctoral fellowship. M.P. is supported by a National Science Foundation postdoctoral fellowship in Bioinformatics.

Manuscript received November 7, 2000; Accepted for publication February 3, 2001.


*  LITERATURE CITED

AGUADÉ, M., N. MIYASHITA, and C. H. LANGLEY, 1989 Reduced variation in the yellow-achaete-scute region in natural populations of Drosophila melanogaster. Genetics 122:607-615.

AKASHI, H., 1999 Within- and between-species DNA sequence variation and the ‘footprint’ of natural selection. Gene 238:39-51.

ANDOLFATTO, P., 2001 Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and D. simulans. Mol. Biol. Evol. 18:279-290.

ANDOLFATTO, P. and M. PRZEWORSKI, 2000 A genome-wide departure from the standard neutral model in natural populations of Drosophila. Genetics 156:257-268.

ANDOLFATTO, P., F. DEPAULIS, and A. NAVARRO, 2001 Inversion polymorphism and nucleotide variability in Drosophila. Genet. Res. 77:1-8.

AQUADRO, C. F., D. J. BEGUN and E. C. KINDAHL, 1994 Selection, recombination, and DNA polymorphism in Drosophila, pp. 46–56 in Non-Neutral Evolution: Theories and Molecular Data, edited by B. GOLDING. Chapman and Hall, New York.

ASHBURNER, M., 1989 Drosophila: A Laboratory Handbook. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.

BEGUN, D. J. and C. F. AQUADRO, 1992 Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356:519-520.

BEGUN, D. J. and C. F. AQUADRO, 1993 African and North American populations of Drosophila melanogaster are very different at the DNA level. Nature 365:548-550.

BEGUN, D. J. and C. F. AQUADRO, 1995 Molecular variation at the vermilion locus in geographically diverse populations of Drosophila melanogaster and D. simulans. Genetics 140:1019-1032.

BERRY, A. J., J. W. AJIOKA, and M. KREITMAN, 1991 Lack of polymorphism on the Drosophila fourth chromosome resulting from selection. Genetics 129:1111-1117.

BRAVERMAN, J. M., R. R. HUDSON, N. L. KAPLAN, C. H. LANGLEY, and W. STEPHAN, 1995 The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics 140:783-796.

CABALLERO, A., 1995 On the effective size of populations with separate sexes, with particular reference to sex-linked genes. Genetics 139:1007-1011.

CANN, R. L., M. STONEKING, and A. C. WILSON, 1987 Mitochondrial DNA and human evolution. Nature 325:31-36.

CHARLESWORTH, B., 1996 Background selection and patterns of genetic diversity in Drosophila melanogaster. Genet. Res. 68:131-149.

CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289-1303.

CHARLESWORTH, D., B. CHARLESWORTH, and M. T. MORGAN, 1995 The pattern of neutral molecular variation under the background selection model. Genetics 141:1619-1632.

CHEN, Y., B. J. MARSH, and W. STEPHAN, 2000 Joint effects of natural selection and recombination on gene flow between Drosophila ananassae populations. Genetics 155:1185-1194.

CLARK, A. G. and L. WANG, 1997 Molecular population genetics of Drosophila immune system genes. Genetics 147:713-724.

COMERON, J. M., M. KREITMAN, and M. AGUADE, 1999 Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila. Genetics 151:239-249.

DAVID, J. R. and P. CAPY, 1988 Genetic variation of Drosophila melanogaster natural populations. Trends Genet. 4:106-111.

DAVIES, E. K., A. D. PETERS, and P. D. KEIGHTLEY, 1999 High frequency of cryptic deleterious mutations in Caenorhabditis elegans. Science 285:1748-1751.

DEPAULIS, F., L. BRAZIER, S. MOUSSET, A. TURBE, and M. VEUILLE, 2000 Selective sweep near the In(2L)t inversion breakpoint in an African population of Drosophila melanogaster. Genet. Res. 76:149-158.

FELSENSTEIN, J., 1974 The evolutionary advantage of recombination. Genetics 78:737-756.

FRY, J. D., P. D. KEIGHTLEY, S. L. HEINSOHN, and S. V. NUZHDIN, 1999 New estimates of the rates and effects of mildly deleterious mutation in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 96:574-579.

FU, Y. X. and W. H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133:693-709.

GASPERINI, R. and G. GIBSON, 1999 Absence of protein polymorphism in the Ras genes of Drosophila melanogaster. J. Mol. Evol. 49:583-590.

GILLESPIE, J. H., 1997 Junk ain't what junk does: neutral alleles in a selected context. Gene 205:291-299.

GILLESPIE, J. H., 2000 Genetic drift in an infinite population. The pseudohitchhiking model. Genetics 155:909-919.

HALE, L. R. and R. S. SINGH, 1991 A comprehensive study of genic variation in natural populations of Drosophila melanogaster. IV. Mitochondrial DNA variation and the role of history vs. selection in the genetic structure of geographic populations. Genetics 129:103-117.

HAMBLIN, M. T. and C. F. AQUADRO, 1997 Contrasting patterns of nucleotide sequence variation at the Glucose dehydrogenase (Gld) locus in different populations of Drosophila melanogaster. Genetics 145:1053-1062.

HAMBLIN, M. T. and A. DI RIENZO, 2000 Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am. J. Hum. Genet. 66:1669-1679.

HEINO, T. I., A. O. SAURA, and V. SORSA, 1994 Maps of the salivary gland chromosomes of Drosophila melanogaster. Dros. Inf. Serv. 73:621-738.

HILL, W. G. and A. ROBERTSON, 1966 The effect of linkage on limits to artificial selection. Genet. Res. 8:269-294.

HUDSON, R. R., 1983 Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183-201.

HUDSON, R. R., 1990 Gene genealogies and the coalescent process, pp. 1–44 in Oxford Surveys in Evolutionary Biology, Vol. 7, edited by D. FUTUYMA and J. ANTONOVICS. Oxford University Press, Oxford.

HUDSON, R. R., 1993 The how and why of generating gene genealogies, pp. 23–36 in Mechanisms of Molecular Evolution, edited by N. TAKAHATA and A. G. CLARK. Japan Scientific Society, Tokyo.

HUDSON, R. R., 1994 How can the low levels of DNA sequence variation in regions of the Drosophila genome with low recombination rates be explained? Proc. Natl. Acad. Sci. USA 91:6815-6818.

HUDSON, R. R., and N. L. KAPLAN, 1994 Gene trees with background selection, pp. 140–153 in Non-Neutral Evolution: Theories and Molecular Data, edited by B. GOLDING. Chapman & Hall, London.

HUDSON, R. R. and N. L. KAPLAN, 1995 The coalescent process and background selection. Philos. Trans. R. Soc. Lond. Ser. B 349:19-23.

IRVIN, S. D., K. A. WETTERSTRAND, C. M. HUTTER, and C. F. AQUADRO, 1998 Genetic variation and differentiation at microsatellite loci in Drosophila simulans. Evidence for founder effects in new world populations. Genetics 150:777-790.

KAPLAN, N. L., R. R. HUDSON, and C. H. LANGLEY, 1989 The "hitchhiking effect" revisited. Genetics 123:887-899.

KEIGHTLEY, P. D. and A. EYRE-WALKER, 1999 Terumi Mukai and the riddle of deleterious mutation rates. Genetics 153:515-523.

KIM, Y. and W. STEPHAN, 2000 Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics 155:1415-1427.

LACHAISE, D., L. M. CARIOU, J. R. DAVID, F. LEMEUNIER, and L. TSACAS et al., 1988 Historical biogeography of the Drosophila melanogaster species subgroup. Evol. Biol. 22:159-225.

LANGLEY, C. H., B. P. LAZZARO, W. PHILIPS, E. HEIKINEN, and J. BRAVERMAN, 2000 Linkage disequilibria and the site frequency spectra in the su(s) and su(wa) regions of the Drosophila melanogaster X chromosome. Genetics 156:1837-1852.

LEMEUNIER, F., and S. AULARD, 1992 Inversion polymorphism in Drosophila melanogaster, pp. 339–405 in Drosophila Inversion Polymorphism, edited by C. B. KRIMBAS and J. R. POWELL. CRC Press, Boca Raton, FL.

LI, W. H., 1987 Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J. Mol. Evol. 24:337-345.

LI, Y. J., Y. SATTA, and N. TAKAHATA, 1999 Paleo-demography of the Drosophila melanogaster subgroup: application of the maximum likelihood method. Genes Genet. Syst. 74:117-127.

MARJORAM, P. and P. DONNELLY, 1994 Pairwise comparisons of mitochondrial DNA sequences in subdivided populations and implications for early human evolution. Genetics 136:673-683.

MAYNARD SMITH, J. and J. HAIGH, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23:23-35.

MCVEAN, G. A. T. and B. CHARLESWORTH, 2000 The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation. Genetics 155:929-944.

MCVEAN, G. A. T. and J. VIEIRA, 2001 Inferring parameters of mutation, selection and demography from patterns of synonymous site evolution in Drosophila. Genetics 157:245-257.

NACHMAN, M. W., 1997 Patterns of DNA variability at X-linked loci in Mus domesticus. Genetics 147:1303-1316.

NACHMAN, M. W. and S. L. CROWELL, 2000 Contrasting evolutionary histories of two introns of the Duchenne muscular dystrophy gene, Dmd, in humans. Genetics 155:1855-1864.

NACHMAN, M. W., V. L. BAUER, S. L. CROWELL, and C. F. AQUADRO, 1998 DNA variability and recombination rates at X-linked loci in humans. Genetics 150:1133-1141.

NORDBORG, M., 1997 Structured coalescent processes on different time scales. Genetics 146:1501-1514.

NORDBORG, M., B. CHARLESWORTH, and D. CHARLESWORTH, 1996 The effect of recombination on background selection. Genet. Res. 67:159-174.

PRZEWORSKI, M., R. R. HUDSON, and A. DI RIENZO, 2000 Adjusting the focus on human variation. Trends Genet. 16:296-302.

PRZEWORSKI, M., J. D. WALL, and P. ANDOLFATTO, 2001 Recombination and the frequency spectrum in Drosophila melanogaster and D. simulans. Mol. Biol. Evol. 18:291-298.

ROZAS, J. and R. ROZAS, 1999 DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.

SCHLÖTTERER, C., C. VOGL, and D. TAUTZ, 1997 Polymorphism and locus-specific effects on polymorphism at microsatellite loci in natural Drosophila melanogaster populations. Genetics 146:309-320.

SCHULTZ, J. and H. REDFIELD, 1951 Interchromosomal effects on crossing-over in Drosophila. Cold Spring Harbor Symp. Quant. Biol. 16:175-197.

SLATKIN, M. and T. WIEHE, 1998 Genetic hitch-hiking in a subdivided population. Genet. Res. 71:155-160.

SNIEGOWSKI, P. D., A. PRINGLE, and K. A. HUGHES, 1994 Effects of autosomal inversions on meiotic exchange in distal and proximal regions of the X-chromosome in a natural population of Drosophila-melanogaster. Genet. Res. 63:57-62.

STEPHAN, W., 1994 Effects of genetic recombination and population subdivision on nucleotide sequence variation in Drosophila ananassae, pp. 57–66 in Non-Neutral Evolution: Theories and Molecular Data, edited by B. GOLDING. Chapman & Hall, London.

STEPHAN, W. and C. H. LANGLEY, 1998 DNA polymorphism in Lycopersicon and crossing-over per physical length. Genetics 150:1585-1593.

TACHIDA, H., 2000 Molecular evolution in a multisite nearly neutral mutation model. J. Mol. Evol. 50:69-81.

TAJIMA, F., 1989a DNA polymorphism in a subdivided population: the expected number of segregating sites in the two-subpopulation model. Genetics 123:229-240.

TAJIMA, F., 1989b The effect of change in population size on DNA polymorphism. Genetics 123:597-601.

TAJIMA, F., 1989c Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.

TAKANO-SHIMIZU, T., 1999 Local recombination and mutation effects on molecular evolution in Drosophila. Genetics 153:1285-1296.

WALL, J. D., 1999 Recombination and the power of statistical tests of neutrality. Genet. Res. 73:65-79.

WATTERSON, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.

YI, S. and B. CHARLESWORTH, 2000 A selective sweep associated with a recent gene transposition in Drosophila miranda. Genetics 156:1753-1763.



