The relationship between rates of recombination and DNA sequence polymorphism was analyzed for the second chromosome of Drosophila pseudoobscura. We constructed integrated genetic and physical maps of this chromosome using molecular markers at 10 loci spanning most of its physical length. The total length of the map was 128.2 cM, almost twice that of the homologous chromosome arm (3R) in D. melanogaster. There appears to be very little centromeric suppression of recombination, and rates of recombination are quite uniform across most of the chromosome. Levels of sequence variation (θW, based on the number of segregating sites) at seven loci (tropomyosin 1, Rhodopsin 3, Rhodopsin 1, bicoid, Xanthine dehydrogenase, Myosin light chain 1, and ribosomal protein 49) varied from 0.0036 to 0.0167. Generally consistent with earlier studies, the average estimate of θW at total sites is 1.5-fold higher than that in D. melanogaster, while average θW at silent sites is almost 3-fold higher. These estimates of variation were analyzed in the context of a background selection model under the same parameters of mutation rate and selection as have been proposed for D. melanogaster. It is likely that a significant fraction of the higher level of sequence variation in D. pseudoobscura can be explained by differences in regional rates of recombination rather than a larger species-level effective population size. However, the distribution of variation among synonymous, nonsynonymous, and noncoding sites appears to be quite different between the species, making direct comparisons of neutral variation, and hence inferences about effective population size, difficult. Tajima’s D statistics for 6 out of the 7 loci surveyed are negative, suggesting that D. pseudoobscura may have experienced a rapid population expansion in the recent past or, alternatively, that slightly deleterious mutations constitute an important component of standing variation in this species.
Both Drosophila melanogaster and D. pseudoobscura are important model species for population genetic and evolutionary studies. While they are fairly closely related (their estimated divergence time is 30 million years), their evolutionary history and ecology are apparently quite different. D. melanogaster originated in the tropics, became commensal with humans, and has spread worldwide in the recent past (Lachaise et al. 1988). D. pseudoobscura, in contrast, originated in North America, where it lives largely apart from people in forested habitats (Dobzhansky and Epling 1944). Although D. pseudoobscura’s primary range extends down into Guatemala, it is a temperate, not tropical, species. Furthermore, D. pseudoobscura, unlike D. melanogaster, is not cosmopolitan. Its range appears to be limited, for reasons that are not understood, to the western half of North and Central America. Within that range, however, D. pseudoobscura has relatively high rates of dispersal and little population structure compared with D. melanogaster, which is relatively sedentary (Powell 1997). While our knowledge of the history of these species is limited, what we do know suggests that temperate populations of D. melanogaster may have experienced significant amounts of adaptive evolution in the recent past, while the environment of D. pseudoobscura may have been more stable (albeit shifting in size and location during periods of glaciation).
Levels of DNA sequence variation in these two species suggest that D. melanogaster has a smaller effective population size (Ne) than D. pseudoobscura. Schaeffer et al. (1987) found ∼4-fold more restriction-site variation in D. pseudoobscura than in D. melanogaster in a 13-kb region around the alcohol dehydrogenase (Adh) locus, while sequencing of Adh and Adh-dup in a population sample of D. pseudoobscura (Schaeffer and Miller 1992a) revealed ∼2.2-fold more variation than in a worldwide sample of D. melanogaster alleles (Kreitman and Hudson 1991). Restriction fragment length polymorphism (RFLP) studies of the Xdh locus suggested a 3-fold larger effective population size for D. pseudoobscura (Riley et al. 1989). Extensive surveys of DNA sequence polymorphism have been made only in D. melanogaster, however. Furthermore, the relationship between rates of recombination and polymorphism, which has been shown in D. melanogaster to be very important (e.g., Begun and Aquadro 1992; Aquadro et al. 1994), has not been explored at all in D. pseudoobscura. Such an exploration is of great interest, as it might allow the disentanglement of some of the factors that contribute to the differences in levels of DNA sequence variation both within and between species. For example, the genetic map of D. pseudoobscura is known to be considerably longer than that of D. melanogaster, though their genomes are the same size (Powell 1997). To what extent does this larger genetic map, rather than larger species-level effective population size, account for the severalfold difference in levels of molecular variation?
Another interesting issue concerns the relative contributions of adaptive vs. purifying selection to the observed relationship between recombination and variation in D. melanogaster. That relationship is driven by the local reduction of effective population size in regions linked to targets of both positive and negative selection (Hill and Robertson 1966; Kaplan et al. 1989; Charlesworth et al. 1993; Wiehe and Stephan 1993; Hudson and Kaplan 1995; Charlesworth 1996). It has been argued that the rate of deleterious mutations has been overestimated (Keightley 1996; Fry et al. 1999), raising the possibility that the polymorphism in D. melanogaster has been shaped by significant amounts of positive selection resulting from adaptation to new environments. Given the different history of D. pseudoobscura, we might expect the contribution of positive selection to the relationship between recombination and variation in this species to be less important.
One of the obstacles to this type of analysis is the requirement for integrated physical and genetic maps such as are currently available for D. melanogaster. In the case of D. pseudoobscura, however, this obstacle is not huge. The presence of polytene chromosomes allows physical localization, by in situ hybridization, of any cloned region or PCR product. Sequence conservation of coding regions between D. melanogaster and D. pseudoobscura facilitates PCR amplification of homologous loci. Conservation of the five major linkage groups (elements A-E; Sturtevant and Novitski 1941) allows prediction of the chromosomal location of any gene that has been localized in D. melanogaster. We have taken advantage of these attributes to develop integrated genetic and physical maps of the second chromosome of D. pseudoobscura. This chromosome is homologous to chromosome 3R in D. melanogaster, where many genes have been surveyed, and which was used by Hudson and Kaplan (1995) to test the background selection hypothesis. We have also collected DNA sequence polymorphism data for seven loci across the chromosome. These data have been analyzed to determine the nature of the relationship between variation and recombination in D. pseudoobscura, and to ask whether this relationship can be explained by similar parameters (intensity and frequency of selective events, species-level effective population size) as for D. melanogaster.
MATERIALS AND METHODS
Fly stocks: The population sample used for this study was obtained from a collection of isofemale lines of D. pseudoobscura from Goldendale, Washington established by M. Noor in summer of 1996. These lines were inbred by full-sib mating to facilitate sequencing and genetic analysis. Twenty-two lines were successfully inbred for 11-15 generations. The D. miranda line, SP235 from Spray, Oregon, was obtained from W. Anderson.
DNA preparation: DNAs for mapping were prepared from single flies arrayed in 96-well plates (Gloor et al. 1993). DNAs for sequencing were prepared from groups of 10 flies from the same inbred line using the method of Ashburner (1989).
Construction of a genetic and physical map of the second chromosome: GenBank sequences of D. pseudoobscura genes that are homologous to genes on 3R of D. melanogaster were identified. These sequences were examined for the presence of microsatellites or other repeated sequences that might provide highly polymorphic, easily scored markers. Such sequences were found in or near four genes: Glucose dehydrogenase (Gld), Rhodopsin 1 (Rh1), Myosin light chain 1 (Mlc1), and bicoid (bcd). PCR primers were designed to amplify small products containing these repeats. In gene regions where no repeated sequences were found, a survey of sequence variation at the locus was used to identify regions that were likely to show multiple alleles by single-strand conformation polymorphism (SSCP) analysis (Orita et al. 1989). This approach was successful in identifying markers in four additional genes: Rhodopsin 3 (Rh3), ultrabithorax (Ubx), Xanthine dehydrogenase (Xdh), and ribosomal protein 49 (rp49).
In the case of tropomyosin 1 (trop1), no sequence data were previously available from D. pseudoobscura, but the physical location was known (B. Charlesworth, personal communication). The D. pseudoobscura sequence of all of intron C and 713 nucleotides of intron D of the trop1 gene was obtained by PCR using primers based on conserved exon sequence from D. melanogaster (1358-1379F and 2730-2710R from GenBank accession no. K03277). D. melanogaster contains a (CT)n microsatellite (nucleotides 2295-2324) that we found to be conserved and variable in D. pseudoobscura. The D. pseudoobscura sequence of this region has been deposited in GenBank as accession nos. AF039273 and AF039274.
These markers and their physical locations, as well as an additional microsatellite marker, Dps2003, that had been genetically mapped to chromosome 2 by Noor et al. (1999), are presented in Table 1. The marker regions were amplified from 27 partially inbred lines to identify the alleles in the population and to find suitable lines for setting up the mapping crosses. Based on the results of the marker screen, a cross was set up between lines 7 and 21, which are fixed for different alleles at 9 of the 10 markers (all but rp49, which only has two alleles in this sample). Examination of chromosome squashes from 7 × 21 F1 larvae showed no chromosomal inversions, indicating that they share the same third chromosome inversion type. F1 virgin females were held for 1-3 days after eclosion, then placed singly in vials of yeast-glucose food with a single F1 male. F1 parents were removed on the 9th day after female eclosion. All flies were maintained at 20°, on a 12-hr light/12-hr dark cycle. Although 25° is the standard temperature for mapping crosses in D. melanogaster (Ashburner 1989) and has been used for D. pseudoobscura as well (Levine and Levine 1954), ecological and behavioral studies (Dobzhansky and Epling 1944; Taylor 1986) suggest that 20° may be a more natural temperature for this species (see discussion). We have thus chosen this temperature given our interest in the rate of recombination in natural populations.
DNA was prepared from 192 F2 progeny for scoring each of the nine markers. An additional cross between lines 7 and 51 was set up in the same way to score the location of rp49. Inversion loops (presumably on the third chromosome, which is polymorphic for inversions in this population) were observed in F1 larvae of this cross, raising the possibility that crossing over on the second chromosome may have been somewhat elevated in this cross, due to the interchromosomal effect (Schultz and Redfield 1951). The data were analyzed as an F2 backcross (because there is no recombination in males) using Mapmaker (Lander et al. 1987) with the Kosambi mapping function.
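As a rough illustration of the Kosambi mapping function mentioned above, the following sketch converts a two-point recombination fraction into a map distance and back. It is not Mapmaker's multipoint procedure; the function names are hypothetical and the example recombination fraction is illustrative only.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance (cM) for a recombination fraction r in [0, 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # d (Morgans) = 1/4 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse function: recombination fraction for a distance given in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


if __name__ == "__main__":
    r = 0.10  # illustrative value, not taken from the mapping crosses
    d = kosambi_cm(r)
    print(f"r = {r:.2f} -> {d:.2f} cM -> r = {kosambi_r(d):.2f}")
```

For small recombination fractions the Kosambi distance is close to 100r, and it grows faster than 100r as r approaches 0.5, which is how the function allows for undetected multiple crossovers under partial interference.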
In situ hybridization: Probes for Rh3 and bcd were prepared by biotinylation of the same PCR products used as sequencing templates. Hybridizations to polytene chromosome preparations were performed as described by Lim (1993), using Vectastain reagents. Maps of the polytene chromosomes were from Stocker and Kastritsis (1972).
DNA sequence variation: Regions of approximately 1-1.8 kb were sequenced in samples of 10-12 inbred lines for seven loci whose physical and genetic locations were known. For five of these loci, one allele was also sequenced from D. miranda. The regions, which were chosen to include as much noncoding sequence as possible, are shown in Table 2. PCR products were sequenced directly using the Thermosequenase cycle sequencing system from Amersham (Arlington Heights, IL), after agarose gel purification using the Qiaex II system (Qiagen, Valencia, CA). Two estimates of 4Neμ, π and θW, were calculated according to Nei (1987) and Watterson (1975), respectively. Throughout the article, θW refers to Watterson’s estimator.
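The two estimators can be sketched as follows for a set of aligned sequences. This is a minimal sketch that assumes equal-length sequences with no gaps or missing data, and it reports per-site values; the function names and the toy alignment are hypothetical and are not data from this study.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns carrying more than one nucleotide state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Per-site theta_W: S divided by a1 = sum(1/i, i = 1..n-1) and by length."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a1 / len(seqs[0])


def nucleotide_diversity(seqs):
    """Per-site pi: mean number of pairwise differences divided by length."""
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total_diffs / len(pairs) / len(seqs[0])


if __name__ == "__main__":
    # Toy alignment of four sequences, 10 sites each (illustrative only).
    toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
    print(segregating_sites(toy), watterson_theta(toy), nucleotide_diversity(toy))
```

Because θW depends only on the number of segregating sites while π weights sites by allele frequency, an excess of rare variants pulls π below θW, which is the pattern that a negative Tajima's D reflects.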
The background selection model: Physical and genetic data generated in this study (Table 3) were used in Equation 15 of the background selection model of Hudson and Kaplan (1995) to make predictions of f0 across the second chromosome. (1 - f0) is the fractional decrease in expected variation due to the effects of background selection. Loci mapped to a polytene section were assumed to be in the middle of the section; e.g., Rh1 was assumed to be at 2.5 sections from the centromere. Gld and Ubx were not included in this analysis because we had no polymorphism data for these genes. Details of the calculations are given in Hamblin and Aquadro (1996).
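The midpoint convention described above can be written out as a short sketch. It assumes that the second chromosome spans polytene sections 43 (proximal) through 62 (distal), as reported elsewhere in this article; the constant and function names are hypothetical.

```python
FIRST_SECTION = 43  # most proximal section of chromosome 2 (assumed span 43-62)
LAST_SECTION = 62   # most distal section of chromosome 2


def sections_from_centromere(section):
    """Physical position of a locus, placed at the midpoint of its section."""
    if not FIRST_SECTION <= section <= LAST_SECTION:
        raise ValueError("section is outside the second chromosome")
    return (section - FIRST_SECTION) + 0.5


if __name__ == "__main__":
    # Rh1 lies in section 45 and rp49 in section 62.
    print(sections_from_centromere(45))  # 2.5
    print(sections_from_centromere(62))  # 19.5
```

Loci that share a section (for example trop1 and Rh3 in section 44) receive the same physical coordinate under this convention, so their relative order has to come from the genetic data.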
The ends of the chromosomes were treated two different ways. In the first treatment (Low, Table 5), which leads to lower estimates of f0, no additional recombination at the unmapped ends was included. Dps2003 was assumed to be at the centromere; rp49 was assumed to be at 19.5, with a most distal 0.5 section to the telomere having a recombination rate of zero.
In the second treatment (High, Table 5), we assumed that the unmapped ends of the chromosomes had the same rates of recombination as the adjacent mapped intervals. Dps2003 was assumed to be at 0.5 sections from the centromere, and the most proximal 0.5 section was assumed to have the same genetic length as the interval from Dps2003 to trop1: 3.8 cM. As in the first analysis, rp49 was assumed to be at 19.5, but the most distal 0.5 section was assumed to have the same rate of recombination as the interval from Mlc1 to rp49, namely 3.5 cM/0.5 section.
RESULTS
Genetic map and rates of recombination: Nine molecular markers across the second chromosome were developed based on published genomic sequence. Results of the population survey for these markers are shown in Table 1. The high level of variation at most loci made it possible to score eight of the nine markers, as well as microsatellite Dps2003, in F2 progeny of a single cross. The number of F2 progeny scored was in the range of 185-192 for all markers except Rh1, for which 166 progeny were scored. The last marker, rp49, had only two alleles and was scored in a separate cross, 7 × 51. Because the physical order of the loci was already known, it was necessary to score only one other marker, Mlc1, in cross 7 × 51 to locate rp49 on the genetic map. Due to technical problems, fewer progeny were scored from this cross, and the estimate of genetic distance between Mlc1 and rp49 is based on only 79 progeny.
The genetic map is shown in Figure 1. The order of the markers is consistent with published cytological locations, except in the case of Rh3. Carulli and Hartl (1992) localized all four rhodopsin genes in D. pseudoobscura and reported that Rh3 is in section 53, while Rh1 is in section 45. The genetic analysis places Rh3 proximal to Rh1, closely linked to trop1 in section 44. We performed an in situ hybridization using as a probe a biotinylated PCR product that was also a template for sequencing of Rh3 and found that the probe hybridizes in section 44, which is consistent with the genetic data.
The total length of our genetic map for this chromosome is >128 cM, as compared with the published length of 101 cM based on previously available visible and allozyme markers (Anderson 1990). The physical map of the D. pseudoobscura genome is divided into 100 sections, and published cytological localizations do not have the resolution of the D. melanogaster data. Two of our markers, Gld and Rh1, fall within section 45, and trop1 and Rh3 both fall within section 44. While the genetic data allow us to establish their order, we do not have precise distances between these four closely linked markers. Our most distal marker, rp49, is within the last section (62) of the chromosome, but is not at the very tip. Our most proximal marker, Dps2003, must be within the first section (43), based on its genetic location.
Physical and genetic locations are shown in Table 3 and are presented graphically in Figure 2 with similar data from chromosome arm 3R of D. melanogaster (Lindsley and Zimm 1992) plotted for comparison. The slope of the line in this plot is proportional to the rate of recombination. While these two chromosomal elements contain the same complement of genes and essentially the same amount of DNA (Powell 1997), the D. pseudoobscura second chromosome is genetically almost twice as large as 3R in D. melanogaster. Unlike D. melanogaster, the D. pseudoobscura chromosome lacks any extensive regions where recombination is drastically reduced (keeping in mind that we do not have markers at the extremes of the centromere and telomere). Rather, the rate of recombination across most of the D. pseudoobscura second chromosome is apparently quite uniform, which is similar to chromosome arm 3R in D. mauritiana (True et al. 1996).
Levels of DNA sequence variation in a population sample: We surveyed DNA sequence variation at seven of the loci for which we had scored genetic map position; Table 4 summarizes the data. There is a fourfold difference in θW at silent sites between the least variable locus, trop1, and the most variable, Xdh. Estimates of π at silent sites vary about eightfold. For trop1, Rh1, Mlc1, Xdh, and rp49, one allele from D. miranda was sequenced to obtain an estimate of divergence (Table 4). None of these five loci shows a departure from the neutral expectation when compared to each other or to Adh (using the Apple Hill population sample; Schaeffer and Miller 1992a) by the method of Hudson et al. (1987). However, because divergence to D. miranda is quite small (0.9-5.0%), this test has low power.
Estimates of π are lower than estimates of θW for all loci except bcd, as indicated by the negative Tajima’s D (1989a) statistics (Table 4). Tajima’s D statistics for Adh (Schaeffer and Miller 1992a), Hsp82 (Wang et al. 1997), and per (Wang and Hey 1996) from D. pseudoobscura are also negative. While only two of the statistics are significantly different from zero (trop1 and Adh noncoding), one would expect that approximately half the statistics would be negative and half positive by chance. We say “approximately” because the distribution of Tajima’s D is slightly skewed toward the negative. The preponderance of negative statistics (P = 0.02 by a sign test that does not take into account the negative skew) suggests that a population-level phenomenon may be responsible.
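The sign-test probability quoted above can be approximately reproduced under one reading of the counts. The sketch below assumes 10 independent statistics (the seven loci surveyed here plus Adh, Hsp82, and per), of which 9 are negative; this tally is an assumption made for illustration, and the function name is hypothetical.

```python
from math import comb


def sign_test_two_sided(n_negative, n_total):
    """Two-sided exact sign test, assuming P(negative) = 0.5 for each statistic."""
    k = max(n_negative, n_total - n_negative)
    # Probability of a split at least as extreme as the observed one, one tail.
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Assumed tally: 9 negative statistics out of 10.
    print(round(sign_test_two_sided(9, 10), 4))  # 0.0215
```

As the text notes, the null distribution of Tajima's D is slightly skewed toward negative values, so a test that assumes a probability of exactly 0.5 somewhat overstates the significance.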
Fu and Li (1993) tests (D*) are negative for five of the seven loci (all except bcd and Rh1) in our data set. Again, trop1 is the only locus that is significantly different from zero (D* = -2.60, P < 0.02).
Prediction of the effects of background selection: Given the recombinational map described above, we wanted to determine the expected impact of background selection on levels of neutral variation. In the absence of background selection, differences in θ among loci are due only to differences in μ, since Ne is the same across all loci. Background selection, however, causes regional differences in Ne (Charlesworth et al. 1993). To avoid confusion, we use the symbol Ne,0 to represent Ne in the absence of background selection (this can be thought of as species-level effective population size). Locus-specific Ne is related to Ne,0 by the parameter f0, the fraction of variation remaining after background selection: Ne = f0 Ne,0. At any given locus, f0 is a function of (1) the rate of mutation to deleterious alleles (U) per genome; (2) the strength of selection against those alleles in the heterozygous state (sh); and (3) the regional rate of recombination, which determines the size of the region in which deleterious mutations will be linked to the locus.
We calculated values of f0 using the simplified model of Hudson and Kaplan (1995; Equation 15), which assumes that rates of mutation and selective effects are uniform across the genome and that differences in the strength of background selection are solely a consequence of differences in rates of recombination.
Using U = 1 and sh = 0.02, the same values used for D. melanogaster (Charlesworth et al. 1993; Hudson and Kaplan 1995), and the genetic and physical map data in Table 3, we calculated values of f0 for the seven loci in our sequencing survey. (Because the D. pseudoobscura genome is divided into 100 sections, we used a value of 0.01 mutations per section, equivalent to 0.0002 mutations per polytene band in D. melanogaster, per generation.) Our markers cover ∼95% of the physical length of the chromosome, with rates of recombination in the most proximal and most distal 0-5% being unknown. Because of this uncertainty about the ends of the chromosomes, we calculated values of f0 under two alternative scenarios: (1) high, rates of recombination are the same as that in the adjacent segment; (2) low, there is no recombination in the unmapped segments (Table 5; see materials and methods for details). Under the first scenario, f0 is essentially uniform across the chromosome. The second scenario predicts an ∼10% difference between the most and least variable loci. This difference increases to 40% if the selection coefficient is changed to 0.005. Expected values of f0 for these loci in D. melanogaster (from Figure 4 of Hudson and Kaplan 1995) are presented in Table 5 for comparison.
Expected values of f0 are related to expected values of θ by the parameter π0 (the level of variation in the absence of background selection, i.e., 4Ne,0μ), so that f0 × π0 = E (θW). Figure 3 shows the expected values of θ under the four sets of parameters, using an estimate of π0 based on silent (noncoding and synonymous) sites that gives the best fit of the observed data to the model (see below). The high variation observed at Xdh is not predicted by any of the models (but note that divergence at Xdh is 4.1%, more than twice that at other loci; Table 4). Otherwise, the shape of the curve is best predicted by the model with sh = 0.005 and assuming no recombination in the unmapped segments (fourth column of Table 5).
The estimate of π0 was found by performing regression analysis of θW (for all loci except trop1 because of its significant Tajima’s D) on the predictions of f0. Because E (θW)/π0 = f0, the slope of the line θW = m × f0 is an estimate of π0. (We used the “no-intercept” option of Statview, which forces the regression line to pass through the origin.) Separate regressions were performed using θW at total sites, silent sites, or synonymous sites only. Xdh is an outlier in all three data sets, so we also performed the regressions without Xdh, which greatly improved the fit of the data to the model. The results of the analysis, using the model with sh = 0.005 and assuming no recombination in the unmapped segments, are presented in Table 6. For Figure 3, we chose an estimate of π0 based on θW at silent sites because it shows the highest correlation with f0 (r2 = 0.91 vs. r2 = 0.59 and r2 = 0.67 for total sites and synonymous sites, respectively).
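The no-intercept regression described above has a simple closed form: the least-squares slope of a line through the origin is the sum of products divided by the sum of squares of the predictor. The sketch below uses made-up values of f0 and θW chosen only to illustrate the calculation; they are not the values in Tables 5 and 6, and the function names are hypothetical.

```python
def slope_through_origin(f0, theta_w):
    """Least-squares slope m for theta_W = m * f0, with no intercept term."""
    if len(f0) != len(theta_w):
        raise ValueError("f0 and theta_w must have the same length")
    sum_xy = sum(x * y for x, y in zip(f0, theta_w))
    sum_xx = sum(x * x for x in f0)
    return sum_xy / sum_xx


def expected_theta(f0, pi0):
    """Expected theta_W at each locus: E(theta_W) = f0 * pi0."""
    return [x * pi0 for x in f0]


if __name__ == "__main__":
    # Illustrative inputs only, constructed so that theta_W = 0.02 * f0 exactly.
    f0_values = [0.5, 0.8, 1.0]
    theta_values = [0.010, 0.016, 0.020]
    pi0_hat = slope_through_origin(f0_values, theta_values)
    print(pi0_hat, expected_theta(f0_values, pi0_hat))
```

Forcing the line through the origin encodes the model's assumption that variation vanishes when f0 is zero, so the slope can be read directly as the estimate of π0.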
DISCUSSION
Genetic map of chromosome 2: Our immediate purpose in constructing a genetic map was to relate rates of recombination to levels of DNA sequence variation in natural populations (as opposed to providing a framework for identifying genetic loci). It is therefore important to consider whether our map is likely to reflect average rates of recombination in the study population from which our estimates of variation were obtained. There is genetic variation for rates of recombination in D. pseudoobscura (Levine and Levine 1954, 1955), and variables such as temperature, days since eclosion, and the presence of inversions are all known to affect crossover frequencies in D. melanogaster (Ashburner 1989); presumably these variables are important in D. pseudoobscura as well. The Goldendale population used in our study is polymorphic for third chromosome inversions (M. Noor, personal communication), and crossovers on the second chromosome are expected to increase when third chromosomes are heterozygous for inversion type (the Schultz-Redfield effect). Because all but one short interval (Mlc1-rp49) of our map was based on a cross homozygous for third-chromosome type, our inferred rates of recombination may underestimate somewhat the rates for this population in nature.
The temperature that a female D. pseudoobscura is likely to experience during meiosis in the wild is not known. Studies of daily activity found that flies are active at 10° to 31°, but are not usually found at baits during the hotter (>21°) parts of the day (Dobzhansky and Epling 1944). In a laboratory study of temperature choice, D. pseudoobscura preferred 15° over 25° (Taylor 1986). We conducted our mating experiments at 20° rather than at 25°, the standard temperature for D. melanogaster, because we believe this may be closer to the temperature that flies seek in nature. In any case, the effect of temperature within this range is likely to be very small, as significant temperature effects on crossover frequencies in D. melanogaster were observed only at temperatures >29° or <17.5° (Plough 1917).
Our map of chromosome 2, based on two genotypes chosen at random from the population, is almost 30% longer than the published map of Anderson (1990), based on visible and allozyme markers. Some of this difference may be due to the fact that we had information about cytological locations and were able to choose markers covering almost the full physical length of the chromosome. On the other hand, our map is considerably shorter (128 cM vs. 203.9 cM) than another independently constructed map of the second chromosome based on some of the same markers that we used (Noor et al. 1999). Therefore, while we have no reason to think that our map is inaccurate, it is important to realize that genetic variation, polymorphic inversions, and other variables interact to produce a distribution of crossover frequencies in natural populations, of which our map is simply one estimate.
Levels of variation: We analyzed levels of neutral variation at seven loci across the second chromosome of D. pseudoobscura, substantially increasing the number of estimates of sequence variation published for this species. Previous comparisons with sequence data from D. melanogaster have been problematic because estimates of 4Neμ from D. melanogaster come from many different kinds of samples (see Moriyama and Powell 1996), many of which are inappropriate for the type of analysis presented here. In addition, some of the loci surveyed were chosen with an expectation of a departure from neutrality. The most appropriate D. melanogaster data for comparison are those of Kindahl (1994), a collection of randomly chosen autosomal loci all surveyed in the same sample from a single North American population. Kindahl estimated total θW (i.e., an estimate of 4Neμ at all sites, coding and noncoding) on the basis of 4-cutter variation across regions 1.9-4.6 kb in length with an average of 46% coding sequence. This is quite similar to the average of 42% coding sequence in our surveys (Table 4).
Average levels of total variation in the Goldendale population of D. pseudoobscura are ∼1.5-fold higher than in the Maryland population of D. melanogaster (Table 7). Most of this difference comes from the lower end of the range: the least variable locus in D. pseudoobscura is 10-20-fold more variable than the least variable locus in D. melanogaster. The estimate of total θW at Adh in D. pseudoobscura was 0.015 in the most variable population sample, Gundlach-Bundschu (Schaeffer and Miller 1992b), slightly lower than our most variable locus, Xdh.
For a comparison based on synonymous sites in coding sequence, we used estimates from 5 of the loci in this study (all except trop1 and Mlc1, which had 0 and 16.5 synonymous sites, respectively) plus the data for Adh and Adh-dup in the Apple Hill population (Schaeffer and Miller 1992a). For D. melanogaster, we used estimates from 12 autosomal loci measured in North American population samples (for details, see Table 7). Variation at synonymous sites is ∼2.4-3 times higher in D. pseudoobscura. The greater difference in levels of synonymous variation could be due to higher variation in noncoding regions such as introns, or higher replacement polymorphism, in D. melanogaster. Replacement polymorphism was 12.5% of total variation in coding regions in our surveys in D. pseudoobscura (including Adh and Adh-dup) as compared to 26.4% reported by Moriyama and Powell (1996) for D. melanogaster.
Analysis in the context of regional rates of recombination: Increased overall recombination rate, a lack of substantial suppression of recombination near the centromere, and the reduced size of the linkage group (the acrocentric second chromosome of D. pseudoobscura contains only element E, while the metacentric third chromosome of D. melanogaster contains both elements D and E) all reduce the interaction of selection and linkage in D. pseudoobscura as compared with D. melanogaster (Table 5). The relative levels of silent DNA sequence variation observed for the second chromosome of D. pseudoobscura (20% of the genome) can be fairly well predicted using a background selection model assuming the same average mutational and selective forces as are thought to operate in a North American population of D. melanogaster (Figure 3).
Note that, although we used a model that is formulated to describe background selection against deleterious mutations, any positively selected mutations that have contributed to regional reductions in effective population size will affect the fit of the model to the data. It was not our goal to discriminate between the separate effects of background selection and selective sweeps. Rather, in using the same values for U and sh as were used by Hudson and Kaplan (1995) for D. melanogaster, we were qualitatively testing the hypothesis that the relationship between linkage and the relative level of variation (i.e., f0) was shaped by similar total intensity of selection (both positive and negative) in the two species.
The relatively uniform rates of recombination across the second chromosome of D. pseudoobscura make most of the chromosome fairly insensitive to changes in parameters. It was therefore difficult to discriminate between the alternative models presented in Figure 3, and our qualitative assessment of fit to the models became dependent on the ends of the chromosome where our data were less reliable. It was clear, however, that a stronger, rather than a weaker, effect of selection was needed to explain the reduction in variation observed at both ends of the chromosome. Therefore, unless we assume that the genomic rate of deleterious mutation (U) is higher in D. pseudoobscura, our analyses provide no support for the idea that hitchhiking events have played a larger role in the recent evolutionary history of North American D. melanogaster than D. pseudoobscura.
How likely is it that U for D. pseudoobscura is larger than for D. melanogaster? It is unlikely that replication-based errors occur at a different rate between such closely related species, though densities of transposable elements (TEs), which can contribute to the background selection effect (Charlesworth 1996), can vary considerably. The distributions of TEs have not been studied extensively in D. pseudoobscura, but restriction enzyme surveys of three loci covering a total of 63 kb revealed no length variation of the size associated with TE insertions (Aquadro 1993). A hybridization study by Brookfield et al. (1984) also found few TEs in D. pseudoobscura. Thus there is no evidence that U is larger in D. pseudoobscura than in D. melanogaster.
It has been argued that U in D. melanogaster is considerably smaller, not larger, than 1 (Keightley 1996; Fry et al. 1999), the value that we used in all our analyses. If U were in fact much smaller than 1, the correlation between variation and recombination observed in D. melanogaster could not be accounted for by selection against deleterious mutations, and one would be forced to conclude that positive selection had played a major role in that relationship (Charlesworth 1996). In D. pseudoobscura, however, because most of the recombinational landscape of the second chromosome is quite flat, one would not need to invoke a strong role of positive selection. Rather, a lower species-level effective population size, with f0 close to 1.0 across much of the chromosome, could explain the data. Better empirical estimates of U are needed to resolve this question.
Species-level effective population size: It has been inferred from a small number of restriction-enzyme and sequencing surveys (e.g., Schaeffer et al. 1987; Riley et al. 1989; Schaeffer and Miller 1992a) that D. pseudoobscura has a three- to fourfold larger effective population size than D. melanogaster. Our larger data set of randomly chosen population samples suggests that the difference in levels of polymorphism between the two species may have been slightly overestimated. More importantly, our analysis allows us to estimate the relative contributions of differences in rates of recombination, vs. differences in long-term species-level effective population size, to higher variation in D. pseudoobscura. This can be done by comparing estimates of π0, which directly reflects species-level effective population size, assuming similar neutral mutation rates in the two species.
For D. melanogaster, the estimate of π0 = 0.014 obtained by Hudson and Kaplan (1995) is for total variation. Our estimate of π0 at total sites in D. pseudoobscura is similar: 0.010-0.013 (Table 6). However, a much larger fraction of total variation in D. melanogaster appears to be nonsynonymous or noncoding than in D. pseudoobscura (Table 7). This discrepancy suggests that differences in total variation between the two species may not be a simple function of effective population size (i.e., that a significant fraction of the variation may not be strictly neutral). We analyzed the relationship between recombination and variation for three classes of sites in D. pseudoobscura (Table 6) and found that estimates of silent variation (synonymous plus noncoding) showed the strongest relationship with the predicted effects of background selection, yielding an estimate of π0 = 0.016-0.022 for silent sites, which is not much higher than π0 = 0.014 for total sites in D. melanogaster.
Synonymous sites are the most variable in both species and show the largest difference between the species (Table 7), so they are presumably most likely to accurately reflect differences in effective population size. Using data for seven loci on the third chromosome and the regression method described above (see results), we estimated π0 at synonymous sites in D. melanogaster to be 0.026, a bit lower than the estimate of 0.03 from Hamblin and Aquadro (1997). Our estimate of π0 = 0.020-0.031 for synonymous sites in D. pseudoobscura (Table 6) completely contains the range estimated for D. melanogaster. While these comparisons are very crude, the result is not unreasonable and suggests little difference in species-level effective population size between D. melanogaster and D. pseudoobscura. Note that we assumed U = 1 in both species. If U in D. pseudoobscura were actually <1 (see above), observed variation would be even closer to its maximal level, i.e., species-level effective population size would be smaller.
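The comparison can be made explicit with a small calculation. If π0 is proportional to the product of species-level effective population size and the neutral mutation rate, and that rate is the same in both species (the assumption stated above), then the ratio of π0 values estimates the ratio of effective population sizes. The sketch below uses only the synonymous-site values quoted in the text; the function and variable names are illustrative, and the resulting bounds are crude extremes rather than a confidence interval.

```python
# Synonymous-site pi0 values as quoted in the text.
PI0_PSE_SYN = (0.020, 0.031)  # D. pseudoobscura range (Table 6)
PI0_MEL_SYN = (0.026, 0.03)   # D. melanogaster: regression estimate, and
                              # the Hamblin and Aquadro (1997) estimate

def ne_ratio_bounds(pse_values, mel_values):
    """Extreme values of pi0(pse)/pi0(mel), read as a ratio of species-level
    effective population sizes under equal neutral mutation rates."""
    lowest = min(pse_values) / max(mel_values)
    highest = max(pse_values) / min(mel_values)
    return lowest, highest

low, high = ne_ratio_bounds(PI0_PSE_SYN, PI0_MEL_SYN)
print(f"Ne ratio (pseudoobscura / melanogaster): {low:.2f} to {high:.2f}")
```

Both bounds fall well short of the three- to fourfold difference inferred from earlier surveys, and the interval includes a ratio of 1.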
It is quite plausible that species-level effective population sizes of these two species in North America may be more similar than had been thought. While the ecology of neither species is well understood, there is no evidence from molecular data that D. melanogaster has experienced a severe bottleneck in establishing its North American populations from very large ancestral African populations. In addition, D. melanogaster’s exploitation of abundant agricultural resources certainly provides the opportunity for high population densities.
While species-level effective population size (i.e., Ne,0) may be similar in the two species, molecular evolution at any particular locus will be a function of f0 Ne,0 at that locus, as described above. D. pseudoobscura’s higher rates of recombination should allow for faster, more efficient response to selection. In this light, it is interesting that D. melanogaster, the species with a shorter genetic map than D. simulans and D. mauritiana as well as D. pseudoobscura (True et al. 1996), is a more successful colonizer than any of them.
Excess of rare variants: No difference in the amount of selection is required to explain patterns of variation in these two species, in spite of their seemingly very different evolutionary histories. This apparent similarity may be coincidental, obscuring important differences in several underlying parameters, or it may simply reflect the limited resolution of our data. However, it may also reflect an unexpected similarity in biology suggested by the frequency distributions of variation. Our data, together with previously published results (Schaeffer and Miller 1992a; Wang and Hey 1996; Wang et al. 1997) show that Tajima’s D is negative at 9 out of 10 loci in D. pseudoobscura. Negative Tajima’s D statistics can be an indication of rapid population expansion (Tajima 1989a,b; Aris-Brosou and Excoffier 1996).
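For readers who wish to reproduce the statistic, a minimal sketch of Tajima's D as it is usually defined is given below. It takes the sample size n, the number of segregating sites S, and the mean number of pairwise differences k over the whole sequence (not per site). The function name and the example inputs are hypothetical and are not data from this study.

```python
import math

def tajimas_d(n, S, k):
    """Tajima's D from sample size n, number of segregating sites S, and
    mean number of pairwise differences k (summed over the sequence)."""
    # For n < 4 the variance term is zero, so D is undefined.
    if n < 4 or S < 1:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: difference between the two estimators of theta (k and S/a1).
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

# Hypothetical sample: 10 sequences, 16 segregating sites, k = 3.9.
print(round(tajimas_d(10, 16, 3.9), 2))
```

D is negative whenever k falls below S/a1, that is, when segregating sites are carried mostly by rare variants, which is the pattern expected after rapid population growth or under weak purifying selection.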
The possibility that D. pseudoobscura is not at equilibrium has been raised before: Slatkin (1994) pointed out that genetic data provide no evidence for isolation by distance in this species, yet direct estimates of dispersal would predict such an effect. This discrepancy can be explained if populations of D. pseudoobscura have in fact not been relatively stable but instead have recently undergone a range expansion accompanied by dramatic population growth. Such an expansion could be accompanied by adaptation to new environments, possibly comparable to the adaptive changes experienced by D. melanogaster in temperate regions.
A significant change in population size would violate the equilibrium assumption of the background selection model and may affect our analysis in some unknown way. Nonetheless, this reservation probably also applies to North American populations of D. melanogaster, which are thought to be very recently established and may be far from mutation-drift equilibrium for base-pair polymorphisms.
Alternatively, the preponderance of negative Tajima’s D’s may be due to slightly deleterious variants being maintained at low frequencies throughout the D. pseudoobscura genome. At the five loci for which we have surveyed both coding and noncoding regions, there is a trend toward more negative Tajima’s D’s in noncoding regions than at synonymous sites. If this difference were significant in a larger sample, it would support this alternative hypothesis rather than the hypothesis of population expansion.
Patterns of molecular variation across the second chromosome of D. pseudoobscura are consistent with previously published models of the effects of background selection based on data from D. melanogaster. Using these models, the two- to threefold higher levels of silent variation in D. pseudoobscura compared to D. melanogaster appear to be explained by the former species’ twofold longer genetic map and a similar species-level effective population size. Our confidence in this conclusion will be improved by mapping and polymorphism data for more loci and evaluation of how departures from a strictly neutral, equilibrium model of background selection affect parameter estimation. In addition, better estimates of the genomic deleterious mutation rate will permit more accurate inferences about species-level effective population size and the importance of positive selection in shaping genomic patterns of variation in these species.
We thank M. Noor for providing flies, a microsatellite marker, and help with Mapmaker; W. Anderson for the D. miranda stock; M. Veuille, F. Depaulis, and members of the Aquadro lab for helpful discussions; and R. Hudson for comments on the manuscript. This work was supported by a grant from the National Institutes of Health to C.F.A. Some of the writing was done while M.T.H. was supported by a Chateaubriand Fellowship from the French government.
Communicating editor: R. R. Hudson
- Received October 19, 1998.
- Accepted July 1, 1999.
- Copyright © 1999 by the Genetics Society of America