Selection, recombination, and the demographic history of a species can all have profound effects on genomewide patterns of variability. To assess the impact of these forces in the genome of Drosophila miranda, we examine polymorphism and divergence patterns at 62 loci scattered across the genome. In accordance with recent findings in D. melanogaster, we find that noncoding DNA generally evolves more slowly than synonymous sites, that the distribution of polymorphism frequencies in noncoding DNA is significantly skewed toward rare variants relative to synonymous sites, and that long introns evolve significantly slower than short introns or synonymous sites. These observations suggest that most noncoding DNA is functionally constrained and evolving under purifying selection. However, in contrast to findings in the D. melanogaster species group, we find little evidence of adaptive evolution acting on either coding or noncoding sequences in D. miranda. Levels of linkage disequilibrium (LD) in D. miranda are comparable to those observed in D. melanogaster, but vary considerably among chromosomes. These patterns suggest a significantly lower rate of recombination on autosomes, possibly due to the presence of polymorphic autosomal inversions and/or differences in chromosome sizes. All chromosomes show significant departures from the standard neutral model, including too much heterogeneity in synonymous site polymorphism relative to divergence among loci and a general excess of rare synonymous polymorphisms. These departures from neutral equilibrium expectations are discussed in the context of nonequilibrium models of demography and selection.
THE emergence of large-scale multilocus polymorphism data in several species, including Arabidopsis, maize, Drosophila, and humans, is allowing us to test evolutionary hypotheses on an unprecedented scale (e.g., International HapMap Consortium 2003; Andolfatto 2005; Bustamante et al. 2005; Hinds et al. 2005; Nordborg et al. 2005; Ometto et al. 2005; Wright et al. 2005). Most of the Drosophila data are from Drosophila melanogaster, the subject of the first survey of sequence-level variability (Kreitman 1983) and for which we have a wealth of prior information, including an annotated genome sequence (Adams et al. 2000). The ever-increasing wealth of polymorphism and divergence data from Drosophila over the past decade is overturning neutralist views (Kimura 1983) that adaptive evolution is infrequent at the molecular level and that most noncoding DNA evolves essentially neutrally. In particular, multilocus studies have revealed surprisingly high levels of selective constraint in noncoding regions of the Drosophila genome (Bergman and Kreitman 2001; Halligan et al. 2004; Kawahara et al. 2004; Andolfatto 2005; Haddrill et al. 2005a; Halligan and Keightley 2006) and, intriguingly, evidence that a considerable fraction of the divergence between species at both nonsynonymous sites and noncoding DNA was driven to fixation by positive selection (Jenkins et al. 1995; Fay et al. 2002; Smith and Eyre-Walker 2002; Sawyer et al. 2003; Bierne and Eyre-Walker 2004; Kohn et al. 2004; Andolfatto 2005).
Several other polymorphism patterns in the D. melanogaster and D. simulans genomes also suggest genomewide departures from the neutral model, including more linkage disequilibrium (LD) than predicted by comparisons of physical and genetic maps (Andolfatto and Przeworski 2000) and evidence for differences in levels of LD and levels of variability among chromosomes (Begun and Whitley 2000; Andolfatto 2001; Wall et al. 2002; Andolfatto and Wall 2003). In addition, large data sets in D. melanogaster suggest far more than expected heterogeneity in relative levels of polymorphism and divergence across the genome both in recently founded (Glinka et al. 2003; Orengo and Aguade 2004) and in older (Haddrill et al. 2005b) populations. While these patterns have been interpreted in the context of selection models (Hudson et al. 1987; Begun and Whitley 2000; Andolfatto 2001; Orengo and Aguade 2004; Ometto et al. 2005), purely demographic hypotheses have proven difficult to reject as alternative explanations (Wall et al. 2002; Haddrill et al. 2005b; Thornton and Andolfatto 2006).
One approach to making further progress in distinguishing demographic and selective effects on genome variability is either to focus on populations that are likely to have had a more stable demographic history (e.g., Andolfatto and Wall 2003) or to use a comparative approach by investigating genomewide variability patterns in more species. The recent completion of the D. pseudoobscura genome (Richards et al. 2005) makes the species group to which it belongs an attractive model for such studies. Unlike D. melanogaster and D. simulans, species in the D. pseudoobscura group are not human commensals and have likely had a more stable demography. Thus, multilocus data from D. pseudoobscura and its close relatives will complement studies being carried out in the D. melanogaster group and hopefully provide both independent verification and new insights into the underlying causes of many of the interesting departures from neutrality observed in the melanogaster species group.
D. miranda, a close relative of D. pseudoobscura, has already proven a useful model to study incipient sex chromosome evolution (Bachtrog 2005). Here, we analyze sequence polymorphism data from 62 loci in D. miranda (∼78 kb in total) sequenced from 12 lines of D. miranda. The goals of this study are several. First, we investigate whether patterns of evolutionary constraint and adaptive divergence at nonsynonymous and noncoding DNA sites relative to synonymous sites in D. miranda–D. pseudoobscura comparisons resemble those found in comparisons of species in the D. melanogaster group. Second, we estimate levels of linkage disequilibrium from nucleotide polymorphism data at loci surveyed on different chromosomes, to make inferences about patterns of recombination in this species. Third, we use multilocus polymorphism data to test whether D. miranda fits the assumptions of a randomly mating population at neutral equilibrium (hereafter, the standard neutral model, SNM). Previous studies based on fewer loci have suggested that, if excluding individual outliers, patterns of variability in D. miranda generally fit the assumption of the SNM (Yi and Charlesworth 2000; Bachtrog and Charlesworth 2002; Bachtrog 2003a, 2004; Yi et al. 2003; Bartolomé et al. 2005). We show that this conclusion does not hold in a larger D. miranda data set, particularly when accounting for the effect of intragenic recombination. We perform exploratory simulations to assess the fit of some models of selection and demography as putative causes for this departure from the SNM.
MATERIALS AND METHODS
The following D. miranda lines were used for the sequence analyses, with their geographic origin given in parentheses: 0101.3, 0101.4, 0101.5, and 0101.7 (Port Coquitlam, British Columbia, Canada); 0101.9, MA28, and MA32 (Mather, CA); MSH22 and MSH38 (Mount St. Helena, CA); and SP138, SP235, and SP295 (Spray, OR). Flies were cultured on banana medium at 18°.
Polymorphism data for 35 of the 62 loci studied were published previously (see appendix a for references). We collected new sequence data for an additional 27 loci from the neo-X and the X chromosome of D. miranda. Genomic DNA was extracted from a single male fly of each line using the Puregene DNA extraction kit. PCR products were amplified as ∼500- to 2000-bp fragments from genomic DNA, using primer pairs designed from sequenced λ-clones isolated from a D. miranda genomic library (Bachtrog and Charlesworth 2002) or from the D. pseudoobscura genome sequence (Richards et al. 2005). PCR products were used as sequencing templates after treatment with a shrimp alkaline phosphatase/exonuclease I mixture to remove primers and unincorporated nucleotides. Gene-specific internal primers and the original amplification primers were used for sequencing with the BigDye 3.0 cycle sequencing kit (Applied Biosystems, Foster City, CA) following the manufacturer's protocol and sequences were run on an ABI 3730 automated sequencer. We obtained polymorphism data for 29,232 bp of neo-X-linked sequence (from 26 genes) and 928 bp from one gene on the X chromosome in 12 lines of D. miranda (see appendix b). All primers and amplification conditions are available upon request to D. Bachtrog. Sequence trace files were proofread and aligned using Sequencher (Gene Codes, Ann Arbor, MI).
DNA polymorphism and evolutionary analysis:
The estimated number of synonymous sites, nonsynonymous sites, average pairwise diversity (π), and average pairwise divergence (Dxy), as well as counts of the number of polymorphic (P) and divergent (D) sites were performed with Perl scripts written by D. Bachtrog and P. Andolfatto. Multiply hit sites were included in the analysis but insertion–deletion polymorphisms and polymorphic sites overlapping alignment gaps were excluded. We estimate the proportion of divergence driven by positive selection aswhereDi and Pi are the number of divergent and polymorphic variants at locus i, respectively, n is the number of loci, and X and S subscripts denote the putatively selected and neutral (i.e., synonymous) classes of mutations, respectively (see also Rand and Kann 1996; Fay et al. 2002; Smith and Eyre-Walker 2002; Andolfatto 2005). Confidence limits were estimated using a nonparametric bootstrap procedure (Andolfatto 2005). For consistency, we estimated α for nonsynonymous sites the same way and for comparison, we also applied the approach of Bierne and Eyre-Walker (2004) where possible. Estimates and confidence limits based on this procedure were not notably different from estimates based on pooling sites across loci (results not shown).
To characterize levels of linkage disequilibrium, we estimate the parameter ρ = 4Ner, where Ne is the effective population size and r is the recombination rate per generation, by an approximate Bayesian method (Haddrill et al. 2005b; K. Thornton, unpublished data). Posterior distributions of ρ and θ were jointly estimated on the basis of summary statistics of the data (sample size, alignment length, number of segregating sites, number of haplotypes, and the minimum number of recombination events in the sample) and rejection sampling. We estimated ρ and θ on the basis of all sites or silent sites only and, for autosomes, neo-X and X-linked loci independently. Only loci that had a minimum of six segregating sites (S) were included in the analysis, because estimates of ρ using a similar procedure were shown to be highly biased when S is small (Andolfatto and Wall 2003). Thus, a total of 7 X-linked loci (6 loci from XL and 1 locus from XR), 10 autosomal loci (6 loci from chromosome 4 and 4 loci from chromosome 2), and 17 neo-X loci were included in this analysis. The LD analysis was also performed including only loci that had ≥10 polymorphic sites, which gave very similar results (results not shown). Estimates of ρ may also be biased if there is a skew in the allele frequency spectrum of mutations (Andolfatto and Wall 2003). Since all chromosomes show a similar skew in the allele frequency spectrum (see below), this potential bias would be similar for each chromosome. Each estimate of ρ and θ is based on 1000 draws from the posterior distribution.
Statistical tests of neutrality:
Following Haddrill et al. (2005b), we used several multilocus statistical tests to detect nonequilibrium demography and/or selection in our data set. We use the Hudson–Kreitman–Aguadé (HKA) test to quantify heterogeneity in levels of polymorphism relative to divergence among loci (Hudson et al. 1987). We also use the means and variances of two measures of the distribution of polymorphism frequencies [Tajima's D (Tajima 1989a) and Fay and Wu's H (Fay and Wu 2000)]. The ancestral state of polymorphisms was inferred using a single D. pseudoobscura sequence and standard parsimony criteria. We performed these tests using all sites, only “silent” sites (noncoding plus synonymous), and only synonymous sites. Given the evidence for purifying selection acting on nonsynonymous and noncoding polymorphisms (see results), we report results based on synonymous sites only.
All tests were carried out using the neutral coalescent simulation program ms of Hudson (2002) and various auxiliary programs written in C and Perl by P. Andolfatto. In our simulations, we account for sample size, alignment length, and θ for each locus. The parameter θ is estimated from the observed data using the HKA framework on the basis of the number of segregating sites and divergence to a single D. pseudoobscura sequence (Hudson et al. 1987). We incorporate recombination into our simulations by using a point estimate based on the mode of the posterior distribution of ρ/θ estimated for each chromosome. P-values for test statistics are based on 10,000 simulated replicates.
Fit to demographic and hitchhiking models:
We assessed the fit of the neo-X data to two simple demographic models (population growth and a bottleneck) and a recurrent selective sweep model. Population growth and bottlenecks were modeled with the program ms (see http://home.uchicago.edu/∼rhudson1/source/mksamples.html for a guide to implementing such models) and the specific parameters used are listed in the table legends. Under the growth model, N/No = ert, where N is the current population size, No is the ancestral population size, r is the growth rate, and t is the time at which growth began. Population bottlenecks (BN) were modeled as an ancestral population, No, that crashes to size Nb at time t for d generations and recovers to size No. Recurrent hitchhiking (RHH) was modeled using code from Przeworski (2002) and was used for all statistics except the HKA χ2. To investigate the HKA χ2-statistic under the RHH model, we implemented an approximation based on the BN model using the program ms. Here we modeled selective sweeps as locus-specific bottlenecks where the time to the last bottleneck at each locus (t) was chosen from a uniform distribution with mean λ and the severity of the bottleneck (Nb/No) at each locus was chosen from an exponential with mean s. The only statistics used from these simulations were the number of segregating sites (S) and average pairwise divergence (Dxy). We confirmed that the distribution of S for these simulations did not differ from that produced under the RHH model of Przeworski (2002) for a given diversity reduction (results not shown). In each case, simulation parameters were scaled to mimic the observed data (i.e., levels of variability, π, and average pairwise divergence, Dxy) as closely as possible.
RESULTS AND DISCUSSION
Levels of variability in D. miranda:
DNA polymorphism data for a total of 62 loci from autosomes, the neo-X, and the X chromosome were obtained (appendix b), comprising 78 kb sequenced in 12 lines of D. miranda. Average synonymous site diversity πsyn is 0.41% per site for X- and neo-X-linked loci and 0.53% per site for autosomal loci, consistent with the neutral expectation that the X chromosome has three-quarters the variability of autosomes. Thus, there is no evidence of strong sexual selection in D. miranda, which would inflate the X/autosome polymorphism ratio (Charlesworth 2001). Levels of diversity on the X and neo-X are almost sixfold lower than average synonymous site diversity on the X chromosome of D. melanogaster (πsyn = 2.7%, Andolfatto 2005). This difference in synonymous site variability could indicate a lower effective population size of D. miranda relative to D. melanogaster. There is extensive heterogeneity in levels of synonymous site polymorphism and divergence levels among loci (appendix b). In addition, most loci show a marked skew at synonymous polymorphism frequencies toward rare alleles (as measured by Tajima's D-statistic, appendix b). These features of the data are discussed in the context of possible demographic and selection models below.
Positive and negative selection at coding and noncoding DNA:
Overall, synonymous sites evolve faster between D. miranda and D. pseudoobscura than nonsynonymous sites and intergenic regions (Table 1 ); average divergence at fourfold degenerate synonymous sites Ks is 3.6%, significantly higher than divergence at nonsynonymous (Ka = 0.65%) and intergenic (KIG = 2.6%) sites of the genome (P < 0.05, Wilcoxon's two-sample test). Pooling loci on the X and the neo-X, we infer levels of constraint to be ∼30% for intergenic DNA (Table 1). Note that, due to the relatively rough functional annotation of the D. pseudoobscura genome, we do not distinguish between untranslated-transcribed regions (UTRs) and truly intergenic regions. Results from D. melanogaster suggest that constraint is stronger in UTRs than in intergenic regions (Andolfatto 2005). Since most of our “intergenic” regions are relatively close to coding exons, and the average length of the region we surveyed was ∼400 bp, they may be composed of a substantial fraction of UTRs.
In contrast, average divergence observed at introns is similar to that at synonymous sites (Table 1). Note, however, that most of the introns analyzed here are very short (median intron length is 64 bp), and only 13 of the 85 introns investigated are >100 bp. In D. melanogaster, short introns evolve at rates that are similar to those of synonymous sites, which suggests that they experience little or no selective constraint (Halligan et al. 2004; Haddrill et al. 2005a; Halligan and Keightley 2006). However, longer introns in D. melanogaster evolve significantly slower than synonymous sites, suggesting that they are subject to stronger selective constraint (Haddrill et al. 2005a; Halligan and Keightley 2006). To investigate the relationship between intron length and nucleotide divergence, we compiled data from 106 neo-X-linked genes in D. miranda (Bachtrog 2003b, 2005; Bachtrog and Charlesworth 2002; our unpublished data) and compared them to their D. pseudoobscura homolog. This data set contains 165 introns, 21 of which are >100 bp. Figure 1 shows the relationship between nucleotide divergence and intron length. As observed in D. melanogaster (Haddrill et al. 2005a; Halligan and Keightley 2006), synonymous sites evolve at a rate similar to that for short introns (<100 bp). However, long introns (>100 bp) evolve significantly slower than both synonymous sites and short introns (see Figure 1). The 5′- and 3′-flanking regions of genes show similar levels of divergence as long introns, suggesting similar levels of constraint (Figure 1). However, since D. miranda and D. pseudoobscura likely differ by sixfold in population size (Yi et al. 2003), much of the constraint observed may be specific to the D. pseudoobscura lineage. We can rule out this explanation as the sole cause for the divergence pattern because polymorphism at flanking regions in D. miranda is also significantly reduced compared to that at synonymous sites (Wilcoxon's two-sample test, P < 0.01).
Reduced polymorphism and divergence at long introns and intergenic regions relative to synonymous sites could simply reflect differences in mutation rate. One way to distinguish mutation rate differences from selective constraint is to consider the polymorphic site frequency spectrum (SFS) of noncoding DNA relative to synonymous sites (following Andolfatto 2005). Negative selection on sites is expected to result in a skew toward rare polymorphisms relative to neutral sites. In the D. miranda polymorphism data, we detect significantly more low-frequency (f = ) than intermediate- and high-frequency () variants at intergenic DNA compared to synonymous sites (P < 0.01, Fisher's exact test, see Figure 2). The effect of selection on synonymous sites on the SFS has been documented in D. simulans and D. pseudoobscura (Akashi and Schaeffer 1997). However, the SFS in D. miranda (and D. melanogaster—see Andolfatto 2005) suggests stronger selection acting on noncoding DNA sites than on synonymous sites. Consistent with divergence patterns (above), no significant difference is detected at the SFS (P = 0.09, Fisher's exact test, see Figure 2) or at levels of polymorphism (Wilcoxon's two-sample test, P > 0.1) between synonymous sites and the mainly short introns studied here. However, since most intronic DNA resides in long (and thus constrained) introns in the Drosophila genome, this result is consistent with current selection acting on a substantial fraction of noncoding DNA in D. miranda.
Overall, these results corroborate a recent study in D. melanogaster that found that noncoding DNA evolves significantly slower and harbors less polymorphism than nonsynonymous sites and that polymorphic variants at these sites segregate at lower frequencies than synonymous polymorphisms (Andolfatto 2005). Thus, noncoding DNA in Drosophila, including intergenic regions and most intronic DNA, is evolving under stronger functional constraint than synonymous sites. Curiously, the skew in the frequency distribution toward rare variants is not as pronounced for nonsynonymous polymorphisms in D. miranda, particularly on the neo-X chromosome, despite the stronger signature of selective constraint at these sites in levels of polymorphism and divergence (see Table 1).
Interestingly, the D. melanogaster study also suggested that noncoding DNA is undergoing frequent adaptive evolution (Andolfatto 2005). A test to distinguish neutrality from negative and positive selection in the genome is to compare levels of polymorphism within and divergence between species for a putatively selected class of sites to a neutral standard (McDonald and Kreitman 1991). This test was originally designed to detect selection in protein-coding regions, but can be modified to detect selection in noncoding regions as well (Ludwig and Kreitman 1995; Kohn et al. 2004; Andolfatto 2005). If reduced levels of polymorphism and divergence in noncoding and nonsynonymous sites can be explained by a lower mutation rate, the ratio of polymorphism to divergence should be similar to that for synonymous sites. Positive selection will increase divergence relative to polymorphism at selected sites, whereas negative selection may produce the opposite pattern. We note that nonequilibrium mutation models might also result in heterogeneity between polymorphic and diverged mutations (Eyre-Walker 1997; Kern and Begun 2005; Akashi et al. 2006). Future work will be needed to theoretically quantify the magnitude of such effects and to establish patterns of mutational bias in the pseudoobscura group.
The application of the McDonald–Kreitman approach to polymorphism data from D. melanogaster has suggested that a substantial fraction of noncoding and nonsynonymous divergence was driven to fixation by positive selection (i.e., 20–60%; Fay et al. 2002; Smith and Eyre-Walker 2002; Kohn et al. 2004; Andolfatto 2005), if compared to synonymous sites as a neutral standard. Applying this same framework to our polymorphism data from D. miranda, we find no evidence for an excess of divergence for noncoding or nonsynonymous sites (Figure 3). Instead, nonsynonymous and noncoding sites show a slight excess of polymorphism over divergence, which is consistent with purifying selection operating on segregating variation at these sites.
However, the joint effects of positive and negative selection can mask the signature of adaptive evolution in the genome (Charlesworth 1996; Templeton 1996; Fay et al. 2001; Andolfatto 2005). The decrease in statistical power due to negatively selected polymorphisms (that will not contribute to divergence) can be partly overcome by considering only those mutations that are not rare in a sample (so long as the neutral and putatively selected classes are treated equally). In particular, since noncoding DNA shows an excess of rare-frequency variants relative to synonymous DNA, it is likely that some fraction of polymorphism at noncoding DNA is under negative selection. When we apply this approach to our data, there is a slight excess of divergence at nonsynonymous and noncoding DNA relative to synonymous sites (Table 1). The fraction of amino acid mutations driven to fixation by positive selection (α) is estimated to be ∼8%, and 10% for noncoding DNA, although neither estimate is significantly different from zero. Most of the signature of adaptive evolution we detect at nonsynonymous sites is attributable to two loci that show a significant McDonald–Kreitman test individually (CycB and exu1; see Bachtrog and Charlesworth 2002; Bachtrog 2003a); if we exclude these two loci, the remaining genes show no evidence for adaptive protein evolution (α ∼ 0).
While these trends are in the same direction as those found in the D. melanogaster species group (Sawyer et al. 2003; Bierne and Eyre-Walker 2004; Andolfatto 2005), estimates of α are substantially smaller in D. miranda and are not significantly different from zero. Thus, while both noncoding and nonsynonymous DNA show evidence of functional constraint in D. miranda (as in D. melanogaster), neither class shows significant evidence for positive selection.
There are several possible explanations for the differences between D. miranda and D. melanogaster in inferred levels of adaptive divergence. First, the difference between species could be due to differences in the types of genes studied in D. melanogaster and D. miranda. This seems unlikely for several reasons. First, the studies of Bierne and Eyre-Walker (2004) and Andolfatto (2005) yielded similar estimates of adaptive divergence in D. melanogaster despite using a nonoverlapping sample of genes. Like the Andolfatto (2005) study, most of the genes studied here were chosen randomly with regard to protein function. In addition, Bierne and Eyre-Walker (2004) found no evidence for significant heterogeneity in estimates of α among loci, suggesting that the fraction of adaptive divergence is not rampantly different among genes. Given these factors, it seems unlikely that the difference in α is due to gene-specific effects.
A second explanation is based on possible differences in the effective population sizes of the two species. Levels of synonymous site diversity are sixfold lower in D. miranda than in D. melanogaster. Thus the lower fraction of adaptive divergence at both nonsynonymous and noncoding sites in D. miranda could be the result of a smaller effective population size compared to D. melanogaster. If most beneficial mutations fixed in the D. melanogaster species group are of small effect (i.e., Nes ∼ ≤5), these same mutations might actually be effectively neutral in D. miranda. Average Ka/Ks (D. miranda–D. pseudoobscura) among loci is almost identical for the 57 protein-coding genes investigated here in D. miranda and the 35 protein-coding regions analyzed by Andolfatto (2005) in D. melanogaster–D. simulans (Ka/Ks = 0.16 in both species comparisons). Thus, D. miranda does not have a lower rate of amino acid evolution, as might be expected if there is less protein adaptation. However, a smaller Ne in D. miranda would also result in an increase in the rate of fixations of slightly deleterious amino acid mutations (Ohta 1998). Relaxed selection on amino acid variants in D. miranda relative to D. melanogaster is consistent with the lack of a negative skew in the SFS of the former. We cannot, therefore, rule out that these two opposing forces of protein evolution cancel each other out to some extent, causing a similar net rate of protein evolution in D. miranda and D. melanogaster.
One difficulty with a population size argument is that the inferred current Ne of D. miranda may not reflect its historical Ne. Levels of silent-site diversity in D. pseudoobscura suggest it has an even larger effective population size than D. melanogaster (Yi et al. 2003). Thus, the Ne of D. miranda might have been larger than its current size for some time after speciation. Even if it was not, and little adaptive divergence occurred in D. miranda, we expect that substantial adaptive divergence along the D. pseudoobscura lineage should partly mitigate the effects of reduced Ne in D. miranda. Also, the estimate of α for adaptive protein evolution is similar for D. melanogaster and D. simulans (Bierne and Eyre-Walker 2004), despite these two species having putatively different Ne (Akashi 1996; Andolfatto 2001). This suggests that rates of adaptive evolution might not be particularly sensitive to small fluctuations in the effective population size.
Population size arguments typically rely on the assumption that levels of silent-site (i.e., almost neutral) diversity accurately reflect differences in Ne. However, as Gillespie (1997, 1999, 2001) pointed out, if positive selection is frequent, there is little expected correspondence between the effective population size of a species and levels of neutral diversity. Thus, it is possible that the population size difference between D. miranda and D. melanogaster is dramatically underestimated when based on relative levels of synonymous diversity, if genetic hitchhiking is virtually absent in the former and common in the latter. Given uncertainties in relative population sizes and the distribution of selection coefficients, we cannot exclude a lower population size in D. miranda as the primary cause of the lower fraction of adaptive divergence inferred in this species.
A third explanation for the difference between estimates of α in the two species is that nonequilibrium demography in D. melanogaster (Haddrill et al. 2005b) and/or D. miranda (discussed below) may explain the different estimates of α in the two species groups if this demography is producing spurious signatures of adaptation in the former or is masking the signature of adaptation in the latter (Fay and Wu 2001; Eyre-Walker 2002). While segregating deleterious mutations lead to an underestimate of α in a population of stable size (Charlesworth 1996; Templeton 1996; Fay et al. 2002; Andolfatto 2005), they can lead to an overestimate of α if population sizes have expanded, since slightly deleterious mutations that fixed in the past when the population was smaller no longer segregate as polymorphisms (Fay and Wu 2001; Eyre-Walker 2002). On the other hand, a population size contraction would lead to the opposite pattern and thus obscure evidence for adaptive evolution.
Both D. melanogaster (Haddrill et al. 2005b) and D. miranda (discussed below) show evidence for nonequilibrium demography. Rather than expanding, several lines of evidence suggest that D. melanogaster has undergone a recent reduction in its effective size since it last shared a common ancestor with D. simulans (Akashi 1995, 1996). Thus, it may be hard to find a demographic model that could account for the observed positive value of α, especially since estimates of α are similar in D. simulans and D. melanogaster, despite these two species having different demographic histories. A population size reduction in D. miranda that obscures the signature of adaptive evolution may therefore be more likely since it has highly reduced silent-site diversity relative to its sibling species D. pseudoobscura, suggesting a much smaller Ne (Yi et al. 2003). As shown below, many aspects of patterns of polymorphisms in D. miranda are consistent with a severe population size reduction in its recent history.
Clearly, it is puzzling that two different species groups of Drosophila give such different estimates on the importance of adaptive evolution to coding and noncoding DNA divergence. Emerging evidence for complicated demographies in many Drosophila species (Machado et al. 2002; Wall et al. 2002; Haddrill et al. 2005b; Kopp and Barmina 2005; Bachtrog et al. 2006; Baudry et al. 2006) suggests that using comparative approaches to address the question of how common adaptive evolution is may need to involve comparisons of many species. In addition to nonequilibrium demography, nonequilibrium mutation models might also result in discrepancies in estimates of α among species (Eyre-Walker 1997; Kern and Begun 2005; Akashi et al. 2006), although their quantitative effects on estimates of adaptive evolution have yet been little investigated.
Patterns of linkage disequilibrium in D. miranda:
Levels of LD are inversely related to estimates of ρ (the population recombination rate) and it is useful to scale ρ by θ (the population mutation rate) when comparing species, populations, or chromosomes with different effective population sizes (Hudson 1987; Andolfatto and Przeworski 2000; Haddrill et al. 2005b). We thus quantified levels of LD on autosomes, the neo-X, and the X-chromosome as joint estimates of ρ/θ across loci for each of these chromosomes. Posterior distributions for ρ/θ by chromosome are shown in Figure 4, and modes and 95% confidence intervals are listed in Table 2 . Estimates of ρ/θ range from ∼4 to 26 among chromosomes and are on the same order as estimates in D. melanogaster (Thornton and Andolfatto 2006).
Interestingly, we find significant heterogeneity in patterns of linkage disequilibrium among chromosomes. In particular, autosomal loci have significantly more LD (i.e., smaller estimates of ρ/θ) than X-linked or neo-X-linked loci (see Figure 4), suggesting that autosomes have less effective recombination than do X- or neo-X-linked loci. Possible explanations for this difference include population history (Wall et al. 2002), the presence of common polymorphic autosomal inversions (Andolfatto and Wall 2003), or differences in recombination rates. Population history is an unlikely explanation since we expect that the X chromosome and the neo-X would be affected similarly. In fact, the confidence intervals on estimates of ρ/θ barely overlap for these chromosomes despite having similar levels of diversity (Table 2).
Inversion polymorphisms may explain some of the differences observed among chromosomes. Inversion heterozygosity suppresses meiotic recombination between standard and inverted chromosomes, but can enhance recombination levels in other chromosomal regions (Andolfatto et al. 2001). A recent study in D. miranda detected no chromosomal inversions on the left arm of the X chromosome (XL, on which six of the seven X-linked loci used for inferring LD are located), but did detect two polymorphic inversions on chromosome 2 (an autosome); the neo-X chromosome was found to contain one polymorphic inversion (Yi et al. 2003). A similar excess of levels of LD at autosomal relative to X-linked loci was also detected in a Zimbabwe population of D. melanogaster (Andolfatto and Wall 2003). Inversions are both much more common and at higher frequencies on the autosomes than on the X chromosome of D. melanogaster and might thus increase levels of LD at autosomal loci.
Another possible contributing factor may be the effect of chromosome size on recombination rates. In particular, chromosomal arm XL in D. miranda appears to be only about half as long as the other four large chromosomal arms in polytene chromosome preparations (Das et al. 1982). If there is a reasonably good correspondence between size of the polytene chromosomes and sequence physical map, chromosome XL may contain only about half as much DNA as do the other chromosomal arms. In fact, some genes that are located on Muller's element A in D. melanogaster (which corresponds to XL in the pseudoobscura group) map to element D (chromosome XR) in D. pseudoobscura (Segarra et al. 1995). Segarra et al. (1995) concluded that this is probably the result of a pericentric inversion. If this inversion was asymmetric, it could have caused more DNA to be translocated to XR from XL than from XR to XL. This is consistent with the in situ hybridization results of Segarra et al. (1995), who found that genes move only from element A to D, but not the reverse, and could explain the smaller size of chromosomal arm XL in D. miranda. If each chromosomal arm has on average one crossing over event per meiosis, as suggested on the basis of cytological data in Drosophila (Ashburner 1989), this would imply that genes on chromosome XL undergo twice as much recombination per unit length than genes on other chromosomal arms. In fact, the genetic map length is very similar for the two arms of the X chromosome in D. pseudoobscura (Kovacevic and Schaeffer 2000), supporting the hypothesis of more recombination per physical length on the shorter chromosomal arm XL. Thus, both polymorphic inversions and chromosome size might contribute to this observed difference in levels of linkage disequilibrium among chromosomes.
Nonneutral and/or nonequilibrium dynamics in D. miranda:
To test for nonequilibrium demography and/or selection in D. miranda, we applied three multilocus tests of the SNM to all loci for each chromosome. We restricted this analysis to synonymous sites only, since nonsynonymous and noncoding polymorphisms are under negative selection in D. miranda (see above). This analysis reveals several interesting findings (Table 3 ). First, there is significant heterogeneity in levels of polymorphisms and divergence at each chromosome, as indicated by the large observed HKA χ2 across loci. This pattern is seen on all chromosomes even when restricting the data to synonymous sites only and when excluding three loci on the neo-X chromosome that were previously identified as likely targets of recent selective sweeps (Bachtrog and Charlesworth 2002; Bachtrog 2003a). This finding contrasts with previous studies (based on fewer loci) that have suggested (excluding one to two outliers) no significant HKA χ2 across loci in D. miranda (Bachtrog and Charlesworth 2002; Bachtrog 2003a; Yi et al. 2003; Bartolomé et al. 2005). Second, the mean Tajima's D at synonymous sites among loci on the autosomes and the neo-X chromosome is too negative to be compatible with the SNM (Table 3). Again, this is in contrast to previous studies that found no significant evidence of an overall departure of silent variants from neutral expectations of the frequency spectrum (Bachtrog and Charlesworth 2002; Bachtrog 2003a; Yi et al. 2003; Bartolomé et al. 2005). Finally, neo-X-linked genes have too little variance in Tajima's D across loci compared to the SNM (Table 3). These departures from expectations of the SNM in D. miranda suggest the influence of nonequilibrium demography and/or selection.
To investigate the power of nonequilibrium demography and/or selection models to explain various aspects of our data, we performed exploratory simulations. Simple demographic models, including population expansion, population bottlenecks, and a recurrent hitchhiking model were fit to synonymous polymorphism data from the neo-X chromosome (Table 4 ). Simulation parameters were chosen to closely match the observed diversity on the neo-X chromosome (see materials and methods for details). Our simulations assume that synonymous sites are neutral (which is probably a reasonable assumption given patterns of codon usage in D. miranda; Bachtrog 2003b; Bartolomé et al. 2005), and we explore only a limited number of possible demographic and selection models. Thus, keeping their limitations and assumptions in mind, these exploratory simulations are intended as illustrations of what type of population genetics models could in principle account for the observed data.
As indicated by a negative mean Tajima's D, all chromosomes show a marked excess of low-frequency variants across loci (see Table 3). This signature is often interpreted as evidence for population expansion (e.g., Machado et al. 2002; Glinka et al. 2003; Das et al. 2004; Llopart et al. 2005). However, a negative mean Tajima's D is expected under a variety of population genetic models (see Tajima 1989b; Braverman et al. 1995; Charlesworth et al. 1995; Gillespie 1997; Tachida 2000; Haddrill et al. 2005b). Our simulations show that while a population expansion could account for the negative mean Tajima's D observed on the neo-X chromosome (and perhaps the reduced variance of Tajima's D among loci), several other features of the data are clearly incompatible with a simple growth model (Table 4). In particular, growth models are unable to produce the observed heterogeneity in levels of polymorphism and divergence among loci (Pluzhnikov et al. 2002; Haddrill et al. 2005b). In addition, the mean Fay and Wu's H-statistic in a growing population is generally expected to be positive, instead of the negative value observed (Table 4). These results suggest that a simple population expansion model is unlikely to account for the patterns of polymorphism observed on the neo-X chromosome.
Little is known a priori about the demographic history of D. miranda; however, reduced variation and a reduction in the efficacy of selection for codon usage in this species relative to its closest relative (Bachtrog 2003b), D. pseudoobscura, raise the possibility that this species has suffered a drastic reduction in population size relative to its ancestral population (see Bachtrog 2003b; Yi et al. 2003). As an illustration, we show simulation results for two bottleneck models that decrease variation by about twofold and fivefold, respectively, relative to the ancestral population size. While the less severe bottleneck does a poor job of accounting for the observed data, the more severe population bottleneck can account for most aspects of the neo-X data, including the heterogeneity in levels of polymorphisms and divergence among loci and the negative mean Tajima's D and mean Fay and Wu's H (Table 4). However, recent population bottlenecks generally increase the variance of Tajima's D among loci (Table 4, and see Haddrill et al. 2005b), and we failed to find bottleneck parameters that could account for the decreased variance in D relative to the SNM, as observed in the data.
Given previous evidence for positive selection on the neo-X chromosome (Bachtrog and Charlesworth 2002; Bachtrog 2003a), we examined the fit of the data to a commonly used positive selection model (see materials and methods). We found that this model is compatible with several aspects of the data only if selective sweeps are very frequent (Table 4). However, we found it difficult to account for the observed heterogeneity in levels of polymorphism and divergence among loci (Table 4) with a recurrent hitchhiking model. Invoking even more hitchhiking might allow us to account for this aspect of the data; however, it would also result in an even more positive Fay and Wu's H, instead of the negative one observed (Table 4).
Our limited exploration of some selective and demographic models failed to identify a single model that can simultaneously account for all the aspects of the neo-X data. This may indicate that both demographic processes and selection simultaneously influence patterns of molecular evolution of the neo-X chromosome or that our models are misspecified in other ways (e.g., by ignoring purifying selection or population structure). Comparisons among chromosomes reveal that the neo-X chromosome has the most negative Tajima's D, and it is the only chromosome where the variance in Tajima's D is decreased across loci relative to the SNM (Table 3). This could indicate that while nonequilibrium demography is affecting the entire genome of D. miranda, the neo-X chromosome in particular is subject to more frequent adaptive evolution compared to the rest of the genome.
In fact, there are strong a priori reasons for believing that the neo-X chromosome may have been subject to more hitchhiking in the recent past than other chromosomes. The neo-sex chromosomes of D. miranda were an ordinary pair of autosomes until only ∼1 million years ago (Bachtrog and Charlesworth 2002), but are now actively evolving into morphologically and functionally diverged sex chromosomes (Bachtrog 2005). Genes on the neo-Y are male limited, whereas neo-X genes spend two-thirds of their time in females. This raises the possibility that genes undergo adaptive specialization for male and female functions on the neo-sex chromosomes (i.e., genes on the neo-X might become feminized; Rice 1984). In fact, genes showing female-biased expression are more abundant (and genes showing male-biased expression are relatively infrequent) on the X chromosome of D. melanogaster (Parisi et al. 2003). Two genes that are expressed in both testes and ovaries have undergone adaptive protein evolution on the neo-X of D. miranda (Bachtrog and Charlesworth 2002; Bachtrog 2003a), and many other genes might currently evolve sex-related functions on the neo-X. In addition, large parts of the neo-X chromosome of D. miranda are already partially dosage compensated (Bone and Kuroda 1996; Marin et al. 1996). This must have involved the adaptive fixation of some unknown number of de novo binding sites on the neo-X for the dosage compensation machinery (Bone and Kuroda 1996; Marin et al. 1996). If dosage compensation evolves on a small genomic scale, as suggested by recent experiments in D. melanogaster (Fagegaltier and Baker 2004), many such selective sweeps to acquire these binding sites might have happened in the recent evolutionary history of the neo-X chromosome of D. miranda.
It has been noted that X–autosome comparisons for inferring selection are complicated by uncertainties about the demographic history of a species (Charlesworth 2001; Wall et al. 2002). However, life-history differences between males and females, sexual selection, and changes in population size and structure should all influence the X and the neo-X chromosomes similarly. For this reason, it may be informative to parameterize a demographic model on the basis of a large number of X-linked loci for the purpose of identifying outliers, and thus candidates for recent selective sweeps, on the neo-X chromosome. This approach might be particularly amenable to the approaches proposed by Nielsen et al. (2005), which appear to be highly robust to complicated demography. Increasing the number of loci surveyed on the true X will be necessary for such an approach to be feasible.
We are grateful to Brian Charlesworth and David Begun for comments on the manuscript. D.B. was partly supported by a postdoctoral fellowship from the Austrian Academy of Sciences. P.A. was supported by an A. P. Sloan fellowship in molecular and computational biology. This work was funded by a National Institutes of Health grant (GM076007) to D.B.
- Received June 28, 2006.
- Accepted September 14, 2006.
- Copyright © 2006 by the Genetics Society of America