| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Genetics, Vol. 176, 2293-2305, August 2007, Copyright © 2007
doi:10.1534/genetics.107.071191
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
,1
* Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom and
Grupo de Medicina Xenómica, Instituto de Medicina Legal, Universidade de Santiago de Compostela, Santiago de Compostela, 15782, Spain
1 Corresponding author: Instituto de Medicina Legal, Facultade de Medicina, Universidade de Santiago de Compostela, Rúa de San Francisco s/n, 15782 Santiago de Compostela, Spain.
E-mail: xmaside{at}usc.es
| ABSTRACT |
|---|
|
|
|---|
Much recent work on D. americana has been motivated by the wish to advance understanding of the evolution of the centric fusion between Muller's element B (the homolog of the fourth chromosome of D. virilis, a close relative of D. americana), and the X chromosome. This creates a neo-X chromosome, and the X/4 fusion exhibits a north–south clinal pattern of variation, with southern populations essentially lacking the fusion and northern populations being fixed for it (PATTERSON and STONE 1952; THROCKMORTON 1982; CALETKA and MCALLISTER 2004). This provides an important model system for investigating the evolution of chromosomal rearrangements in general and neo-sex chromosomes in particular (MCALLISTER and CHARLESWORTH 1999; MCALLISTER 2002, 2003; VIEIRA et al. 2006).
D. americana is also useful for other types of evolutionary genetic studies, since it is one of the few Drosophila species to have a well-defined ecology, breeding exclusively in wet riparian areas with willow species Salix interior and S. nigra in North America, not associated with humans (THROCKMORTON 1982; B. F. MCALLISTER, personal communication). This means that it has not been affected as much by human disturbance of its habitat as many of the other species used in evolutionary genetic studies. It arrived in North America >3 MYA (CALETKA and MCALLISTER 2004) and appears to have had a relatively stable demographic history, so that investigators can avoid some of the problems of disentangling demography and selection that have been encountered in species such as D. melanogaster and D. simulans (HADDRILL et al. 2005). In accordance with this expectation, our previous study of selection on synonymous and noncoding variants in this species suggested that base composition and codon usage are approximately in equilibrium (MASIDE et al. 2004).
In this article, we extend our analyses of DNA sequence variation in D. americana to nonsynonymous variants at a set of 18 genes. We also combine these with single sequences for each of these genes, and 4 additional genes, from four other species in the virilis group. This provides further information on the phylogeny of these species and enables us to conduct tests for selection on nonsynonymous variants using both codon-based models of sequence evolution (reviewed in YANG and BIELAWSKI 2000) and the McDonald–Kreitman test and its recent extensions (EYRE-WALKER 2006; WELCH 2006). We find an unusually low ratio of nonsynonymous to synonymous variation within D. americana and evidence for an increased intensity of purifying selection against nonsynonymous mutations in both the D. virilis and D. americana lineages, following their split from a common ancestor. There is also evidence for a significantly higher ratio of nonsynonymous to synonymous mutations for interspecies comparisons than for variation within D. americana. This reflects either a high fraction of amino acid substitutions driven by selection or an increased intensity of selection on nonsynonymous but not synonymous variants within the D. americana lineage.
| MATERIALS AND METHODS |
|---|
|
|
|---|
DNA sequences:
Details of the loci analyzed and primers used are given in supplemental Table S1 at http://www.genetics.org/supplemental/. PCR amplification and DNA sequencing methods are described elsewhere (MASIDE et al. 2004).
Polymorphism data were obtained from 5–10 random isofemale lines from the G96 sample population. For two loci, elav and Pros28.1, larger samples were composed of a combination of isofemale lines from three natural populations: G96 (N = 11 and 13, for elav and Pros28.1, respectively), OR01 (N = 19 and 20), and FP99 (N = 20 and 19). G96.11 was randomly selected to be used as the source line for analyses that required the use of a single D. americana sequence.
Nucleotide sequences were edited with Bioedit (v. 5.0.9) and initially aligned with ClustalX, v. 1.81 (THOMPSON et al. 1997). Alignments of exon sequences were corrected by hand by combining nucleotide and predicted amino acid data, and intron sequences were aligned with McAlign2 (KEIGHTLEY and JOHNSON 2004). Some exon sequences included short repetitive regions (usually <300 bp), consisting of tandem repeats of one or two-codon motifs. The historical reconstruction of the changes in the following repeats was ambiguous and they were eliminated from the alignments: dos, poly(QA) between alignment positions 736–813; elav, poly(Q), 72–338; su(Hw), poly(D), 79–123; su(s), poly(G), 364–408; and poly(S), 1042–1137. All sites with gaps in any of the sequences were also eliminated from the alignments before the analysis. The final contents and lengths of the alignments are shown in supplemental Table S1.
DNA sequences of anon66-Db, Cdc37, Cp36, csw, Ddx1, Dos, elav, Fused, kni, msl-3, pros28.1, rh4, sina, su(Hw), su(s), tll, and Yp1 from D. virilis, D. americana, and D. ezoana have previously been published and deposited in GenBank (MASIDE et al. 2004). The sequences newly obtained for this analysis (i.e., the sequences for these loci from D. littoralis and D. montana as well as those of lama, Gpi1, hb, and tube from all species) have been deposited in GenBank under accession nos. EF635062–EF635113.
Gene locations:
The relative positions of su(s), per, elav, Yp1, Cp36, and Fused in the D. americana X chromosome are given by VIEIRA et al. (2006, and references therein). We inferred those of csw, Gpi1, and Pros28.1 by the methods of VIEIRA et al. (2006); i.e., we used the University of California Santa Cruz Genome Browser to identify the D. virilis genome scaffold (July 2004 assembly) that included the genes' sequences, found their position in the D. virilis polytene chromosome map, and determined their correspondence to the D. americana X. Our partial sequence of Pros28.1 matched D. virilis genome coordinates scaffold_48: 270292–271110. This means that it maps to 13C–D in the D. virilis polytene map (GUBENKO and EVGEN'EV 1984), very close to the distal In(X)a breakpoint (Figures 1 and 2 in VIEIRA et al. 2006). Without in situ data, we cannot reliably determine whether Pros28.1 remained opposite elav in conserved block 3 [this chromosome segment was involved in inversions In(X)b and In(X)c; see Figure 2 in VIEIRA et al. 2006] or whether In(X)a relocated it near the centromere along with Fused at the proximal end of block 4. We can, however, confirm that this locus has been repositioned in the D. americana lineage and was classified as potentially affected by the inversions, along with elav, Pros28.1, Yp1, Cp36, and Fused.
|
|
). Other models with lower AICc support, as well as other methods that take into account variable substitution rates at synonymous and nonsynonymous sites, such as the Kumar method (NEI and KUMAR 2000, Chap. 4), produced the same phylogenetic tree. Since intron data were obtained only for a subset of genes, and the sizes of intron sequences varied widely across genes (supplemental Table S1), model selection and tree search were performed using coding sequences alone.
Adaptive evolution:
Numbers of fixed and polymorphic variants in D. americana were corrected for polymorphisms misclassified as fixations using a method described elsewhere (MASIDE et al. 2004). Numbers of synonymous and nonsynonymous substitutions per site were estimated using a codon-based maximum-likelihood approach (Goldman–Yang, GY) (GOLDMAN and YANG 1994), implemented in the CODEML program of the PAML package (YANG 1997). Two different codon substitution maximum-likelihood (ML) models were used: the simplest (null) model assumed that the dN/dS ratio is the same for all branches (one-ratio model), and the alternative model allowed dN/dS ratios to vary freely between branches (free-ratio model). Equilibrium codon frequencies in the models were estimated from the average nucleotide frequencies at the three codon positions (F3 x 4).
Tests of neutrality:
Fu and Li's D and Tajima's D tests were conducted by hand and by using the DnaSP program v4 (ROZAS et al. 2003) and ProSeq v2.9 (FILATOV 2002). Statistical significance of the tests on each locus and on the pooled data set was assessed by coalescent simulations using DnaSP and HKA (available in J. Hey's lab web page: http://lifesci.rutgers.edu/
heylab/HeylabSoftware.htm#HKA). To give equal weight to all loci on the pooled estimates, only five sequences were used to calculate the D statistics at each locus. For loci with sample size N > 5, we drew 100 random samples of five sequences and used the average value of the statistics as an approximation to the expectation. This will make our tests slightly conservative, as these estimates have lower sampling variances than assumed in standard calculations. Estimates of the recombination rates at each locus were obtained using the
-estimator (WALL 2000).
To determine if the free-ratio model fits the data significantly better than the one-ratio model, we used the log-likelihood-ratio test (LRT), assuming that the distribution of the test statistic, 2
L (twice the difference between the log-likelihoods of each model) can be approximated by the
2-distribution with the number of degrees of freedom equal to the difference in the numbers of parameters of the nested models. As only one sequence of each locus in each species was needed, all D. americana sequences used were collected from a single strain selected at random (G96-11). For this analysis, we used one sequence from each locus in the five species, except for su(s) for which we were not able to obtain a sequence in D. montana. Estimates of the divergence values between any two species were calculated by summing the dN and dS values at all branches connecting them.
Since dS estimates obtained with the GY method are influenced by codon usage bias, they may not be appropriate for between-locus comparisons (BIERNE and EYRE-WALKER 2003). Thus, synonymous and nonsynonymous divergences for this purpose were also estimated using the method of NEI and GOJOBORI (1986, Equations 1–3), implemented in DnaSP, which uses a conservative criterion for counting synonymous and nonsynonymous sites (ROZAS et al. 2003); when alternative evolutionary paths are possible, it chooses the paths that involve fewer steps and fewer nonsynonymous mutations (see DnaSP Help file for details). Tests for heterogeneous selection pressure along protein sequences were performed with CODEML, using models M0, M1a, M2a, M7, and M8 (YANG et al. 2000; WONG et al. 2004). CODEML also implements the empirical Bayes method for estimating the category to which each site is most likely to belong (NIELSEN and YANG 1998).
The proportion of nonsynonymous nucleotide substitutions fixed by positive selection (
) was estimated using the methods of FAY et al. (2001) and SMITH and EYRE-WALKER (2002), which are an extension of the McDonald–Kreitman test (MCDONALD and KREITMAN 1991). BIERNE and EYRE-WALKER (2004) have developed a maximum-likelihood estimator of
. In our use of this,
was assumed to be constant across loci. The results are consistent across methods, and only those using the former are shown in detail (for a comparison of the methods see EYRE-WALKER 2006; WELCH 2006).
Reconstruction of ancestral sequences:
Ancestral nucleotides at internal nodes were inferred using the maximum-likelihood approach developed by YANG et al. (1995), implemented in the BASEML program included in PAML. Given the sequence data and the phylogenetic tree, this method uses maximum-likelihood estimates for branch lengths of the tree and parameters from a previously selected substitution model to calculate the posterior probabilities of alternative nucleotide assignments to each interior node at a site (marginal reconstruction) and to select the one with the highest probability. The substitution model used was HKY85 +
. This is a modification of the HKY85 model (HASEGAWA et al. 1985), which allows substitution rates to vary across sites following a discrete gamma distribution with eight categories.
Probabilities of ancestral codons were calculated as the product of the probabilities of ancestral nucleotides at each codon position (AKASHI et al. 2006). When ancestral and derived codons differed by a single nucleotide, the probability of the ancestral codon was taken as the change's count. With differences at two positions, there are two alternative two-step paths between the ancestral and the derived codons, and estimating the numbers of each type of substitutions may not be straightforward (NEI and GOJOBORI 1986). We used a simple method by which we gave the same weight to both evolutionary paths and assumed equal probabilities of all types of changes. For six codons, alternative paths implied different numbers of synonymous and nonsynonymous changes. For a similar data set, AKASHI et al. (2006) showed that, given sufficiently low numbers of substitutions per site, this method gives similar results to those obtained when a correction for synonymous/nonsynonymous ratios is used to weight alternative paths. Codons that had changes at all three positions were excluded from the data set. The lack of a su(s) sequence from D. montana would imply the use of a different phylogenetic tree by the model, and so this locus was excluded from this analysis.
Codon preferences for all species were assumed to follow the D. virilis preferences table (MASIDE et al. 2004). The probabilities of ancestral nucleotides in introns were estimated directly, using the same substitution model (HKY85 +
).
| RESULTS |
|---|
|
|
|---|
Some genes produced alternative trees when analyzed separately, shifting the relations between D. ezoana, D. littoralis, and D. montana, although always with very weak bootstrap support. Only the data from su(Hw) suggested a clustering of D. montana with the ancestor of D. virilis and D. americana, with >85% bootstrap support from different tree inference methods and nucleotide substitution models. Figure 1 shows relatively short branch lengths for these species, with dS along the D. montana branch being at most three times the synonymous diversity within D. americana, i.e.,
12Ne generations [using the equilibrium formula for expected neutral diversity,
= 4Neu, where Ne is the effective population size, u is the mutation rate, and
is the pairwise nucleotide site diversity (NEI 1987)]. With this level of divergence relative to diversity,
20% of the fixations on each branch are expected to be contributed by ancestral polymorphisms rather than fixations of new mutations (CHARLESWORTH et al. 2005); because of linkage, these will tend to occur in clusters (WIUF et al. 2004), so it is likely that this discrepancy for su(Hw) simply reflects the random fixation of neutral polymorphisms present in the common ancestor, which obscures the signal of the species tree.
Genetic diversity in D. americana:
Mean silent-site diversity as measured by
and
w (WATTERSON 1975) is 1.84% (Table 1), a fairly typical value for Drosophila species. The average pairwise nucleotide site diversity at nonsynonymous sites is
45 times lower than at silent sites (mean
= 0.04%, weighting individual values by their estimated net variances, as in BARTOLOMÉ et al. 2005). This ratio is similar to that previously reported from analyses of smaller sets of loci in this species (VIEIRA et al. 2001; MCALLISTER 2003) and is substantially lower than that observed in other Drosophila species, such as the closely related D. virilis [mean
= 0.01 and 0.10, respectively (VIEIRA and CHARLESWORTH 1999)], or the more distant D. miranda [0.08 and 0.42 (BARTOLOMÉ et al. 2005; BARTOLOMÉ and CHARLESWORTH 2006)], D. pseudoobscura [0.2 and 2.3 (LOEWE et al. 2006)], and African populations of D. melanogaster [0.18 and 2.87, respectively (ANDOLFATTO 2005)].
|
is 1.20% for the X chromosome vs. 2.28% for the autosomes (P < 0.0001 on a Mann–Whitney U-test). A similarly low ratio of variance-weighted mean nonsynonymous to silent diversities is observed for both X-linked and autosomal loci when analyzed separately (0.033 ± 0.015 and 0.019 ± 0.0083, respectively; ratio plus or minus standard error of the ratio). Given that the mean silent diversity is typical for a Drosophila species (see previous references), the low nonsynonymous diversity in D. americana probably reflects strong selection against nucleotide substitutions that change amino acid sequences.
The observed higher silent variation at autosomal loci is mainly due to the significant differences observed at synonymous sites, for which the X/A ratio is 0.47 (variance-weighted mean
= 1.14% vs. 2.59% at X-linked and autosomal genes respectively, P < 0.003 on a Mann–Whitney U-test). This difference remains significant even after adjusting the X-linked values by dividing by 0.75 to account for the fact that the effective population size for X chromosomes is three-quarters that of autosomes in the absence of sexual selection (P < 0.01). In contrast, the X/A ratio for the intron diversity is 0.82 (variance-weighted mean
= 1.37 vs. 1.66, P <0.80), close to the 0.75 expected in the absence of sexual selection. These results are consistent with the action of selection at synonymous sites (MASIDE et al. 2004), which is expected to reduce X-linked relative to autosomal diversity, provided that the fitness effects of synonymous substitutions are weakly deleterious and either partially recessive or female specific (MCVEAN and CHARLESWORTH 1999), whereas intron variants are close to what is expected under neutrality.
Tests of neutrality:
To further investigate the signature of selection, we conducted Tajima's D (TAJIMA 1989) and Fu and Li's D (FU and LI 1993) tests on the polymorphism data at each locus and on the pooled data set (see MATERIALS AND METHODS). Synonymous, intron, and nonsynonymous variants were analyzed separately (supplemental Table S2 at http://www.genetics.org/supplemental/). At synonymous sites, Tajima's and Fu and Li's D tests are negative for 13 of the 18 loci. This is significant on a Wilcoxon signed-rank test (TS = 38; P < 0.05) (SOKAL and ROHLF 1995, pp. 440–444). The significance of the observed values of the statistics was determined by means of coalescent simulations that allow for the observed levels of intragenic recombination in each locus (see MATERIALS AND METHODS). According to Fu and Li's D test, the deviations from neutral equilibrium expectations are significant for Cdc37, elav, and su(s) and border on significance for anon-66Db (P = 0.05). For the pooled data, Fu and Li's D = –0.40 (P = 0.10, calculated by assuming that all loci are unlinked but with no recombination within them).
Skews at intron sites tend to go in the same direction but are generally less pronounced and are not significant for any of the loci (supplemental Table S2). When run on the pooled data, Tajima's D = 0.00 and Fu and Li's D = –0.08. At nonsynonymous sites, there is a slight overall surplus of young/low-frequency variants across loci, as expected under the action of purifying selection, but the low levels of polymorphism limit the power of the tests, which are nonsignificant.
Sequence divergence among species:
Maximum-likelihood estimates of genetic divergence at each locus were obtained for the seven branches of the phylogenetic tree (Figure 1). This allowed us to compare the patterns of evolution among genes and branches. To increase the sample, we included sequence data from four additional genes: hb, tube, lama, and Gpi1, for which we have one sequence per species (supplemental Table S1).
Maximum-likelihood estimates of synonymous and nonsynonymous divergence on all branches of the phylogenetic tree were obtained using two models (see MATERIALS AND METHODS). The first assumed a constant dN/dS ratio for the entire tree (one-ratio model), and the second allowed dN/dS ratios to vary among branches (the free-ratio model, YANG 1998; YANG and NIELSEN 1998). This was done for all genes separately and for a sequence composed of a concatenation of all coding sequences from each species. Log-likelihood values under the free-ratio model were always larger than those under the one-ratio model (supplemental Table S3 at http://www.genetics.org/supplemental/). The difference was found to be significant for 3 of 22 genes (Cdc37, kni, and per; not corrected for multiple tests), which indicates that the free-ratio model fits the data better. Consistent with this, the one-ratio model is rejected if the test is run on the concatenated data set (2
L = 25.1, P = 0.0003 on a
; supplemental Table S3).
dN/dS ratios (Figure 1) are largest in the lineages leading to the common ancestor of D. ezoana and D. littoralis (EzoLit) and of D. montana (0.086) and smallest for those of D. americana (0.049) and D. virilis (0.034). The large dN/dS for EzoLit can be attributed to a very small dS (0.015), which is less than one-third of the across-lineage average (0.053; supplemental Table S4 at http://www.genetics.org/supplemental/). However, caution is needed in interpreting this result, as this is a relatively short branch (Figure 1), and dN and dS values may be subject to substantial stochastic variation due to the low number of mutations in it. On the other hand, the D. virilis and D. americana branches are long, so the estimates are relatively reliable. Their smaller dN/dS ratio is associated with a smaller than average nonsynonymous divergence (0.22%), their dS being close to average.
The average dN and dS values are 0.003 and 0.053, respectively, and the average dN/dS is 0.079, so that these loci are under strong purifying selection (supplemental Table S4). elav and sina are the most conserved, with no observed amino acid substitutions. On the other hand, dN/dS values for tube and per are well above the locus average in most lineages. In the case of tube, this result is due to both a higher dN and a lower dS than the average. However, dN for per is 5.5 times higher than the mean (1.81 vs. 0.33%), which is not matched by a proportionally high rate of silent divergence, although this is 1.6 times greater than the mean (dS = 9.4%).
We also constructed various LRTs to check for the presence of significant differences in levels of selective constraint among sites (YANG et al. 2000). The null model (M0) assumes a unique dN/dS for all sites (calculated from the data); M1a assumes that a proportion of sites are highly conserved (0 < dN/dS < 1) and the rest are largely unconstrained, with dN/dS = 1; M2a includes an additional class of positively selected sites (dN/dS > 1). LRTs of the nested models using the concatenated data set show that M1a and M2a fit the data significantly better than M0 (2
L = 151.2, P <0.0001). However, M2a does not make any significant improvement over M1a (2
L = 0.23, P = 0.95), so there is no conclusive evidence for sites evolving under positive selection. Under M1a, 96% of the sites evolve under selective constraints (dN/dS = 0.03) and the other 4% evolve unrestrictedly (dN/dS = 1). Other tests with more elaborate models such as M7 and M8 (YANG et al. 2000) also failed to detect signatures of positive selection at individual sites (such that dN/dS > 1; data not shown). However, these negative results should be regarded with caution, as the power of statistical methods for detecting adapting evolution in coding sequences is greatly reduced with the number of taxa from which data are available and the fraction of positively selected sites (WONG et al. 2004), as is the case for our data set, with sequences from five species and 96% of codons under purifying selection.
McDonald–Kreitman tests:
To characterize the nature of selection further, we used the McDonald–Kreitman test (MCDONALD and KREITMAN 1991), which compares the ratio of polymorphism to divergence (rpd) among synonymous and nonsynonymous sites interspersed along a sequence; under the neutral model, both ratios should be the same. To avoid the possible confounding effects of a substantial contribution of shared ancestral polymorphisms on the divergence between closely related species such as D. americana and D. virilis (CHARLESWORTH et al. 2005), we used D. ezoana as the primary outgroup. As shown in Table 2, rpd is larger among synonymous than among nonsynonymous variants (0.33 vs. 0.12 for the pooled data). We evaluated this difference by means of the Mantel–Haenszel test (SOKAL and ROHLF 1995, Chap. 17), which indicated that there is a 2.3 times larger proportion of polymorphic synonymous substitutions as compared with nonsynonymous changes (
MH = 2.3; XMH = 7.8, P = 0.005; where
MH is the Mantel–Haenszel (MH) estimator of the odds ratios and XMH is the MH statistic, which is distributed as
2 with 1 d.f.). These differences in odds ratios are homogeneous across genes (weighted sum of squares, XMH = 6.6, P = 0.97). Similar results are obtained if D. littoralis or D. montana are used as outgroups for divergence estimates (Table 2).
|
The rate of adaptive evolution:
If we assume that all mutations are either deleterious or neutral, then the ratio of nonsynonymous to synonymous divergence (dN/dS) would be expected to be equal to, or less than, the ratio of nonsynonymous to synonymous polymorphisms (PN/PS). However, if a fraction of nonsynonymous mutations were adaptive and thus driven to fixation by positive selection, then the ratio of the divergences should be greater, given that adaptive nonsynonymous mutations make little contribution to polymorphism. The proportion of nonsynonymous substitutions fixed by positive selection (
) can thus be estimated (CHARLESWORTH 1994), and several methods for doing this have been developed (EYRE-WALKER 2006; WELCH 2006).
Using the method of BIERNE and EYRE-WALKER (2004), we obtained a maximum-likelihood estimate of
between D. americana and D. ezoana of 0.57 (95% C.I.: 0.25–0.77) (Table 3). Estimates of the same magnitude were obtained using D. littoralis and D. montana as outgroups (0.59 and 0.65, respectively). These results are almost 50% higher than those usually obtained for similar sets of data for other Drosophila species (EYRE-WALKER 2006). In contrast, if D. virilis is used as the outgroup, or if fixed changes are counted in the D. americana lineage only, the resulting estimates of
are substantially lower (although the differences are not significant).
|
Table 4 summarizes the estimated numbers of synonymous and nonsynonymous changes in each branch of the phylogeny (Figure 1). Synonymous changes were further classified as from preferred to unpreferred (PU) and vice versa, according to codon preferences established in D. virilis (MASIDE et al. 2004). Assuming equilibrium for codon usage, an equal number of changes in either direction is expected. Departures from equilibrium were quantified as dUP–PU = (UP – PU)/(UP + PU), which has a range of –1 to 1 (AKASHI et al. 2006). The significance of any departures was assessed by means of the Wilcoxon signed-rank test (WSR), which compares UP with PU mutation counts per locus on each lineage. dUP–PU = –0.23 on the pooled data, which suggests that there is a significant excess of PU substitutions (P < 0.005). However, this pattern was not common to all genes (supplemental Table S5 at http://www.genetics.org/supplemental/), as dUP–PU values vary significantly across genes [GH = 60.9, P = 5.1 x 10–6 in a G-test of heterogeneity; e.g. su(Hw) and tll produced the most highly negative dUP–PU values, as opposed to Pros28.1 and Gpi1]. Furthermore, dUP–PU and codon usage bias (measured with Fop, IKEMURA 1985; supplemental Table S1) are negatively correlated (r2 = 0.38, P = 0.05 in a two-tailed Kendall's
-test).
|
|
Departures from base composition equilibrium were quantified by dWS–SW, where W and S denote weak (i.e., A and T) and strong (i.e., G and C) nucleotides, respectively, following the ambiguity code for nucleotides (AKASHI et al. 2006). No significant departures from base composition equilibrium were detected in any lineage or for the pooled data set (Table 6). This result is in good agreement with previous evidence for equilibrium of base composition in D. americana (MASIDE et al. 2004). However, there is significant variation across loci (GH = 44.4, P = 2.1 x 10–5 in a G-test of heterogeneity; supplemental Table S7 at http://www.genetics.org/supplemental/). Tll, Ddx1, Cp36, and Yp1 show the greatest proportion of mutations to AT, and Cdc37 and dos are in the opposite direction. No significant differences were detected between autosomal and X-linked loci (dWS–SW = –0.02 and –0.06, respectively, using data pooled across lineages and loci). We did not detect a significant correlation between dWS–SW and dUP–PU (r2 = 0.22, P = 0.09) or between dWS–SW and Fop (r2 = 0.13, P = 0.13).
|
| DISCUSSION |
|---|
|
|
|---|
2%. As reported in the RESULTS section, mean silent-site diversity is significantly lower for the X chromosome than for the autosomes, suggesting an absence of strong effects of sexual selection on genetic diversity in this species, in contrast to what has been described for African populations of D. melanogaster and D. simulans (ANDOLFATTO 2001) and for D. miranda and D. pseudoobscura (YI et al. 2003; BARTOLOMÉ et al. 2005). At autosomal loci, the variance-weighted mean diversity for synonymous sites (in percent) is larger than that for introns (2.59 ± 0.251 vs. 1.66 ± 0.289; mean ± SE; Mann–Whitney U0.05[10,7] = 53; P = 0.05). This is, however, consistent with the theoretical expectation of an increase in synonymous diversity over neutral values at sites subject to weak selection on codon usage (Nes < 1), if there is mutational bias toward unpreferred codons (see Figure 2 of MCVEAN and CHARLESWORTH 1999). There is evidence that these conditions are met in the case of D. americana (MASIDE et al. 2004). Alternatively, there might be stronger selection on intron sites than synonymous sites, as reported for long introns in D. melanogaster (HADDRILL et al. 2005). But similar levels of average sequence divergence (± SE) at synonymous and intron sites (14.3 ± 1.44% and 12.8 ± 1.66%, respectively, between D. americana and D. ezoana, using the method of estimation implemented in DnaSP, see MATERIALS AND METHODS), together with the lack of evidence for selection at intron sites from the neutrality tests, suggest that this is not the case in our sample of introns, most of which can be considered as short (mean length = 83.6 bp; supplemental Table S1) by D. melanogaster standards.
In contrast with this high level of silent-site diversity, nonsynonymous diversity is very low, and the ratio of nonsynonymous to silent diversity is exceptionally low for a Drosophila species, with a value of 2–3% rather than the more usual value of
10% (see RESULTS section for details). In addition, the results on sequence evolution displayed in Figure 1 suggest that dN/dS is also unusually low along the branches of the phylogeny leading to D. virilis and D. americana, but not in their common ancestor; if anything, this effect is stronger for D. virilis. At first sight, this suggests that there has been an increase in Nes in both species, possibly because of an expansion in their population sizes. However, the limited available data on DNA sequence polymorphism in D. virilis provisionally suggest that silent-site variability is somewhat lower than in D. americana, and the ratio of nonsynonymous to silent diversity is close to the more typical value of 10% (see references in the RESULTS section).
This is difficult to reconcile with the hypothesis of a population expansion in both species as a cause of the low ratio of nonsynonymous to synonymous diversity. In addition, we have previously estimated Nes for mutations from preferred to unpreferred codons in D. americana; it is comparable to that in D. miranda (BARTOLOMÉ et al. 2005), a species with substantially lower silent diversity. This would imply that any increase in Nes has affected selection on nonsynonymous mutations, but not mutations affecting codon usage, which is very unlikely. An alternative hypothesis is that D. americana underwent a recent population bottleneck, causing it to lose low-frequency variants, especially nonsynonymous variants, and has recovered recently. This is, however, inconsistent with its high variability at silent sites and the absence of any signals of change in population size in patterns of noncoding variability (see RESULTS section).
It is, therefore, hard to be sure about what has caused the low nonsynonymous diversity and rate of substitution in D. americana. The possibility of a recent increase in Ne in this lineage raises concerns about the results of the McDonald–Kreitman tests and the estimates of
, the fraction of amino acid substitutions caused by positive selection (see Tables 3 and 4), which mostly used D. ezoana as an outgroup. These
-values are higher than those reported using similar methods for D. melanogaster and D. simulans (BIERNE and EYRE-WALKER 2004; EYRE-WALKER 2006). As discussed by EYRE-WALKER (2002), an artifactually high estimate of
can be caused by population expansion in the species from which polymorphism data are collected, if the nonsynonymous polymorphism/divergence ratio (rpd) is compared with that for a neutral standard. But if the comparison is with rpd for synonymous sites subject to selection on codon usage, the bias will generally be in the opposite direction (underestimation of
), provided that selection on nonsynonymous sites is substantially more intense than that on synonymous sites (EYRE-WALKER 2002). The evidence for weak selection on codon usage provided by MASIDE et al. (2004) suggests that these conditions hold for D. americana.
These considerations imply that the estimate of a high rate of adaptive protein sequence evolution is probably not simply an artifact of population expansion. We cannot, however, unequivocally rule out the hypothesis of an increased intensity of selection against deleterious nonsynonymous, but not synonymous, variants in the D. virilis and D. americana lineages. Further polymorphism data on D. virilis are needed to see whether it, too, has reduced nonsynonymous variability compared with silent variability; the current data on this species (VIEIRA and CHARLESWORTH 1999) are inadequate for a convincing test. If this is not the case, then the hypothesis of increased selection on nonsynonymous sites would be hard to sustain, given that the D. virilis lineage also shows reduced dN (Figure 1).
Possible effects of X chromosome rearrangements on diversity:
We need to consider the possibility that the relatively low silent diversity at D. americana X-linked loci may to some extent reflect the complex recent evolutionary history of its X chromosome since the split from the D. virilis lineage, which includes at least three chromosomal inversions (Figure 2 in VIEIRA et al. 2006), and a fusion with chromosome 4 (PATTERSON and STONE 1952; THROCKMORTON 1982; CALETKA and MCALLISTER 2004). The most recent inversion, In(X)c, is polymorphic and is strongly associated with the X/4 fusion chromosomes (94.6 and 7% of fusion and nonfusion chromosomes, respectively; HSU 1952). The frequency of this arrangement displays a latitudinal cline, being more abundant in northern populations (VIEIRA et al. 2001; MCALLISTER 2002).
Recent analyses have provided evidence for genetic differentiation between X/4 fusion and nonfusion chromosomes and indicate that X/4 fusion chromosomes have relatively reduced silent variability due to suppression of recombination, in the region between the centromere and the basal In(X)c inversion breakpoint (VIEIRA et al. 2001, 2003, 2006; MCALLISTER 2002). The polymorphism data for seven of the nine X-linked genes were obtained from G96, a population sample nearly fixed for the X/4 fusion chromosomes, and the larger samples for the other two genes (elav and Pros28.1) included lines from three natural populations with different proportions of this arrangement (G96, OR01, and FP99, with 98, 86, and 14%, respectively). Thus, the relatively low silent diversity observed could be explained by the inclusion of most X-linked loci in this study in the low-variation region.
To check for this possibility, we identified the chromosomal locations of all X-linked genes for which we obtained polymorphism data (see MATERIALS AND METHODS) and compared the level of silent variation at the loci putatively affected by the chromosome reorganizations in the D. americana lineage with that for loci in the presumably unaffected distal positions. The silent diversity of the potentially affected genes is not significantly different from that of the distal ones (1.6 ± 0.16 vs. 1.3 ± 0.50, respectively; mean ± SE), and the same applies to the ratios of silent-diversity values to the corresponding observed silent divergence between D. americana and D. virilis, to account for different intensities of purifying selection at silent sites (0.12 ± 0.016 vs. 0.15 ± 0.003, for distal and proximal loci, respectively). There is, thus, no evidence for a large-scale reduction of silent diversity on X/4 fusion chromosomes as a result of the recent chromosomal evolution (Figure 2).
We also checked for genetic differentiation between populations at elav and Pros28.1, the two loci with samples from different populations (see above). In agreement with a previous study (VIEIRA et al. 2003), we found no evidence for structure at elav for pairwise comparisons between populations with high (G96 and OR01) and low (FP99) prevalence of the X/4 fusion; KST = –0.003 and –0.006, for G69 vs. FP99 and OR01 vs. FP99 comparisons, respectively; none are significant in a permutation test with 1000 replicates (HUDSON et al. 1992). The same result was obtained if G96 and OR01 are pooled and compared with FP99 (KST = –0.005, NS). On the other hand, at Pros28.1 there is significant differentiation between populations with high and low prevalence of the X/4 fusion (KST = 0.09 and 0.11, P < 0.01; for G69 vs. FP99 and OR01 vs. FP99 pairwise comparisons, respectively), but not between G96 and OR1 (KST = –0.027, NS). Given that genetic differentiation between fusion and nonfusion chromosomal forms seems to be restricted to the base of the X chromosome (VIEIRA et al. 2006), this result could be taken as indirect evidence that Pros28.1 is close to Fused at the basal end of block 4 (Figure 2).
Changes in codon usage and noncoding sequences:
The analysis of patterns of base substitutions at noncoding sites among the different species (Table 6) suggests that base composition at putatively neutral sites is roughly in equilibrium in this group, with AT to GC changes being almost the same in frequency as the reverse. However, the analysis of synonymous changes suggested a deficit of changes from unpreferred codons (U) to preferred codons (P) in the D. montana, D. virilis, and D. americana lineages, at least for autosomal loci (Tables 5 and 6). At statistical equilibrium, we would expect equal numbers of changes in each direction, even with selection operating (AKASHI 1996).
In the latter two cases, this is probably an artifact of the inclusion of polymorphisms among the variants classed as fixed substitutions, since selection against U mutations causes a deficit of UP changes among polymorphisms but not fixations, if codon usage is at equilibrium (AKASHI 1996, 1999). MASIDE et al. (2004) showed that, when a correction for polymorphisms misclassified as fixations was used, there were approximately equal numbers of PU and UP changes in the D. americana lineage, but the use of a single sequence for each gene led to a deficit of UP changes. The negative correlation between dUP–PU and codon usage bias (Fop) that we detected (see RESULTS) is consistent with this interpretation, as stronger selection on codon usage leading to higher Fop can produce a detectable effect on polymorphism statistics for synonymous sites (CUTTER and CHARLESWORTH 2006).
This explanation is unlikely to apply with such force to the relatively long D. montana branch (Figure 1), so that the large deficit of UP changes here may reflect a genuine decline in Nes for synonymous mutations that change codon usage in this lineage. The apparent excess of UP changes in the ancestor of D. virilis and D. americana is hard to interpret, but perhaps indicates an increase in Ne in this lineage that is too small to affect nonsynonymous substitutions. Otherwise, there is no strong evidence for departure from equilibrium with respect to codon usage.
Further insights into the evolution of codon usage are provided by examining the relation between codon usage bias in a gene and its rates of synonymous and nonsynonymous substitution. It has been suggested that codon usage differences may bias the estimates of dS produced by the GY method (BIERNE and EYRE-WALKER 2003). Therefore, for the purpose of among-locus comparisons, synonymous and nonsynonymous divergences (Ks and Ka, respectively) were estimated using the NG method implemented in DnaSP (see MATERIALS AND METHODS). We detected a nonsignificant negative association (Figure 3A) between Ka and Fop. Differences between X-linked and autosomal loci are attributable to the presence of one outlier (per, which has an exceptionally high Ka value) and disappear if this locus is removed from the data. The overall negative correlation without per is then significant (r2 = 0.24, P = 0.025). On the other hand, Ks is not correlated with Fop (Figure 3B). Various evolutionary scenarios, such as interference between selection on nonsynonymous and synonymous sites, selection on codon usage for accuracy of translation, and translational constraints affecting nonsynoymous changes can cause a negative correlation between Ka and Fop (AKASHI 1994; BETANCOURT and PRESGRAVES 2002; KIM 2004; MARAIS et al. 2004).
|
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
| FOOTNOTES |
|---|
| LITERATURE CITED |
|---|
|
|
|---|
AKASHI, H., 1994 Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136: 927–935.[Abstract]
AKASHI, H., 1996 Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics 144: 1297–1307.[Abstract]
AKASHI, H., 1999 Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151: 221–238.
AKASHI, H., W. Y. KO, S. PIAO, A. JOHN, P. GOEL et al., 2006 Molecular evolution in the Drosophila melanogaster species subgroup: frequent parameter fluctuations on the timescale of molecular divergence. Genetics 172: 1711–1726.
ANDOLFATTO, P., 2001 Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 18: 279–290.
ANDOLFATTO, P., 2005 Adaptive evolution of non-coding DNA in Drosophila. Nature 437: 1149–1152.[CrossRef][Medline]
BARTOLOMÉ, C., and B. CHARLESWORTH, 2006 Evolution of amino acid sequences and codon usage on the Drosophila miranda neo-sex chromosomes. Genetics 174: 2033–2044.
BARTOLOMÉ, C., X. MASIDE, S. YI, A. L. GRANT and B. CHARLESWORTH, 2005 Patterns of selection on synonymous and nonsynonymous variants in Drosophila miranda. Genetics 169: 1495–1507.
BETANCOURT, A. J., and D. C. PRESGRAVES, 2002 Linkage limits the power of natural selection in Drosophila. Proc. Natl. Acad. Sci. USA 99: 13616–13620.
BIERNE, N., and A. EYRE-WALKER, 2003 The problem of counting sites in the estimation of the synonymous and nonsynonymous substitution rates: implications for the correlation between the synonymous substitution rate and codon usage bias. Genetics 165: 1587–1597.
BIERNE, N., and A. EYRE-WALKER, 2004 The genomic rate of adaptive amino acid substitution in Drosophila. Mol. Biol. Evol. 21: 1350–1360.
CALETKA, B. C., and B. F. MCALLISTER, 2004 A genealogical view of chromosomal evolution and species delimitation in the Drosophila virilis species subgroup. Mol. Phylogenet. Evol. 33: 664–670.[CrossRef][Medline]
CHARLESWORTH, B., 1994 The effect of background selection against deleterious mutations on weakly selected, linked variants. Genet. Res. 63: 213–227.[Medline]
CHARLESWORTH, B., C. BARTOLOME and V. NOEL, 2005 The detection of shared and ancestral polymorphisms. Genet. Res. 86: 149–157.[CrossRef][Medline]
CUTTER, A. D., and B. CHARLESWORTH, 2006 Selection intensity on preferred codons correlates with overall codon usage bias in Caenorhabditis remanei. Curr. Biol. 16: 2053–2057.[CrossRef][Medline]
EYRE-WALKER, A., 2002 Changing effective population size and the McDonald-Kreitman test. Genetics 162: 2017–2024.
EYRE-WALKER, A., 2006 The genomic rate of adaptive evolution. Trends Ecol. Evol. 21: 569–575.[CrossRef][Medline]
FAY, J. C., G. J. WYCKOFF and C. I. WU, 2001 Positive and negative selection on the human genome. Genetics 158: 1227–1234.
FILATOV, D. A., 2002 ProSeq: a software for preparation and evolutionary analysis of DNA sequence data sets. Mol. Ecol. Notes 2: 621–624.[CrossRef]
FU, Y. X., and W. H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693–709.[Abstract]
GOLDMAN, N., and Z. YANG, 1994 A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11: 725–736.[Abstract]
GUBENKO, I., and M. B. EVGEN'EV, 1984 Cytological and linkage maps of Drosophila virilis chromosomes. Genetica 65: 127–139.[CrossRef]
HADDRILL, P. R., B. CHARLESWORTH, D. L. HALLIGAN and P. ANDOLFATTO, 2005 Patterns of intron sequence evolution in Drosophila are dependent upon length and GC content. Genome Biol. 6: R67.[CrossRef][Medline]
HASEGAWA, M., H. KISHINO and T. YANO, 1985 Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22: 160–174.[CrossRef][Medline]
HILTON, H., and J. HEY, 1996 DNA sequence variation at the period locus reveals the history of species and speciation events in the Drosophila virilis group. Genetics 144: 1015–1025.[Abstract]
HILTON, H., and J. HEY, 1997 A multilocus view of speciation in the Drosophila virilis species group reveals complex histories and taxonomic conflicts. Genet. Res. 70: 185–194.[CrossRef]
HOIKKALA, A., S. PAALLYSAHO, J. ASPI and J. LUMME, 2000 Localization of genes affecting species differences in male courtship song between Drosophila virilis and D. littoralis. Genet. Res. 75: 37–45.[CrossRef][Medline]
HSU, T. C., 1952 Chromosomal variation and evolution in the virilis group of Drosophila. Univ. Tex. Publ. 5204: 25–72.
HUDSON, R. R., M. SLATKIN and W. P. MADDISON, 1992 Estimation of levels of gene flow from DNA sequence data. Genetics 132: 583–589.[Abstract]
IKEMURA, T., 1985 Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2: 13–34.[Abstract]
KEIGHTLEY, P. D., and T. JOHNSON, 2004 MCALIGN: stochastic alignment of noncoding DNA sequences based on an evolutionary model of sequence evolution. Genome Res. 14: 442–450.
KIM, Y., 2004 Effect of strong directional selection on weakly selected mutations at linked sites: implication for synonymous codon usage. Mol. Biol. Evol. 21: 286–294.
KUMAR, A., K. TAMURA and M. NEI, 2004 MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief. Bioinformatics 5: 150–163.
LOEWE, L., B. CHARLESWORTH, C. BARTOLOME and V. NOEL, 2006 Estimating selection on nonsynonymous mutations. Genetics 172: 1079–1092.
MARAIS, G., T. DOMAZET-LOSO, D. TAUZT and B. CHARLESWORTH, 2004 Correlated evolution of synonymous and nonsynonymous sites in Drosophila. J. Mol. Evol. 59: 771–779.[CrossRef][Medline]
MASIDE, X., A. W. LEE and B. CHARLESWORTH, 2004 Selection on codon usage in Drosophila americana. Curr. Biol. 14: 150–154.[CrossRef][Medline]
MCALLISTER, B. F., 2002 Chromosomal and allelic variation in Drosophila americana: selective maintenance of a chromosomal cline. Genome 45: 13–21.[Medline]
MCALLISTER, B. F., 2003 Sequence differentiation associated with an inversion on the neo-X chromosome of Drosophila americana. Genetics 16