Genetics, Vol. 157, 295-305, January 2001, Copyright © 2001

Substitution Rates in Drosophila Nuclear Genes: Implications for Translational Selection

Katherine A. Dunna, Joseph P. Bielawskia, and Ziheng Yanga
a Department of Biology, Galton Laboratory, University College, London NW1 2HE, United Kingdom

Corresponding author: Katherine A. Dunn, Department of Biology, University College, 4 Stephenson Way, London NW1 2HE, United Kingdom., katherine.dunn{at}ucl.ac.uk (E-mail)

Communicating editor: J. HEY


*  ABSTRACT
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

The relationships between synonymous and nonsynonymous substitution rates and between synonymous rate and codon usage bias are important to our understanding of the roles of mutation and selection in the evolution of Drosophila genes. Previous studies used approximate estimation methods that ignore codon bias. In this study we reexamine those relationships using maximum-likelihood methods to estimate substitution rates, which accommodate the transition/transversion rate bias and codon usage bias. We compiled a sample of homologous DNA sequences at 83 nuclear loci from Drosophila melanogaster and at least one other species of Drosophila. Our analysis was consistent with previous studies in finding that synonymous rates were positively correlated with nonsynonymous rates. Our analysis differed from previous studies, however, in that synonymous rates were unrelated to codon bias. We therefore conducted a simulation study to investigate the differences between approaches. The results suggested that failure to properly account for multiple substitutions at the same site and for biased codon usage by approximate methods can lead to an artifactual correlation between synonymous rate and codon bias. Implications of the results for translational selection are discussed.


SYNONYMOUS substitutions do not affect the amino acid sequence of a protein, yet in some species of Drosophila, Caenorhabditis elegans, Arabidopsis, yeast, and enterobacteria, synonymous substitutions appear to be influenced by natural selection (GROSJEAN and FIERS 1982 Down; SHARP et al. 1986 Down; SHARP and LI 1987 Down; SHIELDS et al. 1988 Down; DURET and MOUCHIROUD 1999 Down). In Drosophila, synonymous codon usage is highly biased toward codons ending with G or C (SHIELDS et al. 1988 Down; POWELL and MORIYAMA 1997 Down; but see RODRIGUEZ-TRELLES et al. 1999 Down), and there is evidence that selection favors substitutions to the preferred synonymous codons and limits substitutions to unpreferred synonymous codons (AKASHI 1994 Down, AKASHI 1995 Down, AKASHI 1997 Down). The exact causes of selection at synonymous sites, however, are less clear.

Selection for preferred codon usage in Drosophila has been suggested to enhance protein translation. Selective enhancement of translation is supported by the observation that the most frequent synonymous codons tend to match the most abundant tRNAs (SHIELDS et al. 1988 Down; MORIYAMA and POWELL 1997B Down). In addition, a perceived negative correlation between silent divergence and codon bias (SHIELDS et al. 1988 Down; SHARP and LI 1989 Down; MORIYAMA and HARTL 1993 Down; POWELL and MORIYAMA 1997 Down) and a strong correlation between gene expression and codon bias (SHIELDS et al. 1988 Down; POWELL and MORIYAMA 1997 Down; DURET and MOUCHIROUD 1999 Down) are consistent with greater selective pressure on the silent sites of genes with high codon bias. Finally, codon bias has been found to be lower in regions of low recombination (KLIMAN and HEY 1994 Down; COMERON et al. 1999 Down). This is expected if codon bias is maintained by selection, because in regions of low recombination the efficacy of selection is reduced due to linkage disequilibrium with other selected sites (HILL and ROBERTSON 1966 Down).

Although selective enhancement of translation appears to be the primary source of codon bias in Drosophila, it is less clear which of several mechanisms are operating. Translation could be enhanced by increasing the rate of elongation, reducing the cost of proofreading, increasing the accuracy of translation, or by any combination of those mechanisms (e.g., AKASHI and EYRE-WALKER 1998 Down). The majority of evidence supports the hypothesis that selection is acting to increase translational accuracy. Preferred codon usage is related to functional constraints, with the most conserved genes and most functionally important amino acid sites exhibiting the most biased codon usage (AKASHI 1994 Down). Furthermore, the synonymous substitution rate is perceived to be positively correlated with the nonsynonymous rate (COMERON and KREITMAN 1998 Down). These patterns indicate that synonymous substitutions are not independent of selective constraints acting on the amino acid composition of the protein. However, some data are inconsistent with the translational accuracy hypothesis. Codon usage bias is lower in longer genes (COMERON et al. 1999 Down; DURET and MOUCHIROUD 1999 Down), but selection to increase accuracy should be greater in longer genes because the cost of proofreading is higher. In addition, codon bias increases within the first few hundred codons and then declines along Drosophila genes (KLIMAN and EYRE-WALKER 1998 Down). Hence the presence of additional selection pressures on codon bias, possibly to enhance translational efficiency (COMERON et al. 1999 Down) or to maintain mRNA secondary structure (e.g., KIRBY et al. 1995 Down), cannot be discounted.

Substitution rates in Drosophila genes contain important information about the effectiveness of selection. The relationship of synonymous substitution rate to codon bias, to nonsynonymous rate, and to location within a gene tells us something about the nature of selection (e.g., KLIMAN and EYRE-WALKER 1998 Down). Previous studies of substitution rates employed approximate methods that neglect the effects of codon bias and apply ad hoc corrections for multiple substitutions. Recent studies suggest that approximate methods can lead to seriously biased estimates of synonymous and nonsynonymous substitution rates when codon usage is biased (INA 1995 Down; YANG and NIELSEN 1998 Down, YANG and NIELSEN 2000 Down; BIELAWSKI et al. 2000 Down). An alternative is to use maximum likelihood (ML). The ML method is based on a Markov process model of codon substitution, which describes the changes between the sense codons in the genetic code and accounts for the transition/transversion rate ratio {kappa}, different equilibrium codon frequencies, and the nonsynonymous/synonymous substitution rate ratio {omega}. Estimates of dS and dN are calculated according to their definitions from the maximum-likelihood estimates of model parameters (such as sequence divergence, {kappa}, and {omega}). The probability theory corrects for unequal codon usage and for multiple substitutions in a straightforward manner. See GOLDMAN and YANG 1994 Down and YANG and NIELSEN 1998 Down, YANG and NIELSEN 2000 Down for details.

The purpose of this study was to use ML methods to estimate substitution rates in Drosophila nuclear genes to investigate the relationship of synonymous rate to nonsynonymous rate and to codon usage bias. We compiled a sample of 83 loci with homologous DNA sequences from Drosophila melanogaster and at least one other species of Drosophila. Our analyses suggested a very different relationship between synonymous rate and codon usage bias as compared with all previous studies. A simulation study was thus conducted to investigate the source of this difference.


*  MATERIALS AND METHODS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Sequence data:
Sequence data consist of 83 genes from D. melanogaster and at least one of the following species: D. pseudoobscura, D. subobscura, D. simulans, D. yakuba, and D. virilis. A list of these genes and their accession numbers is provided in the TABLE AA. Some analyses were performed on a phylogeny of three species. Such an approach was possible for D. melanogaster, D. pseudoobscura, and D. subobscura (DmDpDsub) for 20 genes and for D. melanogaster, D. simulans, and D. yakuba (DmDsimDy) for 11 genes. Alternatively, larger sets of genes were analyzed in a pairwise fashion, i.e., D. melanogaster vs. D. virilis (DmDv; 39 genes), D. melanogaster vs. D. pseudoobscura (DmDp; 35 genes), and D. melanogaster vs. D. simulans (DmDsim; 24 genes).

Biased patterns of sequence evolution:
G + C content at third codon positions (GC3) and codon usage bias, measured by the effective number of codons (ENC; WRIGHT 1990 Down), were computed for each gene, using the program Codon W of J. Penden. The assumption of homogeneous nucleotide frequencies among Drosophila species was tested using a chi-square test of a contingency table of nucleotide counts.

Estimation of the numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site:
Lineage-specific estimates of dS and dN were obtained from phylogenetic analyses of three species (DmDpDsub and DmDsimDy) using the ML method (YANG and NIELSEN 1998 Down). The assumption of a homogeneous dN/dS ratio among lineages was tested in both datasets by comparing two models. Model 0 assumed the same ratio for all three lineages of Drosophila, whereas model 1 allowed an independent dN/dS ratio for every lineage. Both models account for transition/transversion bias ({kappa}) and unequal codon frequencies, which were determined using the empirical nucleotide frequencies at the three codon positions (F3 x 4 model; YANG 1999 Down). Twice the log-likelihood difference between models 0 and 1 was compared with a {chi}2 distribution with d.f. = (3 - 1) = 2.

Parameters dS and dN were estimated pairwise by ML for genes in the DmDv, DmDp, and DmDsim datasets (GOLDMAN and YANG 1994 Down). For comparison, we estimated dS and dN using two approximate methods: that of NEI and GOJOBORI 1986 Down, referred to as NG, and that of COMERON 1995 Down; the primary difference between them is that the method of COMERON 1995 Down corrects for transition/transversion rate bias while NG does not. The PAML package (YANG 1999 Down) was used to implement NG and ML, while Comeron's program, K-Estimator 5.2, was used for COMERON's (1995) method.

Computer simulations:
To understand the differences between methods, we simulated pairs of codon sequences using the evolver program of the PAML package (YANG 1999 Down). Model parameters used were transition/transversion rate ratio ({kappa}), dN/dS ratio ({omega}), codon frequencies ({pi}j), and sequence divergence (t). The {kappa} value was set to one, and {omega} was the average from genes examined ({omega} = 0.06). We used the empirically estimated codon frequencies from eight Drosophila genes with different ENC values: 53.4 from sc; 49.1 from ade3; 41.2 from v; 44.7 from Gld; 38 from Gad1; 33.7 from Mlc1; 32.3 from Adh; and 28.3 from Amy-p. Each set of codon frequencies was evaluated at a sequence divergence (t; measured by the expected number of nucleotide substitutions per codon) set to 0.4, 0.8, 1.2, 1.6, 2.0, 2.4, and 2.8. Values of t represent the range observed in the sampled Drosophila genes. In total, 56 pairs of sequences were simulated, each 1 million codons in length. dS and dN were estimated for each pair of sequences using both ML and NG methods. The method of COMERON 1995 Down was employed for a few parameter combinations only to examine the effect of transition/transversion rate bias.


*  RESULTS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Nucleotide (codon) frequencies in Drosophila nuclear genes:
GC3 varied substantially among the 83 genes, ranging from 28 to 93%. This variation was characteristic of each dataset (DmDpDsub, 44–90%; DmDsimDy, 52–93%; DmDp, 36–90%; DmDv, 45–88%; and DmDsim, 28–91%). Chi-square tests suggested significant heterogeneity in nucleotide composition among lineages for 4 of 20 genes in DmDpDsub (Gapdh2, Gpdh, RpII215, and ry), 8 of 35 genes in DmDp [Gapdh2, Gpdh, l(2)gl, Rh2, RpII215, ry, Tl, and Ubx], and 11 of 39 genes in DmDv [Adh, Amy-p, Cdc37, fu, gbb, Gpdh, Kr, l(2)tid, lama, nos, and Sry-beta].

Consistent with patterns of nucleotide bias, codon usage varied greatly among genes; the ENC values ranged from extreme bias at 26.7 (GstD1) to no bias at 61 (sc). Each dataset exhibited substantial variation in codon bias among genes (DmDpDsub, 27.3–59.8; DmDsimDy, 26.7–61; DmDp, 28.6–56.6; DmDv, 29.2–61; and DmDsim, 26.7–61). Codon bias was not related to heterogeneity in nucleotide composition among lineages. For example, genes with homogeneous nucleotide composition among lineages exhibited both highly biased (e.g., gene Eno, ENC = 28.6) and unbiased (e.g., gene ac, ENC = 56.1) codon usage.

Lineage-specific patterns of synonymous and nonsynonymous substitution:
ML estimation of dN and dS was carried out for each gene in the DmDpDsub and DmDsimDy datasets (Table 1). Constancy of nonsynonymous/synonymous rate ratios (dN/dS) was tested, and dN/dS ratios were largely homogeneous over lineages (Table 1). Homogeneity was rejected in only 3 of the 20 genes in DmDpDsub and 3 of the 11 genes in DmDsimDy. This trend differs from that observed in mammals, where over half of genes surveyed exhibited significant variation in selective constraints among lineages (YANG and NIELSEN 1998 Down; BIELAWSKI et al. 2000 Down).


 
View this table:
In this window
In a new window

 
Table 1. Estimation of numbers of synonymous (dS) and nonsynonymous substitutions (dN) per site for each gene in the three-species datasets

Genes exhibiting high synonymous or nonsynonymous rates in one lineage tended to exhibit high rates in other lineages as well. For example, the coefficient of determination (r2) of dS estimates between Dm and Dp is r2 = 0.5098 (P = 0.0004), and the coefficient of determination of dN is r2 = 0.5005 (P = 0.0005). Similar patterns were reported for mammalian genes (BULMER et al. 1991 Down; MOUCHIROUD et al. 1995 Down). Substantial variation in dN/dS was found among genes, as they are under very different selective constraints (Table 1).

The relationship between synonymous and nonsynonymous substitution rates was evaluated by linear correlation using rate estimates of Table 1 for five species. Estimates of dS were positively correlated with dN in every lineage (Dp, r2 = 0.4643, P = 0.0009; Dsub, r2 = 0.7312, P << 0.0001; Dsim, r2 = 0.5722, P = 0.0071; Dy, r2 = 0.3945, P = 0.0385; Dm, r2 = 0.4587, P << 0.0001). The plot for D. pseudoobscura is presented as an example (Fig 1A). These results are consistent with a previous analysis of substitution rates in Drosophila (COMERON and KREITMAN 1998 Down). However, this study represents a wider sample of Drosophila species and the correlations reported here are stronger than previously reported.



View larger version (19K):
In this window
In a new window
Download PPT slide
 
Figure 1. The relationship between dN and dS (a) and between dS and ENC (b) for Drosophila pseudoobscura. Estimates of dS and dN were obtained from lineage-specific analyses.

Synonymous substitution rates were independent of codon bias, as evaluated by linear regression of dS and ENC, in every lineage (Dp, r2 = 0.0004, P = 0.7911; Dsub, r2 = 0.0138, P = 0.6218; Dsim, r2 = 0.151, P = 0.2376; Dy, r2 = 0.0045, P = 0.8446; Dm, r2 = 0.0079, P = 0.6405). The plot for D. pseudoobscura is presented as an example (Fig 1B). All previous analyses differ from ours in suggesting synonymous substitution rates are negatively correlated with codon bias (SHIELDS et al. 1988 Down; SHARP and LI 1989 Down; MORIYAMA and HARTL 1993 Down; POWELL and MORIYAMA 1997 Down). Previous studies, however, also differ from ours in using approximate estimation methods, which ignore codon usage bias. Furthermore, in this study we estimated dS per lineage rather than between pairs of sequences. In the next section, we explore what factors may be responsible for this conflict.

Reconciling differences between ML and approximate methods:
Models employed in ML estimation of substitution rates explicitly assumed that nucleotide/codon frequencies were at equilibrium (GOLDMAN and YANG 1994 Down). To see whether violation of this assumption influenced our correlation analysis, we reevaluated the relationship between dS and ENC using only stationary genes. To keep more genes in the dataset and to facilitate direct comparison with approximate methods, we apply the ML method to two-taxa datasets. This pairwise ML analysis led to the same conclusion as the phylogeny-based ML analyses; i.e., synonymous rates were independent of codon bias (Fig 2). However, when either NG or COMERON's (1995) method was used, dS was negatively correlated with ENC for both the DmDv (NG, r2 = 0.5036, P << 0.0001; Comeron, r2 = 0.4615, P = 0.0001) and DmDp (NG, r2 = 0.6754, P << 0.0001; Comeron, r2 = 0.6127, P << 0.0001) datasets (Fig 2), a result consistent with previous studies (SHARP and LI 1989 Down; COMERON and AGUADE 1996 Down; POWELL and MORIYAMA 1997 Down). However, in the DmDsim dataset, the approximate methods lead to the same conclusion as the ML method; dS was independent of ENC (NG, r2 = 0.0995, P = 0.1332; Comeron, r2 = 0.0583, P = 0.2557; Fig 2). Because results were similar for the two approximate methods, only results for NG are presented in Fig 2. On the basis of these results we conclude that the conflict between approximate methods and ML cannot be attributed to use of a pairwise approach, the sample of genes, or nonstationarity of nucleotide frequencies.



View larger version (28K):
In this window
In a new window
Download PPT slide
 
Figure 2. The relationship between pairwise estimates of dS and mean ENC (a–f). Pairwise estimates of dS were computed using both ML (GOLDMAN and YANG 1994 Down) and NG (NEI and GOJOBORI 1986 Down). DmDv indicates pairwise comparisons between D. melanogaster and D. virilis; DmDp, pairwise comparisons between D. melanogaster and D. pseudoobscura; and DmDsim, pairwise comparisons between D. melanogaster and D. simulans.

To examine the effect of transition/transversion bias and codon bias on estimation of dS, we changed parameters of the codon model to reanalyze the two-taxa datasets by ML. The effect of ignoring transition/transversion bias was evaluated by setting {kappa} = 1, but allowing for biased codon usage. Results obtained using this codon model did not differ from those obtained using the full codon model; i.e., there was no significant relationship between dS and ENC for DmDv (r2 = 0.140, P = 0.060) and DmDp (r2 = 0.051, P = 0.256). The effect of ignoring codon bias was evaluated by assuming equal codon frequencies, but allowing for transition/transversion bias. This codon model produced results different from those obtained previously and matched the approximate methods, with a significant correlation between dS and ENC in DmDv (r2 = 0.435, P = 0.0005) and DmDp (r2 = 0.6281, P << 0.0001). These findings suggest that the different treatment of codon bias was responsible for the conflict between ML and approximate methods in the relationship between dS and ENC.

However, in one pairwise comparison (DmDsim), approximate and ML methods were in agreement, yet codon usage in this dataset (mean ENC = 43) was just as biased as in the other two sets of genes (DmDv, mean ENC = 44; DmDp, mean ENC = 41) where approximate and ML methods differed. Hence, there must have been an additional factor. In both the DmDv and DmDp datasets, where approximate methods produced a positive relationship between dS and ENC, uncorrected percent sequence divergences (p) were very large (DmDv, mean p = 21.6 ± 6.0%; DmDp, mean p = 16.3 ± 5.8%). In the DmDsim dataset, where dS was independent of ENC, divergences were very low (mean p = 3.2 ± 1.4%). This pattern suggested a possible "saturation" effect. Consequently, we hypothesized that differences between approximate and ML methods might be related to the combined effects of sequence divergence and codon usage bias.

Simulation studies:
Simulations were performed to evaluate the hypothesis that improper treatment of both divergent sequences and codon usage bias by approximate methods could have led to a significant positive correlation between dS and ENC when, in reality, they were independent. Codon sequences were simulated under values of t chosen to reflect the range exhibited among Drosophila genes, and codon frequencies were modeled after observed frequencies of Drosophila genes (see MATERIALS AND METHODS). Simulated codon sequences were analyzed using both NG and ML (F3 x 4), and these results were compared with the true values employed in the simulation (Fig 3).



View larger version (31K):
In this window
In a new window
Download PPT slide
 
Figure 3. Estimates of dS in simulated data by NG ({circ}) and ML (F3 x 4; {triangleup}) plotted against sequence divergence t (the expected number of nucleotide substitutions per codon). Data for a pair of sequences, each of 1 million codons, were simulated for different codon bias measured by ENC.

Both sequence divergence and codon bias had an effect on estimates of dS (Fig 3). Only at low sequence divergences (t < 1.6) and relatively unbiased codon usage (ENC = 53) did both NG and ML (F3 x 4) produce unbiased estimation of dS. With one exception (Fig 3A), dS was underestimated by both methods with increasing levels of sequence divergence (Fig 3, b–h), and the NG method involved a much more serious estimation bias than ML. Because equilibrium codon frequencies obtained using the F3 x 4 model (9 frequency parameters) do not perfectly match the empirical Drosophila frequencies, some degree of error was expected for ML. Use of the more parameter-rich model (60 frequency parameters) produced estimates of dS essentially identical to the true values (data not shown). Differences between the analysis of real data using the NG and ML (F3 x 4) methods are consistent with differences observed in our simulation study; i.e., ML estimates of dS were larger than estimates obtained using NG. These simulations further suggested that the ML (F3 x 4) estimates of dS for Drosophila genes are, themselves, likely to be underestimates of the true values. Although the F3 x 4 model is biased in cases of extreme codon bias, this model was acceptable over a wide range of codon biases and consistently outperformed the NG method.

Plots of dS (Fig 3) are reminiscent of the "saturation effect" on plots of uncorrected sequence divergence. For dS, however, the "ceiling" appears to be related to levels of codon usage bias; the effect is relatively minor when codon usage is unbiased and extreme when codon usage is highly biased (Fig 3). For example, in the most extreme case of codon bias (ENC = 28), the estimate of dS obtained using the NG method peaked at 0.5 (t > 0.8), although true values of dS range from 0.95 when t = 0.4 to 6.6 when t = 2.8. The only exception to the general pattern occurred when codon usage was relatively unbiased (Fig 3A). In this case, when 1.2 < t < 2.0, NG overestimated rather than underestimated dS, and when t > 2.0 the method yielded invalid estimates of dS. Interestingly, MUSE 1996 Down suggested that even when there is no codon usage bias, ad hoc corrections for multiple substitutions applied in the approximate methods could introduce substantial bias.

The dramatic effect of codon usage bias on estimation of dS is caused mainly by its effect on counting of synonymous (S) and nonsynonymous (N) sites. When codon usage is biased but ignored (Fig 3, b–h), the number of synonymous sites (S) was overestimated, leading to an underestimation of dS. For example, in the most extreme case (ENC = 28, t = 2.8), most codons end with C or G and most changes at the third position are transversions between C and G, which are more likely to be nonsynonymous than random changes between nucleotides. As a result, the proportion of synonymous sites is as low as S = 8.53%. The NG method assumes unbiased codon usage or equal rates of change between nucleotides and expects more frequent transitional or synonymous changes, giving S = 23.6%, with almost a fourfold difference. When there is little codon bias (ENC = 53), NG reliably estimated S (NG, S = 23.4%; true value, S = 23.3%). Thus the underestimation of dS by NG is more serious for more extreme codon bias (Fig 3). In Fig 4A, the NG estimates of dS corresponding to different sequence divergences (t) from the simulation (Fig 3) were plotted against ENC. Although dS and ENC are independent, NG incorrectly indicated a positive correlation, due to the underestimation of dS at high divergences and extreme codon bias. The ML method (not plotted) indicated no correlation between dS and ENC (r2 = 0.00006, P = 0.95). The pattern seen in the simulated data is consistent with that in the real data. This is most clear when NG estimates of dS from the real data (DmDp, Fig 2D) were superimposed onto the approximate estimates from the simulations (Fig 4B). These data suggest that the significantly positive correlation between dS and ENC indicated by the approximate methods is an artifact of these methods' failure to properly account for the combined effects of codon bias and sequence divergence.



View larger version (16K):
In this window
In a new window
Download PPT slide
 
Figure 4. (a) Approximate estimates of dS by NG obtained in the simulation (Fig 3) plotted against ENC. Multiple estimates of dS for each ENC correspond to the seven values of t. Note the true dS is independent of ENC, but a positive correlation is suggested by NG. (b) The same data as in a but with NG estimates from the pairwise comparisons between Drosophila melanogaster and D. pseudoobscura superimposed ({circ}).


*  DISCUSSION
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Codon usage in Drosophila varies considerably between genes and does not appear random. Highly conserved genes, and functionally important sites, exhibit highly biased synonymous codon usage (AKASHI 1994 Down). AKASHI 1994 Down theorized that these patterns resulted from a balance between selection for translational accuracy, mutation, and drift in finite populations. On the basis of this model, he predicted that synonymous substitution rates should be positively correlated with nonsynonymous substitution rates. However, he did not find a significant correlation. More recently, COMERON and KREITMAN 1998 Down reported that when data from three species of Drosophila are combined, synonymous rates are positively correlated with nonsynonymous substitution rates. Our findings confirm those of COMERON and KREITMAN 1998 Down and further indicate that this correlation is a feature of all five species of Drosophila. In addition, our analysis implies that the correlation between synonymous and nonsynonymous rates is stronger than previously thought, suggesting that synonymous substitution rates in these Drosophila species are not independent of selective constraints acting at the amino acid level. Consistent with patterns of codon usage (AKASHI 1994 Down), these data support the hypothesis that codon bias in Drosophila is influenced by selection for translational accuracy.

However, our findings differ from all previous studies (SHIELDS et al. 1988 Down; SHARP and LI 1989 Down; MORIYAMA and GOJOBORI 1992 Down; MORIYAMA and HARTL 1993 Down; CARULLI et al. 1993 Down; POWELL and MORIYAMA 1997 Down) in suggesting that synonymous substitution rates are not correlated with codon bias. Through ML analysis under different models as well as computer simulation, we show that previous reports of a significant correlation between dS and ENC might be an artifact of inadequate correction for the combined effects of saturation and biased codon usage. However, POWELL and MORIYAMA 1997 Down suggested that artifacts of inadequate correction for codon bias could not be responsible for a correlation between dS and ENC because in their study they employed a method (MORIYAMA and POWELL 1997A Down) that was claimed to correct for codon bias. However, this method corrects for base composition bias only when correcting for multiple substitutions and assumed unbiased codon usage in the important steps of counting sites and differences (YANG and NIELSEN 2000 Down). Because the major source of bias in dS and dN arises from biased estimates of S and N (YANG and NIELSEN 2000 Down; see also above), the method of MORIYAMA and POWELL 1997A Down does not correct effectively for codon usage bias.

The effect of saturation is still evident in the analysis of POWELL and MORIYAMA 1997 Down. They observed a very strong correlation between dS and codon usage bias in distantly related species pairs, but a much weaker correlation in a more recently diverged pair of species, D. melanogaster and D. simulans. POWELL and MORIYAMA 1997 Down dismissed the weak correlation as an effect of sampling errors at small sequence divergences and possible shared polymorphisms. Our simulation study suggests an opposite conclusion: that is, the weak correlation is closer to the true relationship, while the strong correlation is an artifact; i.e., because D. melanogaster and D. simulans have a more recent divergence, this species pair provided the least biased estimate of the relationship between dS and codon bias when approximate methods are used.

Our analysis suggests that the discrepancy between approximate and ML methods was caused mainly by codon usage bias and not by the transition/transversion bias (see also YANG and NIELSEN 1998 Down, YANG and NIELSEN 2000 Down; BIELAWSKI et al. 2000 Down). The approximate methods of NEI and GOJOBORI 1986 Down and COMERON 1995 Down led to the same conclusion. The ML method both with and without accounting for the transition/transversion bias led to the same conclusion, which is different from that of the approximate methods. To test this notion further, a small simulation study was conducted using the method of COMERON 1995 Down, which corrects for transition/transversion bias. The transition/transversion ratio was fixed at {kappa} = 2, and three levels of codon bias (ENC = 49, 38, and 28.31) were used. Simulated sequences comprised only 30,000 codons as the program of Comeron did not handle very long sequences. Results, shown in Fig 5, indicate that simply accounting for the transition/transversion bias (i.e., the method of COMERON 1995 Down) does not reduce the bias in estimates of synonymous substitution rates. In fact, at moderate codon bias (ENC = 49), NG is closer to the real value of dS than the method of COMERON 1995 Down. This is because the bias introduced by ignoring codon usage is in the opposite direction to that produced by ignoring the transition/transversion ratio, and the two biases partially cancel out in NG. This pattern was discussed by YANG and NIELSEN 2000 Down. However, with greater codon bias (i.e., ENC > 49), the effect of ignoring codon bias becomes much larger than ignoring transition bias, hence both NG and COMERON's (1995) methods yield nearly identical estimates of dS.



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 5. Estimates of dS in simulated data by NG ({circ}), ML (F3 x 4; {triangleup}), and COMERON 1995 Down (x) plotted against sequence divergence t (the expected number of nucleotide substitutions per codon). ({square}) Real values of dS. Data for a pair of sequences, each of 30,000 codons, were simulated under {kappa} = 2 and three different levels of codon bias (a–c).

Given the evidence for selection for translational accuracy in Drosophila, our finding of lack of correlation between synonymous rate and codon usage bias is puzzling. Models of translational selection predict that selection against substitutions to unpreferred codons at functionally important amino acid sites will result in a positive correlation between dS and ENC (SHIELDS et al. 1988 Down; SHARP and LI 1989 Down; MORIYAMA and HARTL 1993 Down; POWELL and MORIYAMA 1997 Down). However, this prediction assumes that mutation pressure does not contribute to codon bias. KLIMAN and HEY 1994 Down found a correlation between base composition of third codon positions and introns, and they suggested that at least 10% of variation in codon bias can be explained by mutation pressure. Variable mutation pressures can lead to a negative, rather than positive, correlation between dS and ENC (BIELAWSKI et al. 2000 Down). If mutation pressure produces a negative relationship between dS and ENC in Drosophila, it might confound the positive relationship between dS and ENC expected under translational accuracy.

COMERON and KREITMAN 1998 Down found that, among the majority of codons having double nucleotide substitutions in Drosophila, synonymous substitutions represented a shift to an unpreferred rather than a preferred codon, a pattern inconsistent with positive codon selection. COMERON and KREITMAN 1998 Down hypothesized that the explanation for this pattern was a relaxation of selective constraints at these sites. They suggested a covarion-like model whereby the effect of selection on individual codons, for translational accuracy, changes over time. In this model, any relaxation of constraints at an amino acid site also includes relaxation of constraints on synonymous codon usage at the same site. Fluctuations in selective constraints could obscure the relationship between dS and ENC predicted by previous models of translational selection because dS is measured over time and ENC is measured at one point in time. In genes having a large proportion of sites where selective constraints have changed over time, substitution rates will be higher as compared to genes having a smaller fraction of sites subject to changing selective pressures. At any one point in time these genes could have similar values of ENC. Interestingly, we observed considerable variation in dS among genes with similar values of ENC. Moreover, the covarion-like model of COMERON and KREITMAN 1998 Down predicts a positive correlation between synonymous and nonsynonymous substitution rates, which we also observed among Drosophila.

Whatever the causes of codon bias in Drosophila, the results of this study, and others (KLIMAN and HEY 1994 Down; COMERON and KREITMAN 1998 Down; KLIMAN and EYRE-WALKER 1998 Down; COMERON et al. 1999 Down; DURET and MOUCHIROUD 1999 Down; MCVEAN and VIEIRA 1999 Down), indicate that the origins of codon bias in Drosophila are more complex than previously thought. Unbiased estimation of substitution rates will be a crucial aspect of unraveling the origins of codon bias in Drosophila.


*  ACKNOWLEDGMENTS

We are grateful for the comments of two anonymous reviewers. This study was supported by a Biotechnology and Biological Sciences Research Council grant (31/G10434) to Z.Y.

Manuscript received July 11, 2000; Accepted for publication October 12, 2000.


*  APPENDIX
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED


 
View this table:
In this window
In a new window

 
Drosophilia genes and accession numbers


*  LITERATURE CITED
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

AKASHI, H., 1994  Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927-935[Abstract].

AKASHI, H., 1995  Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139:1067-1076[Abstract].

AKASHI, H., 1997  Distinguishing the effects of mutational biases and natural selection on DNA sequence variation. Genetics 147:1989-1991[Medline].

AKASHI, H. and A. EYRE-WALKER, 1998  Translational selection and molecular evolution. Curr. Opin. Genet. Dev. 8:688-693[Medline].

BIELAWSKI, J., K. A. DUNN, and Z. YANG, 2000  Rates of nucleotide substitution and mammalian nuclear gene evolution: approximate and maximum-likelihood methods lead to different conclusions. Genetics 156:1299-1308[Abstract/Free Full Text].

BULMER, M., K. H. WOLFE, and P. M. SHARP, 1991  Synonymous nucleotide substitution rates in mammalian genes: implications for the molecular clock and the relationship of mammalian orders. Proc. Natl. Acad. Sci. USA 88:5974-5978[Abstract/Free Full Text].

CARULLI, J. P., D. E. KRANE, D. L. HARTL, and H. OCHMAN, 1993  Compositional heterogeneity and patterns of molecular evolution in the Drosophila genome. Genetics 134:837-845[Abstract].

COMERON, J. M., 1995  A method for estimating the number of synonymous and nonsynonymous substitutions per site. J. Mol. Evol. 41:1152-1159[Medline].

COMERON, J. M. and M. AGUADÉ, 1996  Synonymous substitutions in the Xdh gene of Drosophila: heterogeneous distribution along the coding region. Genetics 144:1053-1062[Abstract].

COMERON, J. M. and M. KREITMAN, 1998  The correlation between synonymous and nonsynonymous substitutions in Drosophila: mutation, selection or relaxed constraints? Genetics 150:767-775[Abstract/Free Full Text].

COMERON, J. M., M. KREITMAN, and M. AGUADÉ, 1999  Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila. Genetics 151:239-249[Abstract/Free Full Text].

DURET, L. and D. MOUCHIROUD, 1999  Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis.. Proc. Natl. Acad. Sci. USA 96:4482-4487[Abstract/Free Full Text].

FLYBASE,, 1999  The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res. 27:85-88. Available from http://flybase.bio.indiana.edu/([Abstract/Free Full Text].

GOLDMAN, N. and Z. YANG, 1994  A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736[Abstract].

GROSJEAN, H. and W. FIERS, 1982  Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene 18:199-209[Medline].

HILL, W. G. and A. ROBERTSON, 1966  The effect of linkage on limits to artificial selection. Genet. Res. 8:269-294[Medline].

INA, Y., 1995  New methods for estimating the numbers of synonymous and nonsynonymous substitutions. J. Mol. Evol. 40:190-226[Medline].

KIRBY, D. A., S. V. MUSE, and W. STEPHAN, 1995  Maintenance of pre-mRNA secondary structure by epistatic selection. Proc. Natl. Acad. Sci. USA 92:9047-9051[Abstract/Free Full Text].

KLIMAN, R. M. and A. EYRE-WALKER, 1998  Patterns of base composition within the genes of Drosophila melanogaster.. J. Mol. Evol. 46:534-541[Medline].

KLIMAN, R. M. and J. HEY, 1994  The effects of mutation and natural selection on codon bias in the genes of Drosophila. Genetics 137:1049-1056[Abstract].

MCVEAN, G. A. T. and J. VIEIRA, 1999  The evolution of codon preferences in Drosophila: a maximum-likelihood approach to parameter estimation and hypothesis testing. J. Mol. Evol. 49:63-75[Medline].

MORIYAMA, E. N. and T. GOJOBORI, 1992  Rates of synonymous substitution and base composition of nuclear genes in Drosophila. Genetics 130:855-864[Abstract].

MORIYAMA, E. N. and D. L. HARTL, 1993  Codon usage bias and base composition of nuclear genes in Drosophila. Genetics 134:847-858[Abstract].

MORIYAMA, E. N. and J. R. POWELL, 1997a  Synonymous substitution rates in Drosophila: mitochondrial versus nuclear genes. J. Mol. Evol. 45:378-391[Medline].

MORIYAMA, E. N. and J. R. POWELL, 1997b  Codon usage bias and tRNA abundance in Drosophila. J. Mol. Evol. 45:514-523[Medline].

MOUCHIROUD, D., C. GAUTIER, and G. BERNARDI, 1995  Frequencies of synonymous substitutions in mammals are gene-specific and correlated with frequencies of nonsynonymous substitutions. J. Mol. Evol. 40:107-113[Medline].

MUSE, S. V., 1996  Estimating synonymous and nonsynonymous substitution rates. Mol. Biol. Evol. 13:105-114[Abstract].

NEI, M. and T. GOJOBORI, 1986  Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426[Abstract].

POWELL, J. R. and E. N. MORIYAMA, 1997  Evolution of codon usage bias in Drosophila. Proc. Natl. Acad. Sci. USA 94:7784-7790[Abstract/Free Full Text].

RODRÍGUEZ-TRELLES, F., R. TARRÍO, and F. J. AYALA, 1999  Switch in codon bias and increased rates of amino acid substitution in the Drosophila saltans species group. Genetics 153:339-350[Abstract/Free Full Text].

SHARP, P. M. and W.-H. LI, 1987  The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol. Biol. Evol. 4:222-230[Abstract].

SHARP, P. M. and W.-H. LI, 1989  On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398-402[Medline].

SHARP, P. M., T. M. F. TUOHY, and K. R. MOSURSKI, 1986  Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14:5125-5139[Abstract/Free Full Text].

SHIELDS, D. C., P. M. SHARP, D. G. HIGGINS, and F. WRIGHT, 1988  "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704-716[Abstract].

WRIGHT, F., 1990  The `effective number of codons' used in a gene. Gene 87:23-29[Medline].

YANG, Z., 1999 Phylogenetic Analysis by Maximum Likelihood (PAML), Version 2. University College, London.

YANG, Z. and R. NIELSEN, 1998  Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J. Mol. Evol. 46:409-418[Medline].

YANG, Z. and R. NIELSEN, 2000  Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17:32-43[Abstract/Free Full Text].




This article has been cited by other articles:


Home page
GeneticsHome page
A. Llopart and J. M. Comeron
Recurrent Events of Positive Selection in Independent Drosophila Lineages at the Spermatogenesis Gene roughex
Genetics, June 1, 2008; 179(2): 1009 - 1020.
[Abstract] [Full Text] [PDF]


Home page
Am J Trop Med HygHome page
L. Urdaneta-Marquez, C. Bosio, F. Herrera, Y. Rubio-Palis, M. Salasek, and W. C. Black IV
Genetic Relationships among Aedes aegypti Collections in Venezuela as Determined by Mitochondrial DNA Variation and Nuclear Single Nucleotide Polymorphisms
Am J Trop Med Hyg, March 1, 2008; 78(3): 479 - 491.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
Z. Yang and R. Nielsen
Mutation-Selection Models of Codon Substitution and Their Use to Estimate Selective Strengths on Codon Usage
Mol. Biol. Evol., March 1, 2008; 25(3): 568 - 579.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
A. Heger and C. P. Ponting
Variable Strength of Translational Selection Among 12 Drosophila Species
Genetics, November 1, 2007; 177(3): 1337 - 1348.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
P. K. Ingvarsson
Gene Expression and Protein Length Influence Codon Usage and Rates of Sequence Evolution in Populus tremula
Mol. Biol. Evol., March 1, 2007; 24(3): 836 - 844.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
J. Charlesworth and A. Eyre-Walker
The Rate of Adaptive Evolution in Enteric Bacteria
Mol. Biol. Evol., July 1, 2006; 23(7): 1348 - 1356.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
K. Thornton, D. Bachtrog, and P. Andolfatto
X chromosomes and autosomes evolve at similar rates in Drosophila: No evidence for faster-X protein evolution
Genome Res., April 1, 2006; 16(4): 498 - 504.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
C. E. Popescu, T. Borza, J. P. Bielawski, and R. W. Lee
Evolutionary Rates and Expression Level in Chlamydomonas
Genetics, March 1, 2006; 172(3): 1567 - 1576.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
H. Akashi, W.-Y. Ko, S. Piao, A. John, P. Goel, C.-F. Lin, and A. P. Vitins
Molecular Evolution in the Drosophila melanogaster Species Subgroup: Frequent Parameter Fluctuations on the Timescale of Molecular Divergence
Genetics, March 1, 2006; 172(3): 1711 - 1726.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
J. C. Fay and J. A. Benavides
Hypervariable Noncoding Sequences in Saccharomyces cerevisiae
Genetics, August 1, 2005; 170(4): 1575 - 1587.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
L. M. Matzkin, T. J. S. Merritt, C.-T. Zhu, and W. F. Eanes
The Structure and Population Genetics of the Breakpoints Associated With the Cosmopolitan Chromosomal Inversion In(3R)Payne in Drosophila melanogaster
Genetics, July 1, 2005; 170(3): 1143 - 1152.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
A. E. Hirsh, H. B. Fraser, and D. P. Wall
Adjusting for Selection on Synonymous Sites in Estimates of Evolutionary Distance
Mol. Biol. Evol., January 1, 2005; 22(1): 174 - 177.
[Abstract] [Full Text] [PDF]