The evolution of codon bias, the unequal usage of synonymous codons, is thought to be due to natural selection for the use of preferred codons that match the most abundant species of isoaccepting tRNA, resulting in increased translational efficiency and accuracy. We examined this hypothesis by introducing 1, 6, and 10 unpreferred codons into the Drosophila alcohol dehydrogenase gene (Adh). We observed a significant decrease in ADH protein production with number of unpreferred codons, confirming the importance of natural selection as a mechanism leading to codon bias. We then used this empirical relationship to estimate the selection coefficient (s) against unpreferred synonymous mutations and found the value (s ≥ 10⁻⁵) to be approximately one order of magnitude greater than previous estimates from population genetics theory. The observed differences in protein production appear to be too large to be consistent with current estimates of the strength of selection on synonymous sites in D. melanogaster.
DESPITE the redundancy of the genetic code, synonymous codons are not used with equal frequency, a phenomenon known as codon bias (Ikemura 1981). Codon bias is most extreme in highly expressed genes (Gouy and Gautier 1982; Sharp and Li 1986; Duret and Mouchiroud 1999), and natural selection favors the use of preferred codons, which match the most abundant species of isoaccepting tRNA (Ikemura 1981, 1982; Grosjean and Fiers 1982; Moriyama and Powell 1997). This results in increased translational efficiency and accuracy and decreased proofreading costs (Bulmer 1991). However, in multicellular organisms the multitude of tissue types and developmental stages makes it difficult to generalize which species of tRNA is most abundant. The relationship between codon bias and level of gene expression has been experimentally confirmed in Escherichia coli (Sörensen et al. 1989; Andersson and Kurland 1990), and the in vitro expression efficiency of heterologous genes in cultured eukaryotic cells has been shown to be significantly increased by the use of preferred codons of the host cell (Zolotukhin et al. 1996; Kim et al. 1997). However, the importance of codon bias in enhancing mRNA translation rates and fidelity has yet to be empirically demonstrated in vivo for multicellular organisms.
If codon bias is the result of natural selection, a change from a preferred to an unpreferred codon should lead to reduced protein expression levels, caused by a decrease in the efficiency or fidelity of translation or some combination of both (Bulmer 1991). In either case the final phenotype would be affected, but it would be difficult to discriminate between the two mechanisms solely on the basis of measurements of protein activity. A third possibility is that codon bias decreases proofreading costs by reducing the time and energy required to reject noncognate tRNAs (Bulmer 1991). Introduction of unpreferred codons would increase proofreading costs and would also be predicted to result in a net decrease in the protein levels.
Under natural selection, the fate of any given mutation depends on the product of the effective population size and selection coefficient, Nes (Kimura 1983). Codon bias results from the dual action of directional selection for preferred codons (Nes > 0) and purifying selection against unpreferred codons (Nes < 0). Since selection for codon bias is thought to be relatively weak (e.g., in comparison with adaptive substitutions at the amino acid level), the selection-mutation-drift (SMD) model of codon bias predicts that unpreferred codons will persist as a consequence of mutation pressure and genetic drift (Li 1987; Bulmer 1991). Current population genetics theory predicts that Nes for any codon change in Drosophila melanogaster is not significantly different from 0 (Akashi 1995, 1996; McVean and Vieira 2001). In contrast, for D. simulans the estimate is 1.3 < |Nes| < 3.6 (Akashi 1995). Although D. melanogaster shows less nucleotide diversity than D. simulans, the threefold difference in Ne (Powell 1997) is not large enough to account for the difference in Nes between the two species. This suggests a decrease of |s| in the recent past (Akashi 1995, 1996; McVean and Vieira 2001).
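The dependence of a mutation's fate on the product Nes can be illustrated with a short numerical sketch. It uses Kimura's standard diffusion approximation for the fixation probability of a new mutation under genic selection and assumes, for simplicity, that the census and effective population sizes are equal (N = Ne). The function names and the example values of N and s are illustrative assumptions, not quantities estimated in this study.

```python
import math

def fixation_probability(n, s):
    """Diffusion approximation for the fixation probability of a new mutation
    (initial frequency 1/(2N)) under genic selection, assuming Ne = N."""
    if s == 0:
        return 1.0 / (2.0 * n)  # neutral case
    # expm1 keeps precision when 2s is very small
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)

def relative_fixation(n, s):
    """Fixation probability relative to that of a neutral mutation."""
    return fixation_probability(n, s) / (1.0 / (2.0 * n))

# Illustrative (hypothetical) parameter pairs: the first two share N*s = -1.
for n, s in [(1e4, -1e-4), (1e6, -1e-6), (1e6, -4e-5)]:
    print(f"N = {n:.0e}, s = {s:.0e}, N*s = {n * s:g}, "
          f"relative fixation = {relative_fixation(n, s):.3g}")
```

Under these assumptions the first two parameter pairs, which differ 100-fold in both N and s but share the same product, give nearly identical relative fixation probabilities, whereas a mutation with |Nes| in the tens is effectively never fixed.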
Thus, the analysis of patterns of molecular evolution using population genetics theory suggests that the fitness effect of an individual synonymous mutation from a preferred codon to an unpreferred codon is likely to be very small, perhaps immeasurable in the lab. However, although Nes for any codon change in D. melanogaster is not statistically different from 0 (Akashi 1995, 1996; McVean and Vieira 2001), each codon family is likely to have a unique selection coefficient. Furthermore, each class of synonymous mutation within a codon family is probably unique. Since we wanted to determine if it was possible to measure the effects of manipulating codon bias, the leucine codons of the alcohol dehydrogenase gene (Adh) appeared to be the most promising targets for experimentation for two reasons. First, different codon families exhibit different degrees of codon bias in D. melanogaster (McVean and Vieira 2001). By several measures, the leucine codon family is one of the most highly biased in the D. melanogaster genome (Li 1987; Moriyama and Powell 1997; McVean and Vieira 2001). Second, Adh is a highly expressed gene with a high level of codon bias. Adh is among the top 2% most highly biased genes in the D. melanogaster genome (see Duret and Mouchiroud 1999, online material). Accordingly, we expected unpreferred changes in the leucine codon family would be the most likely to result in measurable differences in ADH expression following the experimental introduction of unpreferred codons.
MATERIALS AND METHODS
Experimental procedures: Adh constructs were derived from an 8.6-kb SacI-ClaI fragment of the Wa-F allele (Kreitman 1983). Mutagenesis was performed on a pUC18 plasmid containing the 8.6-kb fragment using the Quick-change mutagenesis kit (Stratagene, La Jolla, CA). A single nucleotide substitution was made at codon 16 (CTG to CTA) to create the 1 Leu mutant construct. For the 6 Leu mutant construct, nucleotide substitutions were made at codons 5 (TTG to CTA), 16 (CTG to CTA), 21 (CTG to CTA), 27 (CTG to CTA), 28 (CTC to CTA), and 32 (CTG to CTA). With the exception of codon 5, the 10 Leu mutant construct contained the same substitutions as the 6 Leu construct, with an additional five substitutions at codons 35 (CTG to CTA), 38 (CTC to CTA), 50 (CTG to CTA), 76 (CTG to CTA), and 77 (CTG to CTA). Mutant clones were sequenced to ensure that the desired mutation(s) were present before proceeding. The 8.6-kb SacI-ClaI fragment was subcloned into a ClaI site added to the YES transformation vector (Parsch et al. 1997). The YES vector is a P-element vector containing the D. melanogaster yellow gene as a selectable marker (Patton et al. 1992).
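The three constructs can be summarized as a small data structure, which makes it easy to check the substitution counts, confirm that every change is synonymous, and locate adjacent substituted codons. The following is a minimal bookkeeping sketch based only on the codon positions listed above; the variable and function names are hypothetical.

```python
# The six leucine codons of the standard genetic code.
LEU_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}

# Codon position -> original codon; every substitution listed is to CTA.
ONE_LEU = {16: "CTG"}
SIX_LEU = {5: "TTG", 16: "CTG", 21: "CTG", 27: "CTG", 28: "CTC", 32: "CTG"}
# 10 Leu: the 6 Leu substitutions except codon 5, plus five more.
TEN_LEU = {pos: codon for pos, codon in SIX_LEU.items() if pos != 5}
TEN_LEU.update({35: "CTG", 38: "CTC", 50: "CTG", 76: "CTG", 77: "CTG"})

def is_synonymous(construct, target="CTA"):
    """True if the target and every replaced codon all encode leucine."""
    return target in LEU_CODONS and all(c in LEU_CODONS for c in construct.values())

def adjacent_pairs(construct):
    """Pairs of substituted codons at consecutive positions."""
    positions = sorted(construct)
    return [(a, b) for a, b in zip(positions, positions[1:]) if b == a + 1]

for name, construct in [("1 Leu", ONE_LEU), ("6 Leu", SIX_LEU), ("10 Leu", TEN_LEU)]:
    print(name, len(construct), is_synonymous(construct), adjacent_pairs(construct))
```

This tally gives 1, 6, and 10 substitutions, with one pair of adjacent substituted codons (27 and 28) in the 6 Leu construct and two pairs (27 and 28, 76 and 77) in the 10 Leu construct.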
Germline transformation was performed by microinjection of y w; Adhfn6; Δ2-3, Sb/TM6 embryos. A splicing defect in the Adhfn6 allele results in no detectable ADH protein (Benyajati et al. 1982). The source of transposase used was from the Δ2-3 P insertion on the third chromosome (Robertson et al. 1988). Injected survivors were crossed to a y w; Adhfn6 stock and transformants were identified by body color. Mobilization crosses were performed to generate additional lines with inserts at unique chromosomal locations. Transformant lines containing insertions on the X chromosome were crossed to the y w; Adhfn6; Δ2-3, Sb/TM6 stock. y+; Sb offspring (containing both the YES insertion and the source of transposase) were then crossed to the y w; Adhfn6 stock. Flies containing mobilized insertions were identified as y+ offspring where the y+ marker was not segregating with the same chromosome as the parental insert.
Lines containing single insertions were identified through Southern blotting using an Adh-specific probe spanning ∼1.5 kb of the Adh 5′ flanking sequence (Parsch et al. 1997). Insert DNA from two to three independent lines within each genotype was PCR amplified and sequenced to verify the correct haplotype with respect to the introduced mutations.
Transformed males were crossed to the y w; Adhfn6 stock to produce y+ offspring heterozygous for the Adh insertion. Two crosses were performed for each line. For each cross, five males and five females were mated, and five male progeny were collected at age 6-8 days and used for preparation of crude protein extracts, which were used in the ADH assays. A standard protocol was used for performing ADH assays (Maroni 1978) using isopropanol as the substrate. Total protein content of the crude extracts was determined by the Lowry method (Lowry et al. 1951). ADH activity was measured as micromoles of NAD+ reduced per minute per milligram of total protein. The entire procedure (ADH activity and protein content) was repeated in two different time blocks, giving a total of four measurements per line (= two crosses per line × two measurements per cross). A nested ANOVA was used to test the null hypothesis of no differences in ADH activity between genotypes. Post hoc tests were performed to test for significant differences in pairwise comparisons.
Data analyses: We used two population genetic methods to obtain rough estimates of the fitness effects of our mutations. First, we applied the saturation theory of molecular evolution (Hartl et al. 1985) to our empirical data on the relationship between the number of unpreferred mutations and corresponding reduction in ADH activity. The saturation theory of molecular evolution explores the relationship between enzymatic activity and fitness. Using saturation theory, Hartl et al. (1985) derived the relation between ADH activity and fitness from the frequency of null Adh alleles in natural populations (Langley et al. 1981). From this they estimated the standardized amount or activity (a0 = 538.50) of the Adh gene product in natural populations. We used this estimate of a0 to obtain the value of s from our data. First, we performed a linear regression on percentage of activity (relative to the control mean) vs. number of unpreferred mutations. The relative activity was calculated as a percentage of the average activity (micromole NAD+ reduced per minute per milligram protein × 100) among control (Wa-F transformant) lines. The slope of the linear regression (y = -2.13x + 95.87, R² = 0.23) indicated a significant reduction in ADH activity with number of unpreferred mutations (P < 0.001). Higher-order regressions did not improve the fit to the data. We observed a 2.13% decrease in activity per unpreferred mutation, a value that we then used to calculate a1, which is simply equal to a0 - (2.13% × a0) (Hartl et al. 1985). Next, we obtained f(a1), the fitness of an individual with a single unpreferred mutation, using the relation f(a) = a/(1 + a). Finally, the selection coefficient is given by s = 1 - f(a0)/f(a1), which yielded an estimate of |s| = 4.0 × 10⁻⁵. The value of the standardized ADH activity, a0 = 538.5 (Hartl et al. 1985), is dependent on the frequencies of the Adh-Fast and Adh-Slow alleles in the populations surveyed (Langley et al. 1981). 
If their estimate of a0 relates to some average of the two variants in the population, then the value of a0 based on Adh-Fast alone would be greater than 538.5 (by a factor of two at the most), resulting in a smaller selection coefficient. Even so, doubling the value of a0 would only halve the value of |s|, which would still be more than an order of magnitude greater than 10⁻⁶.
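The saturation-theory calculation can be reproduced in a few lines. The sketch below follows the steps described in the text: a1 = a0 - (2.13% × a0), f(a) = a/(1 + a), and s = 1 - f(a0)/f(a1). The function and variable names are illustrative, and the doubled value of a0 is the upper-bound scenario discussed above, not an independent estimate.

```python
def fitness(a):
    """Saturation-theory fitness as a function of standardized activity."""
    return a / (1.0 + a)

def selection_coefficient(a0, drop_per_mutation):
    """s = 1 - f(a0)/f(a1), with a1 the activity after one unpreferred mutation."""
    a1 = a0 - drop_per_mutation * a0
    return 1.0 - fitness(a0) / fitness(a1)

A0 = 538.5     # standardized ADH activity (Hartl et al. 1985)
DROP = 0.0213  # 2.13% activity lost per unpreferred mutation (regression slope)

s = selection_coefficient(A0, DROP)
s_doubled_a0 = selection_coefficient(2.0 * A0, DROP)

print(f"|s| with a0 = {A0}: {abs(s):.2e}")
print(f"|s| with a0 doubled: {abs(s_doubled_a0):.2e}")
```

With this definition s is negative, because the mutant has lower fitness, which is why the text reports its absolute value. Doubling a0 roughly halves |s|, which stays above 10⁻⁵.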
Second, following Bulmer (1991), we used the SMD model to obtain a crude estimate of the fitness effects of our introduced CTA mutations. The diffusion approximation of the SMD model with genic selection can be extended from a two- or fourfold degenerate codon family to a family with six codons under the assumption that the mutation rate between all codons is equal. According to equation 4 of Li (1987), the expected frequency of codon i within a family is then approximately proportional to exp(4Nesi), where Nesi < 0 is the selection intensity against codon i. In Adh of D. melanogaster (and in all species of the melanogaster subgroup), the observed frequency of the CTA codon is 0 (Nakamura et al. 2000). This may suggest that Nesi < -1 (or even Nesi ≪ -1). For unequal mutation rates (more appropriate for the Leu codon family), a solution of the diffusion equation of the SMD model is not available (Ewens 1979). However, assuming that the unpreferred codon CTA is much more strongly selected against than the suboptimal codons TTG and CTC of the Leu family, a timescale argument suggests a result similar to that for equal mutation rates.
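To make the proportionality to exp(4Nesi) concrete, the sketch below computes the expected frequency of CTA in a six-codon family under equal mutation rates. As a deliberate simplification that is not made in the text, the other five leucine codons are treated as selectively equivalent (Nes = 0), and the Nes values tried for CTA are purely illustrative.

```python
import math

def expected_frequencies(ne_s_by_codon):
    """Expected codon frequencies, each proportional to exp(4 * Ne * s_i),
    assuming equal mutation rates among all codons in the family."""
    weights = {codon: math.exp(4.0 * ne_s) for codon, ne_s in ne_s_by_codon.items()}
    total = sum(weights.values())
    return {codon: w / total for codon, w in weights.items()}

def cta_frequency(ne_s_cta):
    """Expected CTA frequency when the other five Leu codons have Ne*s = 0
    (a simplifying assumption made only for this illustration)."""
    family = {codon: 0.0 for codon in ("TTA", "TTG", "CTT", "CTC", "CTG")}
    family["CTA"] = ne_s_cta
    return expected_frequencies(family)["CTA"]

# Hypothetical selection intensities against CTA.
for ne_s in (0.0, -0.5, -1.0, -2.0):
    print(f"Ne*s = {ne_s:+.1f}: expected CTA frequency = {cta_frequency(ne_s):.5f}")
```

Under these assumptions the expected CTA frequency falls from one-sixth at Nes = 0 to below 0.4% at Nes = -1, so a complete absence of CTA in a single gene is at least compatible with Nes < -1.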
Folding free energies of the 1 Leu, 6 Leu, and 10 Leu mature mRNA sequences were calculated on the mFOLD server (Mathews et al. 1999). Phylogenetically conserved pairing regions were identified using the PIRANAH software program (Parsch et al. 2000).
RESULTS AND DISCUSSION
Three classes of mutant genotypes were constructed using P-element-mediated germline transformation. We introduced 1 (1 Leu), 6 (6 Leu), or 10 (10 Leu) mutations from preferred leucine codons (CTG or CTC; Akashi 1995) to unpreferred leucine codons (CTA) in the Adh transgene and compared the level of ADH activity in these lines to transformant lines containing the unaltered native transgene (control, Wa-F allele). Since the amino acid sequences of all four genotypes were identical and the only differences among the genotypes were in synonymous mutations in coding regions, any differences in ADH activity could be attributed to differences in the expression of the transgene (in an otherwise Adh-null background of Adhfn6, splicing defect).
The introduction of unpreferred codons resulted in a measurable decrease in ADH activity. The average ADH activities of the Wa-F controls and 1 Leu, 6 Leu, and 10 Leu lines were 98.8, 88.9, 80.3, and 75.0, respectively (Table 1). Differences in ADH activity among the four genotypes were highly significant (P < 0.01, Table 2). The mean ADH activity of the Wa-F control lines was significantly greater than that of both the 6 Leu (P < 0.05) and 10 Leu (P < 0.01) lines (Table 3).
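As a rough consistency check on the regression described in Materials and Methods, a least-squares line can be fitted to the four genotype means reported above, expressed as percentages of the control mean. This is only a sketch: the published regression was fitted to the full data set rather than to genotype means, so the coefficients need not match exactly, and the helper function is hypothetical.

```python
def least_squares(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Number of unpreferred codons and mean ADH activity per genotype
# (Wa-F control, 1 Leu, 6 Leu, 10 Leu), as reported in the text.
N_UNPREFERRED = [0, 1, 6, 10]
MEAN_ACTIVITY = [98.8, 88.9, 80.3, 75.0]

# Express each mean as a percentage of the control mean.
relative = [100.0 * a / MEAN_ACTIVITY[0] for a in MEAN_ACTIVITY]

slope, intercept = least_squares(N_UNPREFERRED, relative)
print(f"slope = {slope:.2f}% per unpreferred codon, intercept = {intercept:.2f}%")
```

Fitted to the means alone, the line has a slope close to -2.1% per unpreferred codon and an intercept near 96%, in line with the reported regression (y = -2.13x + 95.87).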
The prediction of population genetics theory that Nes for any codon change in D. melanogaster is not significantly different from 0 (Akashi 1996; McVean and Vieira 2001) is difficult to reconcile with our data, which demonstrate that the effects of unpreferred synonymous substitutions are experimentally measurable. Indeed, a population genetics model that relates enzyme flux to fitness in a simple linear fashion (Hartl et al. 1985) indicates that a value of |s| on the order of 10⁻⁵ would be consistent with our observations (see Materials and Methods). Assuming the standard estimate of Ne = 10⁶ for D. melanogaster inferred from levels of neutral variation (Powell 1997), this value of s would then result in a very large estimate of |Nes|. Finally, estimation based on an extension of the SMD model from twofold degenerate to sixfold degenerate codons may also suggest a value of |Nes| > 1 for our changes to the unpreferred Leu codon CTA. This is based on the observation that the frequency of CTA codons in all D. melanogaster Adh alleles sequenced to date is 0 (Nakamura et al. 2000). In fact, the frequency of CTA codons in Adh in all species of the melanogaster subgroup is 0 (Nakamura et al. 2000).
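The size of the resulting |Nes| follows from simple arithmetic, sketched here using the values quoted in the text: Ne = 10⁶, |s| = 4.0 × 10⁻⁵ from the saturation-theory estimate, and the D. simulans range of 1.3 to 3.6 (Akashi 1995) for comparison. The halved value corresponds to the scenario in which a0 is doubled, and the variable names are illustrative.

```python
NE = 1e6                      # effective population size assumed for D. melanogaster
S_ABS = 4.0e-5                # |s| from the saturation-theory estimate
SIMULANS_RANGE = (1.3, 3.6)   # |Ne*s| range reported for D. simulans

ne_s = NE * S_ABS
ne_s_halved = NE * (S_ABS / 2.0)  # if a0 were doubled, |s| would be roughly halved

print(f"|Ne*s| = {ne_s:.0f} (or about {ne_s_halved:.0f} with |s| halved)")
print(f"Fold above the upper D. simulans bound: {ne_s / SIMULANS_RANGE[1]:.1f}")
```

Even with |s| halved, the implied |Nes| remains several-fold above the upper bound of the D. simulans range.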
The discrepancy between these estimates may arise from several sources. On the one hand, the underlying population genetic models rest on various assumptions that may not be entirely appropriate for D. melanogaster. The estimate of s based on metabolic theory assumes that all synonymous codons are under the same selective pressure and equally likely to be polymorphic. This assumption may be violated, as recent theoretical work has demonstrated considerable variation in s among the different synonymous groups (McVean and Vieira 2001). The estimates based on the other models may suffer from the observation that the equilibrium assumption of codon bias appears to be violated in D. melanogaster (Akashi 1995, 1996).
On the other hand, Adh-specific effects may also play a role. The 6 Leu and 10 Leu line constructs contained one and two sets, respectively, of consecutive unpreferred codons. In highly expressed genes of bacteria, the tandem arrangement of rare codons has been shown to sequester cognate tRNAs in the P site, causing the translation of these codons to be rate limiting (Varenne et al. 1989; Ivanov et al. 1997). Furthermore, unpreferred codons were all introduced in the 5′ region of the gene, where their effects on translation may be more pronounced if translation initiation is rate limiting. However, in prokaryotes codon bias is less extreme at the 5′ end of genes, possibly facilitating ribosome binding (Eyre-Walker and Bulmer 1993).
It is also possible that the reduction in ADH protein production may not be due to codon bias alone. Perhaps the introduced substitutions altered the secondary structure of the Adh mRNA transcript, and the mutant transcripts were more difficult to translate due to interference from secondary structures. To address this possibility, we compared the folding free energies of the Wa-F, 1 Leu, 6 Leu, and 10 Leu transcripts using mFOLD (Mathews et al. 1999). We observed no appreciable differences in free energies, indicating that global secondary structure was not significantly altered by the introduced mutations. We also tested for the alteration of individual structural elements (i.e., hairpins) using a maximum-likelihood-based phylogenetic comparative approach to predicting mRNA secondary structures (Parsch et al. 2000). None of the sites targeted for mutation were predicted to be involved in strongly conserved structures. Previous analyses also indicated that the coding sequences of the Adh gene are unlikely to contain strongly conserved individual structural elements (Carlini et al. 2001). Therefore, the predicted changes in secondary structure are minor and unlikely to be the major factor accounting for the relatively large changes in protein activity we observed. We conclude that the observed differences in protein activity are likely due to effects at the level of translation. The introduction of unpreferred codons decreased the rate and accuracy of translation and/or increased proofreading costs (Bulmer 1991).
In summary, our results are important for at least two reasons. First, if the population genetic estimates of Nes are indeed as small as currently thought, our observations show that the consequences of very small selective differences can be observed. This will encourage more experimental work on fitness-related traits in eukaryotes, which thus far has not been undertaken because the effects of small fitness differences were thought to be immeasurable. Even granting that the actual fitness differences are immeasurable in the lab, our findings indicate that the effects on the phenotype may be substantial (e.g., each unpreferred codon resulted in an ∼2.13% drop in activity) and may be worthy of further investigation. However, we point out that we deliberately selected the most biased codon family and introduced a strongly unpreferred codon (CTA) in place of preferred codons, so that average selection coefficients are likely to be much smaller. Second, should the selection intensity on synonymous positions be larger than currently believed, our observations are expected to stimulate more work on codon bias evolution and the theory of weak selection in general. Several avenues of future research include replacing unpreferred codons with preferred codons, examining other codon families, or measuring the level of expression of other highly expressed genes in different genetic backgrounds (e.g., wild-type Adh vs. 10 Leu Adh) to examine the effects of ribosome competition. These studies would complement work previously conducted in prokaryotes (Sörensen et al. 1989; Andersson and Kurland 1990) and would address the generality of the results of this study to eukaryotic systems.
We are grateful to J. Baines, Y. Chen, and J. Parsch for assistance in the lab and to H. Akashi, A. Eyre-Walker, D. Hartl, G. McVean, J. Parsch, S. Schaeffer, and two anonymous reviewers for helpful comments on the manuscript. This study was supported by National Institutes of Health grant GM-58404 to W.S. and by funds from the American University to D.B.C. and from the University of Munich to W.S.
Communicating editor: S. Schaeffer
- Received August 24, 2002.
- Accepted October 28, 2002.
- Copyright © 2003 by the Genetics Society of America