The Relationship Between Third-Codon Position Nucleotide Content, Codon Bias, mRNA Secondary Structure and Gene Expression in the Drosophilid Alcohol Dehydrogenase Genes Adh and Adhr
David B. Carlini^a, Ying Chen^a, and Wolfgang Stephan^b
a Department of Biology, University of Rochester, Rochester, New York 14627
b Department of Evolutionary Biology, University of Munich, 80333 Munich, Germany
Corresponding author: Wolfgang Stephan, Department of Evolutionary Biology, University of Munich, 80333 Munich, Germany. E-mail: stephan{at}zi.biologie.uni-muenchen.de
| ABSTRACT |
|---|
To gain insights into the relationship between codon bias, mRNA secondary structure, third-codon position nucleotide distribution, and gene expression, we predicted secondary structures in two related drosophilid genes, Adh and Adhr, which differ in degree of codon bias and level of gene expression. Individual structural elements (helices) were inferred using the comparative method. For each gene, four types of randomization simulations were performed to maintain/remove codon bias and/or to maintain or alter third-codon position nucleotide composition (N3). In the weakly expressed, weakly biased gene Adhr, the potential for secondary structure formation was found to be much stronger than in the highly expressed, highly biased gene Adh. This is consistent with the observation of approximately equal G and C percentages in Adhr (~31% across species), whereas in Adh the N3 distribution is shifted toward C (42% across species). Perturbing the N3 distribution to approximately equal amounts of A, G, C, and T increases the potential for secondary structure formation in Adh, but decreases it in Adhr. On the other hand, simulations that reduce codon bias without changing N3 content indicate that codon bias per se has only a weak effect on the formation of secondary structures. These results suggest that, for these two drosophilid genes, secondary structure is a relatively independent, negative regulator of gene expression. Whereas the degree of codon bias is positively correlated with level of gene expression, strong individual secondary structural elements may be selected for to retard mRNA translation and to decrease gene expression.
THE highly conserved secondary structures of rRNAs (![]()
Several hypotheses have been advanced to account for the positive correlation between the degree of codon bias and level of gene expression. It is thought that this relationship reflects selection for the use of codons specifying abundant tRNA molecules (![]()
If mRNA secondary structure and codon bias are interrelated, then it should be possible to ascertain the relationship between the two factors. In other words, are mRNAs from highly biased genes more (or less) stable than mRNAs from unbiased genes? While the relationship may appear to be relatively straightforward to test, ascertaining the degree of mRNA stability is complicated. Unlike the shorter tRNAs, which can be crystallized and subject to X-ray diffraction (![]()
Algorithms based on free energy minimization (e.g., ![]()
In this study we used the method of ![]()
- The two genes exhibit a substantial difference in the extent of codon bias and level of gene expression.

View larger version (14K):
In this window
In a new window
Download PPT slide
Figure 1. Hypothesized interactions between mRNA secondary structure, codon bias, N3 content, and gene expression. Solid arrows represent interactions examined in this study. - The method of
PARSCH et al. 2000 requires a set of aligned homologous sequences and a "known" phylogeny of the aligned sequences. The method is computer intensive and therefore candidate genes must not be too long (<1 kb).
- The genes are similar in length and have similar base compositions.
- The two genes are very tightly linked (<1 kb) and thus presumably undergo similar rates of recombination (in the species from which Adhr is sampled in this study). In Drosophila, this was the only gene pair to meet these criteria.
Our specific aims are: (1) to examine whether secondary structure has an effect on gene expression, (2) to test whether codon bias per se influences secondary structure formation, and (3) to determine whether N3 affects secondary structure. The interactions we investigate in this study are shown as solid-line arrows in Fig 1. For each gene, we performed four types of randomization simulations to maintain or reduce codon bias and/or to maintain or alter third-codon position nucleotide content. The potential of forming individual secondary structural elements (helices) is measured in terms of LRT scores. The LRT distributions predicted from the different sets of 100 randomization simulations were compared to each other and to the structures predicted from analyses of the native sequences to address the above three questions. For several reasons (in particular, to avoid problems associated with the alignment of the sequences), we focus here on LRT score distributions of individual pairing regions occurring in exons only.
| MATERIALS AND METHODS |
|---|
Sequences and alignments:
Adh sequences from seven drosophilid species were downloaded from GenBank (Table 1) and aligned by eye. Relative to the D. melanogaster Adh sequence (Wa-S allele), a single 6-bp gap was introduced at positions 7–12 in all other sequences, with the exception of Zaprionus, where a single 3-bp gap was introduced at positions 10–12. The Adh alignment comprised 771 positions (including gaps). Adhr sequences from six drosophilid species were downloaded from GenBank (Table 1) and aligned using the CLUSTAL algorithm as implemented in GeneJockey II (![]()
Secondary structure prediction:
Putative pairing regions were identified using the PIRANAH software program described in ![]()
Randomization simulations:
Each native DNA sequence alignment of the several taxa was randomly shuffled 100 times to generate a distribution of randomized sequence alignments to which the native sequence alignment can be compared. Since we used sequence alignments instead of individual sequences, and coding regions instead of noncoding regions, two criteria had to be met to draw meaningful comparisons with the native alignments. First, levels of sequence divergence among the individual sequences in the sequence alignment were to remain unchanged. Second, the encoded amino acid sequence could not be altered.
This was accomplished by randomizing the DNA sequence alignment column by column. Each column was composed of one codon (3 nucleotides) at the same position of every sequence in the alignment. A randomized codon table corresponding to the original codon table was generated, and the codons in each column of the sequence alignment were substituted with the respective codons in the new codon table. Since the same codon conversion table was applied to all codons within one column, sequences that had the same codon in the native alignment would still have the same codon after the randomization, even though now the codon they shared could be different from the one in the native alignment. In this way, the phylogenetic relationships of the native sequence alignment were maintained in the randomized alignments as much as possible. The codon conversion table was generated by shuffling the order of codons within each codon family, so that the encoded amino acid was maintained after the randomization. For "native bias" randomizations, the codon conversion table was generated once at the beginning and used throughout the alignment. For "reduced bias" randomizations, a codon conversion table was generated independently for each column of the alignment before the column was randomized. So a preferred codon may be changed to one codon in one column and to a different codon in another column, thus reducing the codon bias for the entire sequence length.
No shuffling was conducted on Met and Trp codons, as there is no codon degeneracy for these two amino acids. Stop codons also remained unaltered in the randomized sequence alignments. For twofold degenerate codons, such as Phe, a codon within the Phe codon family (UUU or UUC) was randomly drawn in the "equal N3" randomizations (see below) to correspond to the UUU codon in the new randomized table. In this way, the original UUU in the native sequence alignment had a 50% probability of remaining unaltered and a 50% probability of changing to UUC. In the "native N3" randomizations, codons were drawn from the weighted probabilities on the basis of the observed frequencies of UUU and UUC in the native alignment. Threefold and fourfold degenerate codon families were shuffled in the same manner as the twofold degenerate codon families. The three sixfold degenerate codon families were each split into two separate codon families of twofold degeneracy and fourfold degeneracy. For example, the Arg codon family was broken into a twofold degenerate codon subfamily of AGR and a fourfold degenerate subfamily of CGN. The two codon subfamilies were not interchangeable: an AGA in the original sequence alignment was never changed to any of the CGN codons, and vice versa, despite the fact that they all translated to Arg. The main reason for doing so was to maintain the level of conservation among sequences in the sequence alignment, since random shuffling among the sixfold degenerate codon family as a whole could incur two simultaneous nucleotide substitutions: one at the first codon position and one at the third position. This would result in an increase in the divergence among simulated sequences relative to the original sequences. Since the level of sequence divergence affects the LRT score of predicted helices, the LRT distributions of helices predicted in randomized alignments would not be strictly comparable to those obtained from analysis of the original alignments.
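The column-by-column procedure described above can be sketched as follows. This is a minimal illustration and not the authors' C programs: the function names, the RNA alphabet, and the uniform shuffle (which corresponds to the "equal N3" case only) are assumptions made for the example.

```python
import random

# The 21 synonymous codon families (RNA alphabet). Sixfold families are split
# into noninterchangeable twofold and fourfold subfamilies, as in the text.
# Met (AUG), Trp (UGG), and stop codons are deliberately absent: they are
# never shuffled.
FAMILIES = [fam.split() for fam in (
    "UUU UUC",              # Phe
    "UUA UUG",              # Leu, twofold subfamily
    "CUU CUC CUA CUG",      # Leu, fourfold subfamily
    "AUU AUC AUA",          # Ile
    "GUU GUC GUA GUG",      # Val
    "UCU UCC UCA UCG",      # Ser, fourfold subfamily
    "AGU AGC",              # Ser, twofold subfamily
    "CCU CCC CCA CCG",      # Pro
    "ACU ACC ACA ACG",      # Thr
    "GCU GCC GCA GCG",      # Ala
    "UAU UAC",              # Tyr
    "CAU CAC",              # His
    "CAA CAG",              # Gln
    "AAU AAC",              # Asn
    "AAA AAG",              # Lys
    "GAU GAC",              # Asp
    "GAA GAG",              # Glu
    "UGU UGC",              # Cys
    "CGU CGC CGA CGG",      # Arg, fourfold subfamily
    "AGA AGG",              # Arg, twofold subfamily
    "GGU GGC GGA GGG",      # Gly
)]
FAMILY_OF = {codon: idx for idx, fam in enumerate(FAMILIES) for codon in fam}


def random_table(rng):
    """Build a codon conversion table by shuffling codons within each family."""
    table = {}
    for fam in FAMILIES:
        shuffled = fam[:]
        rng.shuffle(shuffled)            # uniform shuffle: "equal N3" variant
        table.update(zip(fam, shuffled))
    return table


def randomize(alignment, rng, reduced_bias):
    """Randomize an in-frame codon alignment column by column.

    alignment: list of equal-length strings whose length is a multiple of 3.
    reduced_bias=False: one table for the whole alignment ("native bias").
    reduced_bias=True: a fresh table for every column ("reduced bias").
    """
    n_cols = len(alignment[0]) // 3
    table = random_table(rng)
    out = [[] for _ in alignment]
    for col in range(n_cols):
        if reduced_bias:
            table = random_table(rng)
        for k, seq in enumerate(alignment):
            codon = seq[3 * col:3 * col + 3]
            # Codons outside the table (Met, Trp, stops, gaps) pass through.
            out[k].append(table.get(codon, codon))
    return ["".join(codons) for codons in out]
```

Because one table is applied to every sequence in a column, sequences that share a codon at a site still share a codon afterward, and sequences that differ still differ, which is how the sketch preserves the divergence pattern of the alignment.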
Due to the method of splitting sixfold degenerate codon families into two noninterchangeable subfamilies, four classes of synonymous substitutions were not permitted in the randomizations (Leu: TTA ↔ CTA, TTG ↔ CTG; Arg: AGA ↔ CGA, AGG ↔ CGG). These four classes of substitutions were relatively rare in the original Adh and Adhr alignments: only 16 of 256 Adh codons and 16 of 287 Adhr codons contained these types of substitutions. In each of the 16 cases for both genes, the substitution was usually restricted to one or two of the sequences, so that the incidence of such substitutions was actually much less than simply 16/256 or 16/287. This relative rarity of "noninterchangeable substitutions" in the original alignments justifies our approach of breaking up sixfold degenerate codon families into two separate codon families.
For each gene, the following four types of randomizations were carried out on the native sequence alignment to generate 100 randomized sequence alignments each.
Native bias, equal N3: In this randomization method, the native codon bias was maintained in the randomized alignments, while the N3 was changed to approximately 25% G, 25% C, 25% A, and 25% U. A randomized codon table corresponding to the original codon table was generated once at the beginning of the randomization and was used throughout the sequence alignment until the randomization was complete. In this manner, the ranking of favored codons in each codon family may have been altered, but the relative proportions of each codon usage in each codon family remained the same; i.e., the codon bias of each randomized sequence alignment was identical to that of the native sequence alignment. In each codon family any codon, regardless of its third-position nucleotide, was equally likely to be chosen as the most favored codon. As a result, the overall N3, averaging the effect of random shuffling of 21 codon families, approached 25% per nucleotide in these randomizations (for details see http://troi.cc.rochester.edu/~ying/appendix1.html).
Reduced bias, equal N3: In contrast to the native bias, equal N3 case, the randomized codon table corresponding to the original codon table was generated for each column of codons in the native sequence alignment. Thus, by averaging over 256 (Adh) or 287 (Adhr) columns of codons, the bias of codon usage in one randomization was reduced. The codon usage bias could not be totally eliminated due to the requirement of maintaining the phylogenetic relationships among the DNA sequences in the alignment. The N3 content approaches 25% each due to the same reason given in the native bias, equal N3 randomization method (see http://troi.cc.rochester.edu/~ying/appendix1.html).
Native bias, native N3: The base composition of all sequences in the native sequence alignment was calculated for each codon family, and the frequencies of nucleotide content were used as weights in generating the randomized codon table corresponding to the original codon table. A codon in the original sequence alignment was more likely to be changed to a G or C ending codon than an A or U ending codon of the same codon family, if the GC3 content of that particular codon family was >50%. For example, in the original Adh alignment, there was an average of 75.4% of C and 24.6% of U at the third-codon position for the Phe codon family. In this case the more favored codon, UUC, was slightly more than three times as likely to remain unchanged than to change to UUU after the randomization. The N3 contents of each randomization were not identical to those in the native sequence alignment, but remained quite close (see http://troi.cc.rochester.edu/~ying/appendix1.html). The randomized codon table was generated once at the beginning and used throughout the sequence alignment of one randomization to maintain the native codon bias.
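The Phe example above can be checked with a short weighted draw. This is a sketch under assumptions: `draw_codon` is a hypothetical helper, and the text does not specify exactly how the weights enter the construction of the full conversion table, so only a single weighted choice within one family is shown.

```python
import random


def draw_codon(family_freqs, rng):
    """Draw one codon from a family, weighted by observed native frequencies."""
    codons = list(family_freqs)
    weights = [family_freqs[c] for c in codons]
    return rng.choices(codons, weights=weights, k=1)[0]


# Third-position frequencies for Phe in the native Adh alignment (from the text).
phe = {"UUC": 0.754, "UUU": 0.246}

# Odds of drawing UUC rather than UUU: 0.754 / 0.246, slightly more than 3.
odds = phe["UUC"] / phe["UUU"]

rng = random.Random(1)
draws = [draw_codon(phe, rng) for _ in range(10000)]
observed_uuc = draws.count("UUC") / len(draws)
```

The ratio 0.754/0.246 is about 3.07, which matches the statement that UUC is slightly more than three times as likely to be chosen as UUU.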
Dinucleotide content has been shown to have a significant influence on the potential for secondary structure formation in mRNAs (![]()
χ² = 3.116, NS; Adhr: χ² = 2.614, NS).
Reduced bias, native N3: This randomization method was similar to the native bias, native N3 method in maintaining the native N3 content. However, the randomized codon table corresponding to the original codon table was generated for each column of codons in the sequence alignment to reduce codon bias, as explained in the reduced bias, equal N3 method. The randomization programs' source codes (written in C) are available at http://troi.cc.rochester.edu/~ying/randomization.html.
Analysis of LRT score distributions:
PIRANAH generates a list of helices in the sequence alignment that satisfy the criteria specified by the user (see Secondary Structure Prediction). For each helix, the LRT score and position of paired nucleotides are provided. The effects of altering codon bias and/or third-codon base composition were assessed in several ways. First, for each gene in each of the four sets of randomizations, we compiled the average proportion of helices with LRT scores within bins of five LRT units. The proportion of helices in each bin could then be compared across randomizations. The average scores among the 100 replicate randomizations of the best helix (highest LRT) were calculated for each gene in each of the four randomizations. We also calculated the average 5% cutoff of the best helices for each of the randomizations. The total number of helices in each randomization was multiplied by 5% to obtain the rank of the helix (and LRT score) representing the 5% cutoff. We averaged these LRT scores to obtain the average 5% cutoff for each set of randomizations.
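The binning and the 5% cutoff described above can be sketched as follows. The function names are hypothetical, and the rounding of the rank (nearest integer, at least 1) is an assumption, since the text only says the number of helices was multiplied by 5%.

```python
import math


def bin_proportions(scores, width=5.0):
    """Proportion of helices whose LRT score falls in each bin of `width` units.

    Keys are the lower bin edges (0.0, 5.0, 10.0, ...).
    """
    counts = {}
    for s in scores:
        lower = math.floor(s / width) * width
        counts[lower] = counts.get(lower, 0) + 1
    n = len(scores)
    return {lower: counts[lower] / n for lower in sorted(counts)}


def cutoff_score(scores, fraction=0.05):
    """LRT score of the helix whose rank is `fraction` times the helix count."""
    ranked = sorted(scores, reverse=True)
    rank = max(1, round(len(ranked) * fraction))   # rounding rule is assumed
    return ranked[rank - 1]


def average_cutoff(replicates, fraction=0.05):
    """Mean cutoff score across replicate randomizations."""
    cutoffs = [cutoff_score(scores, fraction) for scores in replicates]
    return sum(cutoffs) / len(cutoffs)
```

For a replicate with 100 helices, the 5% cutoff is the score of the fifth-ranked helix; averaging these values over the 100 replicates gives the average 5% cutoff for a set of randomizations.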
| RESULTS |
|---|
Distribution of LRT scores of all predicted structures:
The distribution of LRT scores of individual structures predicted from analysis of the native Adh sequences and the four sets of 100 randomization simulations is illustrated in Fig 2. Scores of predicted helices are grouped into bins of five LRT units on the abscissa. For the native Adh sequences, the proportions of the 234 predicted helices with LRT scores in a given range are plotted. For the randomizations, the average proportions (among the 100 simulated data sets) of helices in a given range are plotted. Error bars represent ±1 standard deviation (SD) of the mean within each bin. For all randomizations and for the native sequences, the majority of helices fulfilling the criteria specified in PIRANAH (e.g., degree of conservation, minimum helix length) were in the intermediate range (5–15) of LRT scores. Both of the equal N3 randomizations contained a higher proportion of helices with high LRT scores (≥20) than did the native sequences or either of the native N3 randomizations. For high LRT scores (≥20), results from analysis of native sequences did not differ appreciably from results from native N3 randomizations, irrespective of level of codon bias.
The distribution of LRT scores of individual structures predicted from analysis of the native Adhr sequences and the four Adhr randomization simulations is illustrated in Fig 3. Most helices meeting the criteria specified in PIRANAH were in the intermediate range (5–15) of LRT scores for all randomizations and for the native sequences. In contrast to results from analysis of Adh, the native Adhr sequences contained a higher proportion of helices with high LRT scores (≥20) than all four randomization simulations. Also in contrast to Adh, the equal N3 randomizations contained a lower proportion of helices with high LRT scores than did native N3 randomizations, irrespective of level of codon bias. Differences between the equal N3 and native N3 randomization simulations were slighter than those observed for Adh, due to the smaller difference between native N3 content and equal N3 content in Adhr (Table 1).
These results suggest that a reduction of codon bias per se (without changing N3 content) has a relatively weak effect on the pairing potential of the best stems (LRT ≥ 20) in both Adh and Adhr. In contrast, N3 content has a significant effect on pairing. Perturbing the N3 distribution to approximately equal amounts of A, G, C, and T increases the potential for secondary structure formation in Adh, but decreases it in Adhr. This finding, together with the result that the equal N3 simulations produce more helices with LRT ≥ 20 than the native sequences for Adh, but not for Adhr, suggests that the potential for secondary structure formation is much stronger in Adhr than in Adh.
These observations are based on the distribution of helices with LRT ≥ 20. Except for the fact that this is a relatively high value for LRT cutoff scores for secondary structures (![]()
Maximum and 5% cutoff LRT scores:
The average (±SD) of the maximum LRT and 5% cutoff LRT from each of the four Adh randomizations are presented in Table 2. A two-way ANOVA was conducted to test for the effects of bias and N3 content on maximum LRT score (Table 3). The effect of altering codon bias was not significant for maximum LRT score (P = 0.567), but reduced bias randomizations had significantly greater 5% cutoff LRTs (P = 0.001). The effect of altering N3 content was highly significant for both maximum and 5% cutoff LRTs (P < 0.01): the equal N3 simulations had higher average maximum and 5% cutoff LRT scores than native N3 simulations. The interaction (codon bias * N3 content) was significant for maximum LRT scores (P = 0.03), due to the fact that bias could not be effectively removed while maintaining N3 content at native levels. The average level of bias in reduced bias, native N3 simulations was significantly greater (average ENC = 45.5 ± 1.0) than the level of bias in reduced bias, equal N3 simulations (average ENC = 54.6 ± 1.2; t = -58.9, P < 0.001). The maximum LRT score of helices predicted from analysis of the native Adh sequences was 26.52. In comparison, 43 of the 100 native bias, native N3 randomized data sets had helices with higher LRT scores.
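As a rough consistency check on the ENC comparison above, the t statistic can be recomputed from the reported summary values. This is a sketch under assumptions: the exact test the authors used is not stated, so an unpooled two-sample t is shown (with equal group sizes it equals the pooled version), and n = 100 per group is taken from the 100 replicate randomizations.

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic computed from summary statistics (unpooled)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Reported average ENC values (mean, SD) for the two reduced bias randomizations.
t = two_sample_t(45.5, 1.0, 100, 54.6, 1.2, 100)
```

With the rounded means and SDs this gives a value of about -58.3, close to the reported t = -58.9; the small difference is what one would expect from rounding in the published summary values.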
Results from comparisons of average (±SD) LRT scores of top helices from each of the Adhr simulations are also presented in Table 2. As with Adh, altering level of codon bias had no significant effect on average maximum LRT scores (Table 3), nor did it significantly affect the average 5% cutoff LRT. Altering N3 content resulted in a highly significant effect; the average maximum LRT score from the native N3 simulations was higher than the average from the equal N3 randomizations (P < 0.01). Average 5% cutoff LRTs were also significantly greater for the native N3 randomizations (P < 0.01). The interaction variance was not significant for either maximum LRT or 5% LRT. The top LRT score of the helices predicted from analysis of the native Adhr sequences was 28.72. Of the 100 native bias, native N3 simulations, 49 had structures with higher LRT scores.
In general, the results from analysis of the LRT score distributions presented above and the results from maximum and 5% cutoff LRTs presented in this section are in agreement. That is, altering N3 content exerted the greatest effect on both the LRT score distributions and on the maximum and 5% cutoff LRT scores. One exception is that altering codon bias did result in significant differences among the Adh 5% cutoff LRTs, whereas there was no effect on maximum LRTs or on the LRT distributions. This apparent inconsistency is due in part to the reduced variance in upper 5% LRT scores (SD ~1) compared with maximum LRT scores (SD ~2), such that the critical difference for statistical significance was lower for the 5% cutoff comparisons. To make sure that this pattern held for a range of different cutoff LRTs around the 5% critical value, we also calculated 3 and 10% cutoff LRTs for the Adh simulations. For both the 3 and 10% critical LRTs, the effect of bias remained significant (P < 0.01), as did N3 content (P < 0.0001), whereas the interaction (bias * N3) remained nonsignificant (P > 0.2). In other words, we found the same pattern for the 3 and 10% critical values as was found for the 5% critical values. Together, these results suggest that we would obtain the same result no matter what cutoff value we use, unless the cutoff values chosen are too low (in which case the results would be very similar to the maximum LRT results) or too high (for which no differences would be observed).
Reading frame pairings:
There are three possible pairing orientations when considering the codon positions of nucleotides on opposite strands of a helix. Third-codon position on one strand can pair with either first- (3-1), second- (3-2), or third- (3-3) codon position nucleotides on the opposite strand. The number and average LRT scores of best helices in the 3-1, 3-2, or 3-3 complementary reading frame orientations are listed in Table 2. For both Adh and Adhr, most of the top helices are in the 3-3 orientation, with Adhr exhibiting an even greater preponderance of 3-3 helices. The 3-2 pairing frame is the least common orientation, in particular for Adhr. The average LRT scores of the three possible orientations do not differ appreciably, with the exception of the Adh reduced bias, equal N3 randomizations, where the average maximum LRT of 3-3 frame helices was significantly greater than that of 3-2 frame helices (Fisher's post hoc pairwise test: P = 0.01). For the native sequences, the 3-3 pairing was the most common in both the Adh and Adhr genes. The maximum LRT helix in both alignments was in the 3-3 frame. Overall, 39% of the 234 helices predicted from analysis of the native Adh sequences were in the 3-1 frame, 9% were in the 3-2 frame, and 51% were in the 3-3 frame. For the 437 helices predicted from analysis of the native Adhr sequences, the corresponding proportions were 40, 14, and 46%. The relative proportions of 3-1, 3-2, and 3-3 frame helices were more skewed for higher LRT helices. For Adh, of the 57 helices with LRTs ≥15, the proportions of 3-1, 3-2, and 3-3 frame helices were 23, 5, and 72%, respectively. For Adhr, of the 123 predicted helices with LRTs ≥15, the proportions of 3-1, 3-2, and 3-3 frame helices were 29, 10, and 61%, respectively.
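The three pairing frames can be assigned from the coordinates of any one base pair in a helix, as in the sketch below. The function names are hypothetical, and the coordinates are assumed to be 1-based positions within the coding sequence, counted from the first nucleotide of the start codon with no gap or intron offsets.

```python
def codon_position(pos):
    """Codon position (1, 2, or 3) of a 1-based coding-sequence coordinate."""
    return (pos - 1) % 3 + 1


def pairing_frame(i, j):
    """Pairing frame ("3-1", "3-2", or "3-3") of a base pair between i and j.

    In an antiparallel helix the pairs are (i, j), (i + 1, j - 1), ..., so
    i + j is constant and every pair in the helix falls in the same frame.
    A third position pairs with a first position when (i + j) % 3 == 1,
    with a second position when it is 2, and with a third when it is 0.
    """
    return {1: "3-1", 2: "3-2", 0: "3-3"}[(i + j) % 3]
```

For example, coordinate 3 (a third position) paired with coordinate 12 (also a third position) is a 3-3 pairing, and the neighboring pair (4, 11) joins a first with a second position, which is the same 3-3 frame.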
| DISCUSSION |
|---|
Overview:
Our results from analysis of the distribution of LRT scores of all helices indicate that codon bias per se has only a weak effect on the potential for formation of individual secondary structures. In fact, what appeared to exert the strongest effect on overall potential for structure formation was N3 content (Fig 2 and Fig 3). For Adh, we observed that evening out the N3 content leads to an increase in the proportion of helices with high LRT scores. We attribute this pattern to an increase in pairing potential when N3 content was approximately equally distributed among the four bases to ~25% each.
The skew in base composition at first and second codon positions is less severe than at third-codon positions in Adh, which are skewed toward high C3 (42.1%) and low A3 (7.3%) content (Table 1). Removing that skew at third positions increases the potential for pairings between third-codon position nucleotides and those at first and second positions. The proportions of best helices in the 3-1, 3-2, and 3-3 frames were 21% (i.e., 21 of the 100 maximum LRT helices predicted from analyses of the 100 randomized alignments were in the 3-1 frame), 17%, and 62% for the native bias, native N3 randomizations and 27, 20, and 53% for the native bias, equal N3 randomizations (Table 2). Similarly, the proportions of best helices in the 3-1, 3-2, and 3-3 frames were 26, 10, and 64% for the reduced bias, native N3 randomizations and 38, 16, and 46% for the reduced bias, equal N3 randomizations. In both cases, native bias or reduced bias, the proportion of 3-1 and 3-2 helices was higher in the equal N3 simulations than in the native N3 simulations. Altering N3 content to a more equal distribution among the four bases also enhances the potential for pairing in the 3-3 frame. Since the native Adh sequences are skewed toward a C3 bias, not an equal GC bias, G-C pairings at third positions are more restricted. The same holds for A-T pairings in the native Adh sequences, where the rarity of A3 restricts the number of A-T pairings between third-position nucleotides. This is evidenced by the higher average LRT score of 3-3 frame helices in the equal N3 randomizations than in the native N3 randomizations (Table 2).
Adhr, a gene with less codon bias and lower GC3% (i.e., a more equally distributed N3%) than Adh, actually exhibited a greater potential for formation of high LRT helices (≥20) than did any of the four sets of randomization simulations. The exact opposite pattern was observed when comparing the LRT distribution of helices predicted from wild-type Adh sequences with the distributions from randomization simulations. This comparison of the two patterns provides additional evidence that in genes with high levels of codon bias the formation of strong individual secondary structures is inhibited through the alteration of N3 content.
mRNA secondary structure and gene expression:
In Drosophila, as in other organisms with high levels of codon bias, all codon families tend to be biased for the same individual nucleotides (![]()
In principle, it is possible to have equally strong codon bias, but without any among-codon family consistency for preferred nucleotides. That is essentially what our native bias, equal N3 randomizations are designed to simulate. If the only factor driving the evolution of codon bias was selection for preferred codons to maximize translational efficiency and/or accuracy, there would be no reason to expect that each codon family shares the same base preference. In other words, each codon family would still tend to prefer a certain base at the third position, but that preferred base would be unique to each codon family, such that the pattern of codon bias in natural sequences would be similar to our native bias, equal N3 randomizations. The observation that such genes are nonexistent is consistent with the hypothesis that the formation of long and stable helices interferes with the process of mRNA translation. This hypothesis is supported by the results from analysis of the distribution of LRT scores of individual helices (Fig 2 and Fig 3). The translational efficiency (![]()
We must therefore consider the possibility that mRNA secondary structure can also affect the rate of mRNA translation. Preference for C3 synonymous codons in Drosophila has two benefits: (1) enhancing translation efficiency and accuracy by matching an abundant tRNA pool and (2) minimizing the formation of highly stable hairpins that would interfere with ribosome movement and consequently reduce the translation rate. The combined effect may in fact be synergistic (nonadditive) and perhaps experimentally measurable. However, our data do not allow us to address the following question: Why C3 instead of G3, A3, or T3? Clearly a consistent preference among all the codon families for any of the four nucleotides at the third-codon position would result in a decreased potential for the formation of individual secondary structures. Perhaps the C3 preference is the result of a "frozen accident" due to genetic drift. Once C3 became the established preferred third-position nucleotide for most codon families, the evolution of alternate preferred N3s would be prevented by the presence of fitness valleys (excepting, of course, those twofold degenerate codon families encoded by a third-position purine).
Reading frame pairings:
Our finding that the 3-2 reading frame pairings were less common than 3-1 reading frame pairings stands in contrast to ![]()
mRNA secondary structure and codon bias: dual regulators of gene expression?
While there has been considerable work on the relationship between mRNA secondary structure and gene expression, and that between codon bias and gene expression, few studies have explored the relationship between codon usage and mRNA secondary structure. ![]()
These studies, combined with the evidence in this study, suggest that the joint effects of mRNA secondary structure and codon bias may interact to regulate the level of gene expression (Fig 1). In the highly expressed Adh gene, the balance is shifted toward codon bias, a positive regulator of gene expression. In the weakly expressed Adhr gene, the balance is shifted toward mRNA secondary structure, a negative regulator of gene expression. According to our preliminary model, evolutionary shifts in the balance between codon bias and mRNA secondary structure are mediated through N3 content. Natural selection for a particular nucleotide at the third-codon position, consistent across most codon families (e.g., C in Drosophila), results in codon bias without the disruptive effects of stable secondary structures. For weakly expressed genes, there is selection for roughly equal G and C (or A and T) at N3. Neutral drift alone would result in a relatively equal frequency of all four nucleotides at the third-codon position (i.e., 25% A3, 25% C3, 25% G3, and 25% T3). However, stronger secondary structures would form if there was natural selection for high GC3% (50% G3, 50% C3 in the extreme case) or high AT3% (50% A3, 50% T3 in the extreme case) because this would maximize the pairing potential between opposite strands of a helix. Therefore, we conclude that N3 content in weakly expressed genes may also be governed by natural selection. Selection for high, equally distributed GC3% or AT3% could potentially promote the formation of stable secondary structures, resulting in an inhibitory effect on mRNA translation rates.
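The argument that equal G3 and C3 (or A3 and T3) frequencies maximize pairing potential can be illustrated with a deliberately simple calculation. This is a sketch under strong assumptions that are not part of the study: third-position bases are treated as independent draws, only Watson-Crick pairs count (G-U wobble pairs, stacking, and first and second codon positions are ignored), and `pairing_probability` is a hypothetical helper.

```python
def pairing_probability(n3):
    """Chance that two independently drawn third-position bases can pair.

    n3 maps each base (G, C, A, U) to its third-position frequency.
    Only Watson-Crick pairs (G-C and A-U) are counted.
    """
    return 2 * (n3["G"] * n3["C"] + n3["A"] * n3["U"])


equal_n3 = {"G": 0.25, "C": 0.25, "A": 0.25, "U": 0.25}   # neutral expectation
balanced_gc = {"G": 0.5, "C": 0.5, "A": 0.0, "U": 0.0}    # extreme high GC3
single_base = {"G": 0.0, "C": 1.0, "A": 0.0, "U": 0.0}    # extreme C3 preference

p_equal = pairing_probability(equal_n3)        # 0.25
p_balanced = pairing_probability(balanced_gc)  # 0.5
p_single = pairing_probability(single_base)    # 0.0
```

Under these assumptions a composition of 50% G3 and 50% C3 doubles the pairing probability relative to equal frequencies of all four bases, whereas a consistent preference for a single base drives it toward zero, in line with the qualitative argument in the text.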
It should be borne in mind that the results of this study are based on the simplest paired-site model of nucleotide substitution, involving the estimation of only one free parameter after scaling the branch lengths.
Previous studies have compared the global stability of mRNAs vs. various forms of randomized sequences.
To complement these studies, it would be interesting to examine the effect of altering codon bias and N3 content on the formation of global mRNA secondary structures in genes with different levels of expression. The pattern would not necessarily be the same as that revealed in the present analysis of individual helices. One might predict that, on average, highly expressed genes would exhibit greater global stability than weakly expressed genes. A greater global stability of highly expressed mRNAs would be advantageous because such mRNAs would be more resistant to degradation, resulting in a longer residence time in the cell compared to mRNAs of weakly expressed genes. How could this apparent discrepancy be reconciled? Although highly expressed mRNAs might have greater global stabilities, the individual helices in the global structure would have to be relatively short and weak. Since we found no convincing evidence that any of the individual helices in Adh are stronger than randomized sequences, it may be that there is no particular conserved global structure for a set of related mRNAs. Instead, many alternate global structures of approximately the same stability could form, and the constituent helices of global structures would be relatively weak. To test this theory, the predictions of currently available programs for inferring global mRNA secondary structures on the basis of the comparative method (e.g., …
| ACKNOWLEDGMENTS |
|---|
We thank J. Parsch for providing the source code for Pirandom, a randomization program that shuffles the columns of a set of aligned sequences. We thank J. Braverman for providing the source code to the PIRANAH and GROUPER computer programs. This research was supported by National Institutes of Health grant GM-58405 and by funds from the University of Munich to W.S.
Manuscript received April 18, 2001; Accepted for publication July 10, 2001.
| LITERATURE CITED |
|---|
AKASHI, H., 1994 Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927-935.
ANTEZANA, M. A. and M. KREITMAN, 1999 The nonrandom location of synonymous codons suggests that reading frame-independent forces have patterned codon preferences. J. Mol. Evol. 49:36-43.
BENNETZEN, J. L. and B. D. HALL, 1982 Codon selection in yeast. J. Biol. Chem. 257:3026-3031.
BULMER, M., 1991 The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897-907.
DOCK, A. C., B. LORBER, D. MORAS, G. PIXA, and J. C. THIERRY et al., 1984 Crystallization of transfer ribonucleic acids. Biochimie 66:179-201.
ELLIS, R. J. and F. U. HARTL, 1999 Principles of protein folding in the cellular environment. Curr. Opin. Struct. Biol. 9:102-110.
FITCH, W. M., 1974 The large extent of putative secondary nucleic acid structure in random nucleotide sequences or amino acid derived messenger-RNA. J. Mol. Evol. 3:279-291.
FOX, G. E. and C. R. WOESE, 1975 5S rRNA secondary structure. Nature 256:505-507.
GRANTHAM, R., C. GAUTIER, M. GOUY, R. MERCIER, and A. PAVE, 1980 Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8:49-62.
IKEMURA, T., 1981 Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 146:1-21.
KIRBY, D. A., S. V. MUSE, and W. STEPHAN, 1995 Maintenance of pre-mRNA secondary structure by epistatic selection. Proc. Natl. Acad. Sci. USA 92:9047-9051.
KNUDSEN, B. and J. HEIN, 1999 RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 15:446-454.
KONECNY, J., M. SCHÖNIGER, I. HOFACKER, M.-D. WEITZE, and G. L. HOFACKER, 2000 Concurrent neutral evolution of mRNA secondary structures and encoded proteins. J. Mol. Evol. 50:238-242.
KONINGS, D. A. and R. R. GUTELL, 1995 A comparison of thermodynamic foldings with comparatively derived structures of 16S and 16S-like rRNAs. RNA 1:559-574.
MITA, K., S. ICHIMURA, M. ZAMA, and T. C. JAMES, 1988 Specific codon usage pattern and its implications on the secondary structure of silk fibroin mRNA. J. Mol. Biol. 203:917-925.
MORIYAMA, E. N. and D. L. HARTL, 1993 Codon usage bias and base composition of nuclear genes in Drosophila. Genetics 134:847-858.
MUSE, S. V., 1995 Evolutionary analyses of DNA sequences subject to constraints of secondary structure. Genetics 139:1429-1439.
NAKAMURA, Y., T. GOJOBORI, and T. IKEMURA, 1998 Codon usage tabulated from the international DNA sequence databases. Nucleic Acids Res. 26:334.
NETZER, W. J. and F. U. HARTL, 1997 Recombination of protein domains facilitated by co-translational folding in eukaryotes. Nature 388:343-349.
NOLLER, H. F. and C. R. WOESE, 1981 Secondary structure of 16S ribosomal RNA. Science 212:403-411.
PACE, N. R., D. K. SMITH, G. J. OLSEN, and B. D. JAMES, 1989 Phylogenetic comparative analysis and the secondary structure of ribonuclease P RNA: a review. Gene 82:65-75.
PARSCH, J., J. M. BRAVERMAN, and W. STEPHAN, 2000 Comparative sequence analysis and patterns of covariation in RNA secondary structures. Genetics 154:909-921.
POST, L. E., G. D. STRYCHARZ, M. NOMURA, H. LEWIS, and P. P. DENNIS, 1979 Nucleotide sequence of the ribosomal protein gene cluster adjacent to the gene for RNA polymerase subunit β in E. coli. Proc. Natl. Acad. Sci. USA 76:1697-1701.
POWELL, J. R. and E. N. MORIYAMA, 1997 Evolution of codon bias in Drosophila. Proc. Natl. Acad. Sci. USA 94:7784-7790.
RIVAS, E. and S. R. EDDY, 2000 Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics 16:583-605.
RUSSO, C. A. M., N. TAKEZAKI, and M. NEI, 1995 Molecular phylogeny and divergence times of Drosophilid species. Mol. Biol. Evol. 12:391-404.
SAVILL, N. J., D. C. HOYLE, and P. HIGGS, 2001 RNA sequence evolution with secondary structure constraints: comparison of substitution rate models using maximum-likelihood methods. Genetics 157:399-411.
SEFFENS, W. and D. DIGBY, 1999 mRNAs have greater negative folding free energies than shuffled or codon choice randomized sequences. Nucleic Acids Res. 27:1578-1584.
SHARP, P. M. and W.-H. LI, 1986 An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24:28-38.
SPRINZL, M., T. HARTMANN, F. MEISSNER, J. MOLL, and T. VORDERWÜLBECKE, 1987 Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 15(Suppl.):r53-r188.
TAYLOR, P. L., 1996 GeneJockey II. Biosoft, Cambridge, United Kingdom.
WADA, A. and A. SUYAMA, 1986 Local stability of DNA and RNA secondary structure and its relation to biological functions. Prog. Biophys. Mol. Biol. 47:113-157.
WALTER, A. E., D. H. TURNER, J. KIM, M. H. LYTTLE, and P. MULLER et al., 1994 Coaxial stacking of helixes enhances binding of oligoribonucleotides and improves predictions of RNA folding. Proc. Natl. Acad. Sci. USA 91:9218-9222.
WORKMAN, C. and A. KROGH, 1999 No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution. Nucleic Acids Res. 27:4816-4822.
ZAMA, M., 1990 Codon usage pattern in α2(I) chain domain of chicken type I collagen and its implications for the secondary structure of the mRNA and the synthesis pauses of the collagen. Biochem. Biophys. Res. Commun. 167:772-776.
ZUKER, M., J. A. JAEGER, and D. H. TURNER, 1991 A comparison of optimal and suboptimal RNA secondary structures predicted by free energy minimization with structures determined by phylogenetic comparison. Nucleic Acids Res. 19:2707-2714.