- THIS ARTICLE
-
Abstract
- Full Text (PDF)
-
All Versions of this Article:
genetics.105.051300v1
173/1/473 most recent - Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Wilke, C. O.
- Articles by Drummond, D. A.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Wilke, C. O.
- Articles by Drummond, D. A.
Originally published as Genetics Published Articles Ahead of Print on February 19, 2006.
Genetics, Vol. 173, 473-481, May 2006, Copyright © 2006
doi:10.1534/genetics.105.051300
Population Genetics of Translational Robustness
Claus O. Wilke*,1 and
D. Allan Drummond
* Section of Integrative Biology and Center for Computational Biology and Bioinformatics, University of Texas, Austin, Texas 78712 and
Program in Computation and Neural Systems, California Institute of Technology, Pasadena, California 91125
1 Corresponding author: Integrative Biology, University of Texas, 1 University StationC0930, Austin, TX 78712.
E-mail: cwilke{at}mail.utexas.edu
>ABSTRACT
MATERIALS AND METHODS
ANALYTICAL RESULTS
SIMULATION RESULTS
DISCUSSION
APPENDIX
ACKNOWLEDGEMENTS
LITERATURE CITED
Recent work has shown that expression level is the main predictor of a gene's evolutionary rate and that more highly expressed genes evolve slower. A possible explanation for this observation is selection for proteins that fold properly despite mistranslation, in short selection for translational robustness. Translational robustness leads to the somewhat paradoxical prediction that highly expressed genes are extremely tolerant to missense substitutions but nevertheless evolve very slowly. Here, we study a simple theoretical model of translational robustness that allows us to gain analytic insight into how this paradoxical behavior arises.
THE increasing availability of whole-genome sequences from many different species has revealed a surprising fact: Different genes within the same organism evolve at dramatically different rates. For example, the evolutionary rates of the fastest- and slowest-evolving genes in Saccharomyces cerevisiae are separated by three orders of magnitude (DRUMMOND et al. 2005). Because the dominant force shaping genomewide patterns of evolutionary rate is most likely stabilizing selection, the evolutionary rates of genes should correlate with quantities that measure how important or otherwise constrained a gene is (KIMURA 1983; OHTA 1992). A wide array of such quantities have been proposed, shown to correlate with evolutionary rate, and subsequently disputed. Examples include a gene's dispensability or essentiality (HURST and SMITH 1999; HIRSH and FRASER 2001; JORDAN et al. 2002; PAL et al. 2003; ZHANG and HE 2005; WALL et al. 2005), its number of interaction partners (FRASER et al. 2002; BLOOM and ADAMI 2003; JORDAN et al. 2003; HAHN et al. 2004; AGRAFIOTI et al. 2005), its length (MARAIS and DURET 2001), or its centrality in the protein interaction network (HAHN and KERN 2005). However, it seems that most importantly, the expression level (PAL et al. 2001; ROCHA and DANCHIN 2004), or perhaps more accurately the frequency of translation events (DRUMMOND et al. 2005, 2006), influences evolutionary rate.
DRUMMOND et al. (2005) have recently introduced a theory for why highly expressed genes evolve slowly. Translation is error prone, and inactivated or misfolded proteins resulting from mistranslation impose substantial costs on the cell (BUCCIANTINI et al. 2002), costs that increase with expression level. One way in which the cost associated with a highly expressed gene is reduced is translational accuracy (AKASHI 1994, 2001), whereby the gene is encoded with optimal codons whose translation is less error prone than the translation of other codons. Translational accuracy can explain why the rate of synonymous evolution dS is correlated with expression level, codon adaptation index, or protein abundance. However, it cannot explain why the rate of nonsynonymous evolution dN shows an even stronger correlation with these quantities (DRUMMOND et al. 2005). Selection for translational accuracy can reduce the translational error rate by a factor of 49 (PRECUP and PARKER 1987), but even optimally coded genes that are highly expressed may produce a large amount of erroneous polypeptides. Therefore, DRUMMOND et al. (2005) suggest that highly expressed genes should be under additional selection to be tolerant to translation errors, that is, the polypeptides produced from these genes should fold properly even if they were erroneously translated. Recent work on protein biochemistry has shown that proteins can differ widely in their tolerance to missense substitutions and that properly chosen point mutations can dramatically increase the tolerance of a protein to additional substitutions (BLOOM et al. 2005). Drummond et al.'s hypothesis, referred to as selection for translational robustness, predicts a constraint on the nonsynonymous rate of evolution, whereas selection for translational accuracy predicts primarily a constraint on the synonymous rate of evolution. We must assume that both selective constraints will operate on genes that are frequently translated.
Genes that are highly tolerant to translational missense errors must also, by definition, be highly tolerant to missense substitutions. However, the translational robustness hypothesis predicts that these genes will nevertheless be strongly conserved under evolution. An example of a gene that exhibits this paradoxical behavior is Rubisco, an extremely abundant protein that fixes carbon dioxide during photosynthesis. Rubisco is strongly conserved across phyla, but appears to tolerate most missense substitutions in the laboratory without loss of fold (SPREITZER 1993; KELLOGG and JULIANO 1997).
The purpose of this article is to put the translational robustness hypothesis into precise mathematical terms and to demonstrate how highly expressed genes can evolve to be tolerant to missense substitutions and yet remain strongly conserved under evolution.
ABSTRACT
>MATERIALS AND METHODS
ANALYTICAL RESULTS
SIMULATION RESULTS
DISCUSSION
APPENDIX
ACKNOWLEDGEMENTS
LITERATURE CITED
Model:
We consider the evolution of a gene encoding a protein of length L. Each site in the protein can be in one of two states, neutral or nonneutral. We denote the number of neutral sites in the protein by n (see Table 1 for definitions of variables). A mutation at a neutral site of a folded protein leaves the protein folded, but changes the site from neutral to nonneutral. A mutation at a nonneutral site of a folded protein will usually cause misfolding and consequent loss of function, but with a small probability
, such a mutation will not affect folding but turn the site into a neutral one. For simplicity, we assume that once an amino acid sequence loses the ability to fold, it is impossible to mutate it back into a folded state. The rationale behind this assumption is that the likelihood of a mutation restoring fold to an unfolded amino acid sequence is so low as to be negligible. Our model is a reasonable abstraction of a thermodynamic framework of protein evolution that has recently been shown to have good predictive power for both simulated and real proteins (BLOOM et al. 2005; WILKE et al. 2005). The key insight of this framework is that a protein's tolerance to substitutions is closely related to the protein's stabilitymore stable proteins can withstand more missense substitutionsand that therefore proteins can change from being highly fragile to being highly robust to mutations and vice versa through the accumulation of stabilizing or destabilizing mutations. In this sense, a mutation from a nonneutral to a neutral site in our model corresponds to a stabilizing mutation, and the opposite mutation corresponds to a destabilizing mutation. Thus, our model captures the following key aspects of protein biochemistry: (i) Homologous proteins can vary widely in their tolerance to mutations, and individual point mutations can increase or decrease this tolerance; (ii) mutations that increase a protein's tolerance are much rarer than mutations that decrease its tolerance; (iii) highly tolerant proteins are extremely rare, while moderately tolerant proteins are abundant; and (iv) nonfunctional mutant proteins are likely to be misfolded.
|
The gene is expressed at a level that leads to the synthesis of x polypeptides. For simplicity, we assume that the total number of polypeptides translated per gene is proportional to the gene's expression level and that the constant of proportionality is 1. Thus, x is also the expression level, measured in mRNA molecules per cell. Under translation, each site is mistranslated with probability
(we neglect premature termination of the translation process). The probability that a single mRNA molecule is mistranslated and leads to a misfolded protein is 1 [1
(1
)]Ln
(L n), where the approximation holds for
. Let f =
(L n) be the fraction of synthesized proteins that misfold. We assume that the expression level is regulated such that the number of folded proteins per gene, the protein abundance A, is held constant, regardless of the translational error rate. Then, A = x(1 f). The total number of misfolded polypeptides per gene follows as xf = Af/(1 f). Finally, we assume that each misfolded polypeptide imposes a cost c on the cell, so that the total cost of a gene translated at abundance level A is cAf/(1 f). We turn this cost into a fitness value by assuming that each misfolded protein has the same relative effect on fitness. Then we can write the overall fitness of a gene with n neutral sites as
![]() | (1) |
Simulations:
We implemented a stochastic simulation of N sequences reproducing in discrete, nonoverlapping generations. We employed standard WrightFisher sampling, that is, the probability that a sequence in generation t + 1 is the offspring of a sequence at generation t is directly proportional to the latter's fitness.
We measured the evolutionary rate along the line of descent from the most recent common ancestor (MRCA) of the final population backward in time, as described by WILKE (2004). Briefly, we let the simulated population evolve until the birth time of the population's MRCA, designated t0, exceeded a fixed equilibration time tequil plus a time window tmeas, t0 > tequil + tmeas. All quantities were measured on the equilibrated population during this latter time window. For all results reported, we chose tequil = tmeas = 400,000, U = 0.001,
= 0.001, and L = 100. At all parameter settings, we carried out 100 replicas and averaged results over all replicas.
ABSTRACT
MATERIALS AND METHODS
>ANALYTICAL RESULTS
SIMULATION RESULTS
DISCUSSION
APPENDIX
ACKNOWLEDGEMENTS
LITERATURE CITED
Solution based on SellaHirsh theory:
We can calculate the steady-state solution of our model using the analogy between evolutionary biology and statistical physics recently demonstrated by SELLA and HIRSH (2005). The theory of SELLA and HIRSH (2005) is applicable whenever the product of population size and mutation rate is <<1, NU << 1. In this regime, the population is essentially homogeneous at all times and can be represented at any given point in time by a single sequence. We say that the population is in state i if the dominant sequence in the population is sequence i. The key insight of SELLA and HIRSH (2005) is that the probability pi to find the population in state i is proportional to a function F(i) (also called a Boltzmann factor) that depends only on the fitness of sequence i, the population size, and details of the mutation process. Thus, it follows that
![]() | (2) |
As the fitness of a sequence in our model depends only on the sequence's number of neutral sites n, it is useful to lump all sequences with the same n into a single class and calculate the probability pn that the population is in a state represented by any sequence with n neutral sites. Since there are
such sequences, we introduce
, where F(n) is the appropriate Boltzmann factor for a single sequence with n neutral sites, and then calculate pn as
. In our case,
(n) is given by
![]() | (3) |
less likely to occur than mutations that decrease n (SELLA and HIRSH 2005, supplemental text).
With the formalism outlined in the previous paragraphs, we can calculate the expected number of neutral sites in the steady-state E[n] as
![]() | (4) |
![]() | (5) |
![]() | (6) |
is the probability of fixation of sequence j in background i (SELLA and HIRSH 2005),
![]() | (7) |
and
.]
We can simplify these expressions in the special cases that A is either very large or very small. After inserting Equation 1 into Equation 3, we have
![]() | (8) |
N. We continue to use this approximation throughout the rest of this article. From Equation 8, we can see that the behavior of the system changes drastically depending on whether the product NA is large or small. However, since the population size N is the same for all genes in a species while each gene's corresponding protein abundance A can vary over many orders of magnitude, in the following we assume that N is fixed and consider the limits of very small and very large A.
For
, the first term in the exponential disappears, and F'(n) becomes simply
. Thus, we find that
![]() | (9) |
For
, we have to distinguish between the cases n = L and n < L. For n < L, the first term in the exponential in Equation 8 becomes much larger in magnitude than the second term, which is a constant. Thus, we can neglect the second term and have
![]() | (10) |
L, independent of A. All terms but the n = L term disappear, and we have E[n] = L, E[w] = wL, and
![]() | (11) |
Approximate solution:
SellaHirsh theory yields an exact solution for our model. However, the resulting expressions are somewhat unwieldy and do not lead to simple analytical expressions for intermediate A. Therefore, we now derive an approximate solution to our model.
For small
, we can approximate wi
exp[A
(L i)] and find ln(wj/wi) = A
(j i). This approximation is equivalent to neglecting the small number of additional translation events required to replace polypeptides that misfold. The probability of fixation follows from Equation 7 as
![]() | (12) |
![]() | (13) |
and
are independent of n. We introduce
+ and
, the probabilities of fixation for a mutation that increases or decreases n by 1, respectively, and find
![]() | (14) |
![]() | (15) |
![]() | (16) |
![]() | (17) |
![]() | (18) |
In the limit A
0, we have
+ =
= 1/N and n* =
L/(
+ 1). [Note that this expression is identical to the result found through SellaHirsh theory (Equation 9), if we equate n* with E[n].] Therefore, in this limit,
![]() | (19) |
, we have
+ = 1,
= e2NA
, and n* = L. Therefore, in this limit,
![]() | (20) |
1 in Equation 11.
Limitations on the number of neutral sites:
Certain residues may never tolerate any substitutions, such as the active-site serine of a serine protease or the heme-binding histidines of hemoglobin. Under the assumption that m sites can never be neutral, we can write L = l + m, and for small
the fitness wn is approximately
![]() | (21) |
but not on n. Equation 7 reveals that such a rescaling leaves the fixation probabilities unchanged. Therefore, the approximate solution remains unchanged except that we replace L by l (= L m) everywhere and add a factor eA
m to Equation 17.
For SellaHirsh theory, if we assume that
m is negligibly small, the Boltzmann factor F'(n) gains a similar leading factor, which, as Equations 4 and 5 make clear, also drops out, this time through the normalization term. In the case of E[n], the result is only that L must again be replaced by l = L m, while the expected value E[w] also gains a leading factor eA
m, just as in the approximate case.
In short, when there are m never-neutral sites, the main effects are to lower the population's fitness by a factor eA
m and to reduce the expected number of neutral sites E[n] and the evolutionary rate K roughly as though the evolving gene were shorter by m residues.
ABSTRACT
MATERIALS AND METHODS
ANALYTICAL RESULTS
>SIMULATION RESULTS
DISCUSSION
APPENDIX
ACKNOWLEDGEMENTS
LITERATURE CITED
. The asymptotic value at A = 0 is K(0) =
U (Equation 19). For increasing A, K first increases and then rapidly drops off to zero for even larger A. The main effect of the population size N is to determine at what level of A this drop off initiates. With increasing N, the evolutionary rate K seems to be simply shifted to the left, toward lower A (Figure 1). We can make this statement more precise by considering the large-A limit of our approximate solution, Equation 20. This limit shows that the evolutionary rates K(N, A) and K(aN, a1A) are related through K(N, A)
a1K(aN, a1A), where a is an arbitrary constant. Therefore, if we increase N by a factor of a, the resulting curve K(A) appears on a log-log plot to be shifted upward and to the left by an amount of log(a). The upward shift cannot be noticed, because it exists only for very large A, and thus the effect of increasing N seems to be to simply shift the K(A) curve to the left.
|
Second, we studied the effect of varying
on K(A) (Figure 2). The variable
influences mainly the asymptotic limit of K(A) for small A [with K(A) increasing with
], but does not affect how quickly K(A) decays for large A.
|
Third, we studied the behavior of the expected fitness and the expected number of neutral sites for varying levels of A. The expected fitness is
1 for both very low and very high A, but drops below 1 for the intermediate values of A for which K(A) starts to decay (Figure 3a). The expected number of neutral sites is
L/(
+ 1) for very small A and increases to 1 for large A (Figure 3b). We can understand the different evolutionary regimes of low A and high A as follows: For low A, since very little protein is synthesized, the cost associated with misfolded proteins following erroneous translation is negligible. Therefore, the expected fitness in this regime is 1. Since the cost of misfolding is negligible, the number of neutral sites is not under selection in this regime and it settles to the value at which the mutations increasing n and those decreasing n exactly balance each other. For high A, on the other hand, the cost of translation-induced misfolding is tremendous. Therefore, at high A, the population converges to the single optimal sequence with n = L (or n = l if some sites can never be neutral). Every mutation that reduces n by even 1 is highly deleterious and therefore will virtually never go to fixation, even in a small population. For l = L, the optimal sequence (which has n = l) pays no cost whatsoever under mistranslation, and the expected fitness is again 1. For intermediate A, the cost of translation-induced misfolding is significant but not debilitating. As a result, n is elevated in comparison to its low-A limit, but the expected fitness falls below 1.
|
Finally, the calculation in the APPENDIX predicts that the evolutionary rate should be independent of N if we plot it as a function of the expected number of neutral sites E[n] rather than a function of A. Figure 4 shows that this prediction is indeed accurate. We find that there are two distinct regimes for the evolutionary rate. For small E[n], the evolutionary rate increases with E[n]. This increase is caused by the increased availability of neutral mutations with growing E[n]. However, even though we can calculate what the evolutionary rate would be for arbitrarily small E[n], in equilibrium E[n] will never be below its limiting value for small A,
L/(
+ 1). For large E[n], the behavior of K is reversed, and now it decreases with increasing E[n]. The decay comes about because in this regime, even though there are many mutations that do not disrupt the fold of a properly translated protein, these mutations increase the amount of mistranslated, misfolded proteins and thus are selected against. The quantity E[n] can get arbitrarily close to L (or l for m > 0), and therefore K can decay to almost zero if A is sufficiently large.
|
Throughout this study, we found good agreement among the numerical simulations, SellaHirsh theory, and our simple analytical approximation. Some discrepancies appeared between theory and simulations for the largest population size (N = 1000) and for very small
in conjunction with large A. We attribute the former to a violation of the condition
, which must be satisfied for both SellaHirsh theory and our approximate solution. We carried out our simulations with a mutation rate of U = 0.001, which means that NU = 1 for N = 1000. The latter discrepancies are caused by insufficient equilibration time. For large A, the number of neutral sites n always approaches L, irrespective of population size or
. However, the smaller
is, the lower the probability that a mutation occurs that increases n. Therefore, the equilibration time needed at large A grows without bound as
decreases. We did additional simulations in this regime and found that the simulation results approached the predicted quantities with increasing equilibration time (data not shown). ABSTRACT
MATERIALS AND METHODS
ANALYTICAL RESULTS
SIMULATION RESULTS
>DISCUSSION
APPENDIX
ACKNOWLEDGEMENTS
LITERATURE CITED
). The model abstracts a previous thermodynamic model of protein mutational tolerance introduced by BLOOM et al. (2005) in which mutations may leave unperturbed or destabilize the protein's native structure (common) or stabilize it (uncommon). Increases in stability tend to increase the number of sites at which substitutions can be tolerated, so-called neutral sites, while decreases in stability usually cause misfolding or decrease the number of neutral sites. In this work, we have modeled neutral sites directly. In doing so, we allow only stepwise changes in neutral sites, sacrificing treatment of large stability changes that might radically alter the number of neutral sites and the potential stability dependence of mutational effects to gain analytical tractability.
Our results show a clear example of the paradox cited in the Introduction (Figure 4): Given selection against the costs of protein misfolding, genes simultaneously become more mutationally tolerant (larger number of neutral sites, n) but evolve slower as if fewer mutations were tolerated. The paradox's resolution requires disentangling two kinds of mutational tolerance, one of which captures the likelihood of loss of protein function due to a mutation while the other quantifies the fitness cost of that mutation, the cost that ultimately determines evolutionary rate. When protein misfolding imposes minimal fitness costs, as is the case with low-expression proteins, the proportion of mutations that preserve protein function governs the rate of evolution (Figure 4, left). However, at high expression levels, fitness costs of mutations that preserve protein function grow large and can become the dominant determinant of evolutionary rate (Figure 4, right).
This observation has an important corollary. When selection for translational robustness is weak, functional loss is likely the main determinant of fitness costs. Thus, our results suggest that evolutionary conservation of sites in low-expression proteins may be more likely to indicate functional importance than similar conservation at sites in high-expression proteins.
Our simple model produces an exponential decline in evolutionary rate with increasing expression level, whereas in yeast, a power law better describes this relationship (DRUMMOND et al. 2005). Several possibilities may explain the discrepancy. First, our model assumes a symmetric binomial distribution of the number of neutral sites, but the distribution for real proteins may be skewed or heavy tailed. Second, the cost of additional misfolded proteins may not be independent of the number of already misfolded proteins. For example, misfolded proteins form toxic aggregates (BUCCIANTINI et al. 2002), and aggregation is not a linear function of protein concentration. Finally, differences in protein structure and function between high- and low-expression proteins may play a role. DRUMMOND et al. (2005) have previously examined differences between functionally and structurally similar paralogs and found a similar power-law relationship. However, more subtle but important differences may separate paralogs and influence their evolution.
Our results here demonstrate that profound differences in protein evolutionary rates can arise even in the absence of functional and structural differences and when variables such as protein length, the translation error rate, and the underlying distribution of the number of neutral sites are held constant. In real genomes, of course, all these features vary and some, perhaps all, are under selection. The value of the model is its utility in explaining why highly expressed proteins evolve slowly across taxa (DRUMMOND et al. 2005).
Interestingly, our model reveals two evolutionary-rate regimes (Figures 1 and 2), one in which rates remain relatively constant with increasing protein production and another in which rates decline precipitously. In yeast, virtually all genes appear to fall in the latter regime, as the evolutionary rate declines almost immediately from low to high expression (DRUMMOND et al. 2005), raising the possibility of genome-level selection in this direction. If yeast protein synthesis levels reflect organismal needs and cannot be freely modulated, as seems likely, and protein synthesis costs dominate cellular energy consumption, as evidence suggests (PRINCIOTTA et al. 2003), the remaining genome-level target for selection is on the fitness cost per misfolded protein, c. Decreasing c pushes genes away from the decline to where cost differences become negligible, whereas increasing c pushes genes toward the decline, amplifying the cost difference between high- and low-expression proteins. One way to decrease c is to drive translation errors down to negligible levels. Another is to maintain a quality-control apparatus (e.g., chaperones and proteases) with so much excess bandwidth that cost differences associated with variability in protein misfolding become negligible. Evidence suggests that both strategies for decreasing c impose significant fitness penalties. Decreasing translational error rates can be easily accomplished, often with a single ribosomal mutation (ALKSNE et al. 1993), but ribosomal accuracy and growth rate are often negatively correlated, presumably through the intrinsic speed/accuracy trade-off inherent in ribosomal proofreading (KURLAND 1992). Maintenance of a chaperone fleet large enough to dilute out misfolding cost differences would divert enormous cellular resources for little benefit, and the massive induction of chaperones after heat shock suggests that cellular chaperone levels do not have much remaining bandwidth under normal conditions. Overall, it seems plausible that the steep decline observed in yeast's evolutionary-rateexpression relationship reflects a balance favoring a relatively high cost per misfolded protein c. Costly translational accuracy and quality control machinery may be reduced so long as the increased errors and reduced folding assistance can be compensated for; translational robustness provides that compensationessentially for freebut is ultimately limited by mutation pressure away from robust sequences and by the fundamental intolerance of proteins to at least some errors.
Our model distinguishes between the number of polypeptides produced per gene, x, and the abundance of functional proteins, A, yet our approximate solution essentially equates these quantities with only minor accuracy loss. The approximation works for two reasons. First, misfolded polypeptides impose a negligible cost for low-abundance proteins, while for high-abundance proteins, misfolded polypeptides are rare because of selection for translational robustness. We expect these nontrivial results to hold for many organisms. Second, in our model, the number of translation events x, the primary determinant of the number of mistranslated proteins, is estimated accurately by A, a situation unlikely to hold for most organisms. Protein abundance reflects a balance of ongoing translation and turnover (GREENBAUM et al. 2003), such that a high abundance can result either from moderate translational frequency and long protein half-life or from rapid translation and short half-life. Because half-lives can vary over orders of magnitude (HARGROVE and SCHMIDT 1989), abundance and translation frequency may only weakly correlate in real organisms. Among protein abundance, mRNA expression level, and translation frequency, we hypothesize that the latter, even though difficult to measure, will best predict evolutionary rate.
BÜRGER et al. (2005) recently studied a question closely related to this article, asking why phenotypic mutation rates (corresponding to the translational error rate in this article) are much higher than genotypic mutation rates. Within the framework of their model, BÜRGER et al. (2005) found very little pressure for reduction of phenotypic mutation rates below a certain threshold. Even though we keep the translational error rate constant in our model, we can consider a change in the number of neutral sites n as a change in the phenotypic mutation rate and thus compare our results to those of BÜRGER et al. (2005). In contrast to their conclusions, we find that the fraction of neutral sites, n/L, quickly rises to the maximum possible for highly expressed genes, thus reducing the phenotypic mutation rate to zero except when some sites cannot be made neutral. We believe that the differences in results are caused by differences in the way in which we and BÜRGER et al. (2005) treated costs of erroneously translated proteins in our models. BÜRGER et al. (2005) consider the total cost of protein synthesis, but do not consider additional penalties imposed by misfolded proteins, not only for their recognition and cleanup by the quality-control system but also for their innate toxicity (BUCCIANTINI et al. 2002). Clearly, if we neglect these unique costs, then the only pressure to reduce the phenotypic mutation rate is to reduce the cost of synthesis for misfolded proteins, and this pressure will be weak if this cost is only a small proportion of the total cost of protein synthesis. In our model, on the other hand, we have focused exclusively on costs of misfolded proteins apart from their synthesis costs, implicitly assuming that the total cost of protein synthesis is approximately equal to the cost of synthesis of functional proteins and that the benefit of the functional proteins will pay for their synthesis. We believe that there is indeed a strong selective pressure to reduce the phenotypic mutation rate for highly expressed genes, but that it is cheaper for cells to evolve translationally robust genes than to evolve highly accurate transcription and translation machinery.
Can translational robustness really be obtained cheaply? DRUMMOND et al. (2005) have suggested that increased protein stability both confers mutational tolerance and constrains sequence evolution. Increasing protein stability, that is, decreasing the free energy of folding
Gf, provides a plausible mechanism for obtaining translational robustness for numerous reasons. First, increased stability leads to increased mutational tolerance and can be obtained by point mutations (BLOOM et al. 2005). Second, the stability-increase mechanism is sufficiently general to encompass proteins of diverse functions and to operate in a wide range of organisms. Third, stability is free in the sense that obtaining a protein with lower
Gf requires only a chance mutation. While many researchers have noted an apparent trade-off between protein stability and enzymatic activity, it is crucial to emphasize that this trend may be statistical rather than intrinsic: Because both high activity and high stability are rare properties, mutations that improve both are exceedingly unlikely unless both are constrained (GIVER et al. 1998). Selection for translational robustness provides precisely that dual constraint, and because many millions of mutations may be screened over evolutionary time, the very few resulting in highly expressed proteins with increased stability (conferring tolerance to translation errors) and uncompromised activity will be found. Finally, the very rarity of such stabilizing mutations provides a measure of the scarcity of highly stable proteins available for exploration by evolutionary drift. If increased stability is a dominant response to the need for mutational tolerance in highly expressed proteins, it will restrict drift and slow evolution relative to less-constrained low-expression proteins.
ABSTRACT
MATERIALS AND METHODS
ANALYTICAL RESULTS
SIMULATION RESULTS
DISCUSSION
>APPENDIX
ACKNOWLEDGEMENTS
LITERATURE CITED
![]() | (A1) |
![]() | (A2) |
=
(L n)/n. We can solve this expression for A and find
![]() | (A3) |
, we obtain
![]() | (A4) |
![]() | (A5) |
![]() | (A6) |
100.
ABSTRACT
MATERIALS AND METHODS
ANALYTICAL RESULTS
SIMULATION RESULTS
DISCUSSION
APPENDIX
>ACKNOWLEDGEMENTS
LITERATURE CITED
ABSTRACT
MATERIALS AND METHODS
ANALYTICAL RESULTS
SIMULATION RESULTS
DISCUSSION
APPENDIX
ACKNOWLEDGEMENTS
>LITERATURE CITED
AGRAFIOTI, I., J. SWIRE, J. ABBOTT, D. HUNTLEY, S. BUTCHER et al., 2005 Comparative analysis of the Saccharomyces cerevisiae and Caenorhabditis elegans protein interaction networks. BMC Evol. Biol. 5: 23.[CrossRef][Medline]
AKASHI, H., 1994 Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136: 927935.[Abstract]
AKASHI, H., 2001 Gene expression and molecular evolution. Curr. Opin. Genet. Dev.t 11: 660666.
ALKSNE, L. E., R. A. ANTHONY, S. W. LIEBMAN and J. R. WARNER, 1993 An accuracy center in the ribosome conserved over 2 billion years. Proc. Natl. Acad. Sci. USA 90: 95389541.
BLOOM, J. D., and C. ADAMI, 2003 Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein-protein interactions data sets. BMC Evol. Biol. 3: 21.[CrossRef][Medline]
BLOOM, J. D., J. J. SILBERG, C. O. WILKE, D. A. DRUMMOND, C. ADAMI et al., 2005 Thermodynamic prediction of protein neutrality. Proc. Natl. Acad. Sci. USA 102: 606611.
BUCCIANTINI, M., E. GIANNONI, F. CHITI, F. BARONI, L. FORMIGLI et al., 2002 Inherent toxicity of aggregates implies a common mechanism for protein misfolding diseases. Nature 416: 507511.[CrossRef][Medline]
BÜRGER, R., M. WILLENSDORFER and M. A. NOWAK, 2006 Why are phenotypic mutation rates much higher than genotypic mutation rates? Genetics 172: 197206.
DRUMMOND, D. A., J. D. BLOOM, C. ADAMI, C. O. WILKE and F. H. ARNOLD, 2005 Why highly expressed proteins evolve slowly. Proc. Natl. Acad. Sci. USA 102: 1433814343.
DRUMMOND, D. A., A. RAVAL and C. O. WILKE, 2006 A single determinant dominates the rate of yeast protein evolution. Mol. Biol. Evol. 23: 327337.
FRASER, H. B., A. E. HIRSH, L. M. STEINMETZ, C. SCHARFE and M. W. FELDMAN, 2002 Evolutionary rate in the protein interaction network. Science 296: 750752.
GIVER, L., A. GERSHENSON, P.-O. FRESKGARD and F. H. ARNOLD, 1998 Directed evolution of a thermostable esterase. Proc. Natl. Acad. Sci. USA 95: 1280912813.
GREENBAUM, D., C. COLANGELO, K. WILLIAMS and M. GERSTEIN, 2003 Comparing protein abundance and mRNA expression levels on a genomic scale. Genome Biol. 4: 117.[CrossRef][Medline]
HAHN, M. W., and A. D. KERN, 2005 Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol. Biol. Evol. 22: 803806.
HAHN, M. W., G. C. CONANT and A. WAGNER, 2004 Molecular evolution in large genetic networks: Does connectivity equal constraint? J. Mol. Evol. 58: 203211.[CrossRef][Medline]
HARGROVE, J. L., and F. H. SCHMIDT, 1989 The role of mRNA and protein stability in gene expression. FASEB J. 3: 23602370.[Abstract]
HIRSH, A. E., and H. B. FRASER, 2001 Protein dispensability and rate of evolution. Nature 411: 10461049.[CrossRef][Medline]
HURST, L. D., and N. G. C. SMITH, 1999 Do essential genes evolve slowly? Curr. Biol. 9: 747750.[CrossRef][Medline]
JORDAN, I. K., I. B. ROGOZIN, Y. I. WOLF and E. V. KOONIN, 2002 Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12: 962968.
JORDAN, I. K., Y. I. WOLF and E. V. KOONIN, 2003 No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol. Biol. 3: 1.[CrossRef][Medline]
KELLOGG, E., and N. JULIANO, 1997 The structure and function of RuBisCO and their implications for systematic studies. Am. J. Bot. 84: 413428.[Abstract]
KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge/London/New York.
KURLAND, C. G., 1992 Translational accuracy and the fitness of bacteria. Annu. Rev. Genet. 26: 2950.[Medline]
MARAIS, G., and L. DURET, 2001 Synonymous codon usage, accuracy of translation, and gene length in Caenorhabditis elegans. J. Mol. Evol. 52: 275280.[Medline]
OHTA, T., 1992 The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23: 263286.[CrossRef]
PAL, C., B. PAPP and L. D. HURST, 2001 Highly expressed genes in yeast evolve slowly. Genetics 158: 927931.
PAL, C., B. PAPP and L. D. HURST, 2003 Rate of evolution and gene dispensability. Nature 421: 496497.[Medline]
PRECUP, J., and J. PARKER, 1987 Missense misreading of asparagine codons as a function of codon identity and context. J. Biol. Chem. 262: 1135111355.
PRINCIOTTA, M. F., D. FINZI, S. B. QIAN, J. GIBBS, S. SCHUCHMANN et al., 2003 Quantitating protein synthesis, degradation, and endogenous antigen processing. Immunity 18: 343354.[CrossRef][Medline]
ROCHA, E. P. C., and A. DANCHIN, 2004 An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol. Biol. Evol. 21: 108116.
SELLA, G., and A. E. HIRSH, 2005 The application of statistical physics to evolutionary biology. Proc. Natl. Acad. Sci. USA 102: 95419546.
SPREITZER, R. J., 1993 Genetic dissection of Rubisco structure and function. Annu. Rev. Plant Physiol. Plant Mol. Biol. 44: 411434.[CrossRef]
WALL, D. P., A. E. HIRSH, H. B. FRASER, J. KUMM, G. GIAEVER et al., 2005 Functional genomic analysis of the rates of protein evolution. Proc. Natl. Acad. Sci. USA 102: 54835488.
WILKE, C. O., 2004 Molecular clock in neutral protein evolution. BMC Genet. 5: 25.[CrossRef][Medline]
WILKE, C. O., J. D. BLOOM, D. A. DRUMMOND and A. RAVAL, 2005 Predicting the tolerance of proteins to random amino acid substitution. Biophys. J. 89: 37143720.[CrossRef][Medline]
ZHANG, J., and X. HE, 2005 Significant impact of protein dispensability on the instantaneous rate of protein evolution. Mol. Biol. Evol. 22: 11471155.
Communicating editor: N. S. WINGREEN
This article has been cited by other articles:
![]() |
T. Bedford, I. Wapinski, and D. L. Hartl Overdispersion of the Molecular Clock Varies Between Yeast, Drosophila and Mammals Genetics, June 1, 2008; 179(2): 977 - 984. [Abstract] [Full Text] [PDF] |
||||
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
-
All Versions of this Article:
genetics.105.051300v1
173/1/473 most recent - Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Wilke, C. O.
- Articles by Drummond, D. A.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Wilke, C. O.
- Articles by Drummond, D. A.































