Abstract
Most methods for estimating the rate of synonymous and nonsynonymous substitution per site define a site as a mutational opportunity: the proportion of sites that are synonymous is equal to the proportion of mutations that would be synonymous under the model of evolution being considered. Here we demonstrate that this definition of a site can give misleading results and that a physical definition of site should be used in some circumstances. We illustrate our point by reexamining the relationship between codon usage bias and the synonymous substitution rate. It has recently been shown that the rate of synonymous substitution, calculated using the Goldman-Yang method, which encapsulates the mutational-opportunity definition of a site at a high level of sophistication, is either positively correlated or uncorrelated to synonymous codon bias in Drosophila. Using other methods, which account for synonymous codon bias but define a site physically, we show that there is a negative correlation between the synonymous substitution rate and codon bias and that the lack of a negative correlation using the Goldman-Yang method is due to the way in which the number of synonymous sites is counted. We also show that there is a positive correlation between the synonymous substitution rate and third position GC content in mammals, but that the relationship is considerably weaker than that obtained using the Goldman-Yang method. We argue that the Goldman-Yang method is misleading in this context and conclude that methods that rely on a mutational-opportunity definition of a site should be used with caution.
There are many different methods designed to estimate the rate of synonymous and nonsynonymous substitution (Miyata and Yasunaga 1980; Perler et al. 1980; Li et al. 1985; Nei and Gojobori 1986; Li 1993; Pamilo and Bianchi 1993; Goldman and Yang 1994; Muse and Gaut 1994; Comeron 1995; Ina 1995). These vary from the relatively simple to the extremely complex. With the exception of the method of Muse and Gaut (1994), which estimates rates per codon, each method generates an estimate of the synonymous and nonsynonymous substitution rate per site (often given the symbols d_{s} and d_{n}, which we use here, or K_{s} and K_{a}) by attempting to estimate the number of synonymous and nonsynonymous sites (hereafter L_{s} and L_{n}). However, the definition of a site is not straightforward (Muse and Gaut 1994; Muse 1996). For example, consider the problem of twofold degenerate codons: do we define the third position as a synonymous site, one-third of a synonymous site, or some other fraction of a synonymous site, which depends upon the transition:transversion (ts/tv) ratio and the level of synonymous codon bias? Most modern methods, such as those of Li (1993) and Goldman and Yang (1994), define the concept of site as a “mutational opportunity”: the proportion of sites that are synonymous is the proportion of mutations that are synonymous under the model of evolution being considered. Most of the modern methods would therefore class a twofold degenerate site as largely synonymous if the ts/tv ratio is high, because most of the mutations occurring at such sites are synonymous (see appendix a).
An alternative way to proceed is to define sites “physically” and to estimate the rates of substitution at sites of different degeneracy separately. Thus we estimate rates of synonymous substitution at twofold and fourfold sites independently with the number of sites, in each case, being the actual number of sites that are twofold and fourfold degenerate. One could also estimate the synonymous substitution rate at threefold degenerate sites but there are usually too few of them to warrant consideration. For nonsynonymous sites it is usual to estimate the rate per codon (appendix b).
The aim of this article is to compare these two ways in which we can define a site: as a mutational opportunity or as a physical position. Counting sites as mutational opportunities seems a sensible way to proceed—if the ts/tv ratio is very high, most mutations at a twofold degenerate site are synonymous and the site should therefore be treated as largely synonymous. However, this definition of a site can give anomalous and misleading results. To illustrate the problem let us consider a simple model. For clarity and simplicity we assume that synonymous mutations are neutral and that nonsynonymous mutations are either neutral or deleterious. Let us assume that all codons are twofold degenerate, that the rate of transversion mutation is x per nucleotide site, and that the ts/tv ratio is α; i.e., if α= 1, each transition (e.g., C → T) occurs at the same rate as each transversion (e.g., C → A). Under this model the nonsynonymous and synonymous mutation rates per gene are, respectively,
This gives the expected results under the philosophy of counting sites as mutational opportunities: if transitions and transversions are equally frequent, then ρ_{s} = 1/9 (the third position is one-third synonymous), and if transitions greatly outnumber transversions, then ρ_{s} approaches 1/3 (the third position is completely synonymous). The numbers of synonymous and nonsynonymous sites are
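The two limiting values quoted above can be checked with a short calculation. The sketch below is a reconstruction from the model as stated in the text, not the authors' displayed equations: it assumes that each of the two possible transversions at a nucleotide occurs at rate x and the single possible transition at rate αx, so that x cancels from the proportion. The function name `synonymous_fraction` is ours.

```python
from fractions import Fraction


def synonymous_fraction(alpha):
    """Proportion of mutations that are synonymous when every codon is
    twofold degenerate (assumed reading of the simple model in the text).

    Rates are expressed in units of x, the rate of a single transversion.
    """
    alpha = Fraction(alpha)
    # Synonymous: only the transition at the third codon position.
    syn = alpha
    # Nonsynonymous: every change at positions 1 and 2, i.e. 2 * (2 + alpha),
    # plus the two transversions at the third position.
    nonsyn = 2 * (2 + alpha) + 2
    return syn / (syn + nonsyn)


print(synonymous_fraction(1))              # 1/9
print(float(synonymous_fraction(10**6)))   # close to, but below, 1/3
```

Under these assumptions the proportion is α/(6 + 3α), which equals 1/9 at α = 1 and tends to 1/3 as α becomes large.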
Unfortunately, the definition of a site can be critical to our understanding of a problem. To illustrate this we reconsider the relationship between the rate of synonymous substitution and codon usage bias in Drosophila and mammals. Until recently it was generally accepted that the synonymous substitution rate was negatively correlated to the level of synonymous codon bias in enteric bacteria (Sharp and Li 1987) and Drosophila (Sharp and Li 1989; Moriyama and Hartl 1993). This was interpreted as being a consequence of natural selection acting on synonymous codon use: selection in favor of translationally optimal codons led to an increase in synonymous codon bias and a decrease in the synonymous substitution rate. However, Dunn et al. (2001) suggested that the correlation in Drosophila was an artifact of the methods used to correct for multiple hits, particularly in the genes with high synonymous codon bias. They found that the correlation between codon usage bias and the synonymous substitution rate disappeared when the maximum-likelihood codon-based method of Goldman and Yang (GY; 1994) was used to estimate the synonymous substitution rate. Recently, Betancourt and Presgraves (2002) applied the GY method to a data set of 255 Drosophila melanogaster and D. simulans loci and found a significant positive (i.e., in the reverse direction to that previously thought) correlation between the synonymous substitution rate and codon usage bias.
A similar revision has taken place in mammals. It was originally thought that the relationship between codon usage bias, measured as third-position GC content (GC3), and the synonymous substitution rate was a negative quadratic, with the maximum substitution rate being obtained at a GC3 value of ∼60% (Wolfe et al. 1989; Bulmer et al. 1991; though see Bernardi et al. 1993). However, Smith and Hurst (1999) and Bielawski et al. (2000) found that the synonymous substitution rate was positively correlated to GC3 using the GY method.
The lack of a negative correlation between synonymous codon bias and the synonymous substitution rate is puzzling because there is a negative correlation between the nonsynonymous substitution rate and codon usage bias in Drosophila (Akashi 1994). This correlation is found whatever method is used to estimate the nonsynonymous substitution rate, including the GY method (Betancourt and Presgraves 2002). There are a number of potential explanations for this correlation (Akashi 1994; Betancourt and Presgraves 2002), but it seems difficult to think of one that would not also generate a negative correlation between the synonymous substitution rate and codon usage bias. For example, the correlation between the rate of amino acid substitution and codon usage bias might be caused by a decrease in the mutation rate with increasing expression level (Berg and Martelius 1995; Eyre-Walker and Bulmer 1995). Or it might be caused by translational accuracy; i.e., genes with many crucial amino acid sites will evolve slowly, but will also have high synonymous codon bias, to avoid errors during translation (Akashi 1994). In both cases, we expect the synonymous substitution rate to decrease with increasing bias.
As we show here, the discrepancy between the relationships we see with the nonsynonymous and the synonymous substitution rates and codon usage bias, in Drosophila, is due to the definition of a site. If we use a physical definition of a site there is a negative correlation between codon usage bias and both the synonymous and nonsynonymous substitution rates in Drosophila; however, the correlation disappears if we use a mutational-opportunity definition of a site. Which of these definitions is more informative is a question we return to in the discussion.
MATERIALS AND METHODS
Materials: Dunn et al. (2001) used a number of Drosophila data sets. We focus on one of these, 35 genes from D. melanogaster and D. pseudoobscura, from which we excluded the 7 genes that Dunn et al. (2001) removed because they have nonstationary base composition. Since this data set shows a fairly high level of divergence, we also compiled a data set of 43 D. simulans and D. yakuba sequences that show a lower divergence. The aligned D. melanogaster and D. pseudoobscura sequences and the aligned D. simulans and D. yakuba sequences were kindly provided by Katherine Dunn and Nick Smith, respectively.
Bielawski et al. (2000) compiled a data set of 82 primate-artiodactyl-rodent sequences. Here we focus on the divergence between primates and artiodactyls, which formed much of the analysis in their article.
Methods: There are potentially a number of different ways in which we can estimate the synonymous substitution rate under a physical-sites model (see appendix b and discussion). Here we use a simple method. We estimate the rate of synonymous substitution at twofold and fourfold degenerate sites separately. We restrict our analysis to those codons that code for the same amino acid in the two species being considered and we consider only synonymous changes at the third codon position. In restricting our analysis to codons that have no nonsynonymous differences we are assuming that the codon has undergone no amino acid substitution; this is a reasonable assumption given the level of amino acid divergence in the data sets we analyze. We use nucleotide-based methods that take into account the major feature of the codon usage bias in Drosophila and mammals, i.e., the bias toward G- and C-ending codons. For fourfold degenerate sites we used the method of Tamura (Tamura 1992) to correct for multiple hits; this method allows for unequal GC content and ts/tv bias. We give the rate of synonymous substitution at fourfold sites the symbol D^{T}_{s4}. For twofold degenerate codons we used Bulmer’s (1991) method, which is a derivative of Tajima and Nei’s (1984) method,
p_{2} is the proportion of twofold sites that show a synonymous difference and f_{2} is the frequency of GC at those sites. In theory we could estimate the rate of substitution for CT and AG twofolds separately, but this is unnecessary because combining them gives accurate estimates (see below). Bulmer’s method corrects for GC content. We estimate the total number of synonymous substitutions per codon, for the codons analyzed, as
The original GY maximum-likelihood estimates of divergences were kindly provided by Katherine Dunn and Joe Bielawski; these were the number of synonymous (
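The multiple-hit corrections named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code. The fourfold correction uses the published Tamura (1992) distance. The twofold correction is an assumption: the displayed equation for Bulmer's method is not present in this text, so we use the two-state analogue of the Tajima-Nei correction, written in terms of p_{2} and f_{2} as defined in the text. Function names are ours.

```python
import math


def tamura_1992(p_ts, p_tv, gc):
    """Tamura (1992) distance.

    p_ts, p_tv: proportions of sites showing transitional and
    transversional differences; gc: GC content of the sites compared.
    """
    h = 2.0 * gc * (1.0 - gc)
    return (-h * math.log(1.0 - p_ts / h - p_tv)
            - 0.5 * (1.0 - h) * math.log(1.0 - 2.0 * p_tv))


def two_state_distance(p2, f2):
    """Two-state Tajima-Nei-style correction (assumed form for twofold sites).

    p2: proportion of twofold sites showing a synonymous difference;
    f2: frequency of GC at those sites.
    """
    b = 2.0 * f2 * (1.0 - f2)
    return -b * math.log(1.0 - p2 / b)


# With 50% GC the Tamura distance reduces to Kimura's two-parameter distance.
k2p = -0.5 * math.log(1 - 2 * 0.10 - 0.05) - 0.25 * math.log(1 - 2 * 0.05)
print(math.isclose(tamura_1992(0.10, 0.05, 0.5), k2p))  # True
```

Both corrections return a value larger than the observed proportion of differences, and the gap widens as the GC content moves away from 50%, which is why base composition has to be taken into account when bias is strong.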
RESULTS
Drosophila: Using the BT methods we find that the rate of synonymous substitution at both twofold and fourfold degenerate codons is positively correlated to ENC for both the D. melanogaster-D. pseudoobscura and D. simulans-D. yakuba data sets; i.e., the synonymous substitution rate per physical site is negatively correlated to codon usage bias. In contrast, the GY estimate of the synonymous substitution rate is not correlated to codon bias in either data set (Figure 1).
The discrepancy between the methods is not due to problems with the correction for multiple hits because both methods give similar estimates for the number of synonymous substitutions per codon that occur in each gene, if we restrict the analysis to those codons considered by the BT methods presented here (i.e., twofold and fourfold codons with no apparent amino acid substitution; Figure 2). Furthermore, the rate of synonymous substitution per codon is significantly correlated to ENC for both the GY (Figure 3) and BT methods (results not shown). So the correlation between the synonymous substitution rate and codon bias vanishes for the GY method only when the rate is calculated per site; hence the difference between the GY and BT estimates is due to the definition of a site.
The GY method uses the mutational-opportunity definition of a site; however, it takes into account not only the ts/tv ratio but also codon usage bias in its estimate of the number of sites. As a consequence, the proportion of sites that are synonymous (ρ_{s}) is correlated to codon bias (Figure 4): as codon bias increases (i.e., ENC decreases), the proportion of sites that are synonymous decreases. This cancels out the decrease in the synonymous substitution rate per codon and yields a synonymous substitution rate per site that is independent of codon bias.
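The cancellation described above is simple arithmetic, and a toy calculation may make it concrete. The numbers below are hypothetical and chosen only for illustration; they are not taken from any data set in this article. The sketch assumes that each codon contributes 3ρ_{s} synonymous sites, which follows from defining ρ_{s} as the proportion of sites that are synonymous.

```python
def rate_per_site(subs_per_codon, rho_s):
    """Synonymous substitutions per mutational-opportunity site.

    subs_per_codon: synonymous substitutions per codon;
    rho_s: proportion of the three sites in a codon counted as synonymous.
    """
    return subs_per_codon / (3.0 * rho_s)


# Hypothetical genes: (label, synonymous substitutions per codon, rho_s).
genes = [("low bias", 0.30, 0.25), ("high bias", 0.24, 0.20)]
for label, subs, rho in genes:
    print(label, round(rate_per_site(subs, rho), 3))
```

In this made-up example the high-bias gene has 20% fewer substitutions per codon, but also 20% fewer synonymous sites, so the two rates per site are the same.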
Mammals: The estimate of the synonymous substitution rate per site is positively correlated to codon bias using both the GY and the BT methods (Figure 5). However, the nature of the relationship is very different—the gradient is much greater for
DISCUSSION
The nature of the relationship between codon usage bias and the synonymous substitution rate depends upon the definition of a site used to estimate the substitution rate. If a mutational-opportunity definition is used, as encapsulated in the method of Goldman and Yang (1994), then the relationship is absent or positive in Drosophila (Dunn et al. 2001; Betancourt and Presgraves 2002) and strongly positive in mammals (Bielawski et al. 2000). In contrast, with a physical definition of a site, as implemented in our BT methods, the synonymous substitution rate is negatively correlated to codon bias in Drosophila, and although the correlation is positive in mammals, it is weaker, in terms of the gradient, than when using the GY method. The differences between the methods are due solely to their definition of a site, not to their ability to correct for multiple hits; this is illustrated by the fact that the two methods give very similar estimates of the number of synonymous substitutions per codon (Figure 2), but different estimates of the number of substitutions per site.
The crucial question is which definition of a site is more informative in the context of substitution rates and codon bias, and which definition is more informative in other contexts; both definitions of a site are “correct” since one can define a site however one wants. We would argue that the mutational-opportunity definition of a site is likely to be misleading in some contexts simply because the definition of site is abstract and likely to depend on many factors that are not immediately obvious. For example, the proportion of sites that are synonymous is dependent upon the level of codon bias (Figures 4 and 6).
The fact that the synonymous substitution rate per codon and per physical site is negatively correlated to codon bias (positively correlated to ENC) in Drosophila suggests that there is a biological phenomenon that needs to be explained, a phenomenon that is either obscured or in the wrong direction when a mutational-opportunity definition is employed. Furthermore, under the physical definition of site, it is relatively easy to develop models to explain the pattern. For example, we might hypothesize that the correlation is generated by directional selection; in the development of such a model a site is most easily defined physically (one could define the site as a mutational opportunity and include this in the model, but this would add complications). Alternatively we might hypothesize that the relationship is generated by a correlation between the mutation rate and gene expression, as appears to be the case in Escherichia coli (Berg and Martelius 1995; Eyre-Walker and Bulmer 1995).
General considerations: Rates of synonymous and nonsynonymous substitution have been used in many contexts including (i) the estimation of phylogeny, (ii) the estimation of absolute rates of evolution, (iii) the comparison of substitution rates between genes, (iv) the testing of models of evolution, and (v) the investigation of adaptive evolution. Which definition of a site should we use in these different contexts?
It is probably not particularly important whether we define a site as a mutational opportunity or a physical site in the reconstruction of phylogeny; the most important quality of our metric is that it reflects evolutionary divergence.
Whether we should use a physical or mutational-opportunity definition of a site to measure absolute rates of substitution depends on what we wish to use our estimate for. Under the assumption that synonymous mutations are neutral, d_{s}, the synonymous substitution rate per site, under the mutational-opportunity definition of site, is the average mutation rate across the three codon positions (Z. Yang, personal communication; see Equation 5), and d_{s}L_{n} is the amino acid mutation rate per gene. Both of these quantities may be useful. However, in other instances the physical definition of site may be more useful; for example, if we wanted to estimate the effective population size of a species, we could estimate nucleotide diversity and the synonymous substitution rate at fourfold degenerate sites.
As we have shown above, both in the simple model used in the Introduction and in the analysis of the relationship between codon bias and the synonymous substitution rate, the mutational-opportunity definition of a site can give misleading results when genes are compared unless the proportion of sites that are synonymous and nonsynonymous is the same in all the genes in the comparison.
Furthermore, if we are seeking to test a model of evolution, for example, to test whether a correlation between synonymous codon bias and the synonymous substitution rate is due to selection, then we can use either definition of a site by building the definition of a site into the model itself. However, this will generally be much easier for the physical definition of a site.
The one arena in which the definition of a site as a mutational opportunity is clearly superior to the physical definition of site is in the detection of adaptive evolution. Adaptive evolution can be detected in a comparison of the nonsynonymous (d_{n}) and synonymous (d_{s}) substitution rates. Let us assume that synonymous mutations are neutral; then if we can define d_{n} and d_{s} such that d_{n} = d_{s} when all nonsynonymous mutations are neutral, adaptive evolution can be inferred if d_{n} > d_{s}. Estimating substitution rates as the number of substitutions per mutational opportunity is clearly appropriate in this context—if all nonsynonymous mutations are neutral, then the substitution rate per mutation will equal that at synonymous sites (see Equation 5). Inferring the action of adaptive evolution using the physical definition of a site is much more complex. These considerations are summarized in Table 1.
Estimating the rate per physical site: We can estimate the rate of substitution per physical site in a number of different ways (appendix b). We can choose to estimate the substitution rates per codon or per nucleotide site. The former has the advantage that the method yields a single estimate of the synonymous and nonsynonymous substitution rates, but it has the disadvantage that the substitution rate will depend to some extent on the degeneracy of the codons in the gene. This may be important in the estimation of the synonymous substitution rate; if the rate of synonymous substitution is higher at fourfold than at twofold degenerate sites, as we would expect given that all mutations at a fourfold degenerate site are synonymous, then genes with a high proportion of fourfold sites will have higher rates of synonymous substitution per codon than genes with a low number of fourfold sites. This may not be satisfactory. However, this sort of bias is likely to be less important for nonsynonymous substitutions since the majority of mutations in a gene are nonsynonymous and the relative proportion of twofold and fourfold degenerate codons does not greatly affect this.
The alternative to calculating rates per codon is to calculate rates per nucleotide site as we have done in our BT method. The BT method is useful for calculating the rate of synonymous substitution per physical site when codon usage can be easily summarized in terms of base composition. However, this is often not the case; for example, E. coli has strong synonymous codon bias, which is not a simple function of base composition. For data of this sort, it is preferable to use a codon-based model to estimate the number of substitutions and then to express these values per physical site. Z. Yang (personal communication) has recently suggested a measure, d_{4}, which can be derived from the GY method. The method estimates the number of synonymous substitutions that have occurred between fourfold degenerate codons and then divides this by the current number of sites that are physically fourfold degenerate. It would be possible to derive a similar estimate for the rate at twofold degenerate sites. For estimating the rate of nonsynonymous substitution we could estimate rates at zerofold and twofold degenerate sites.
Codon bias and the number of sites: Under the GY method the proportion of sites that are synonymous is correlated to the level of codon usage bias (Figures 4 and 6). This is due to the fact that the GY method takes into account not only the ts/tv ratio but also codon bias itself in calculating the number of sites that are synonymous. The reason codon bias affects the number of sites that are synonymous is as follows. Imagine a gene in which all codons are fourfold degenerate and in which there is strong bias in favor of G- and C-ending codons. Let us assume for simplicity that this codon bias is mutational in origin (the GY method implicitly assumes this). A strong bias in favor of GC tells us that the mutation rate from AT to GC is stronger than the rate from GC to AT. Since nonsynonymous sites have lower GC content than synonymous sites, because they are subject to functional constraints, they will have a higher mutation rate (because they have more AT sites, which have a high mutation rate). The proportion of mutations that are nonsynonymous will therefore be relatively large, which will be reflected in a large value of L^{GY}_{n} and a small value of L^{GY}_{s}. Genes with high synonymous codon bias therefore have a lower proportion of synonymous sites because a smaller proportion of mutations are synonymous. As with the ts/tv ratio this can lead to anomalous results. Imagine two genes that have the same number of twofold and fourfold sites and the same synonymous codon bias and have undergone exactly the same number of synonymous substitutions. They have the same synonymous substitution rate per physical site, but if their nonsynonymous sites differ in composition, then the estimates of the number of synonymous substitutions per site, under the GY method, will be different because the proportion of mutations, and hence sites, that are synonymous will differ between the genes.
Other issues with the GY method: The synonymous substitution rate estimated by the GY method can be used to detect positive selection at nonsynonymous sites: i.e., adaptive evolution can be inferred when
Other results: The GY method has been used to examine the relationship between the synonymous substitution rate and codon usage bias in three other groups: enteric bacteria (Smith and Eyre-Walker 2001), conifers (Kusumi et al. 2002), and D. melanogaster-D. simulans (Betancourt and Presgraves 2002). In enteric bacteria there is a negative correlation between codon usage bias and the synonymous substitution rate even if the rate is measured using a variation of the Tajima-Nei method (Eyre-Walker and Bulmer 1995), so the correlation seems robust. In conifers the correlation remains if the synonymous substitution rate per codon is used instead of the rate per site (data not shown). In D. melanogaster-D. simulans there is a negative correlation between the frequency of optimal codons and the synonymous substitution rate per codon, contrary to the results obtained by Betancourt and Presgraves (2002; our reanalysis of their data); the positive correlation they detected was an artifact produced using the GY method.
Conclusions: We have shown that the basic philosophy underlying the counting of sites in many methods for estimating substitution rates (i.e., the mutational-opportunity concept) is inappropriate in some contexts. In particular, it is inappropriate for comparing rates between genes. The GY method encapsulates this basic philosophy better than most other methods since it takes into account both the transition/transversion ratio and synonymous codon bias. Ironically, it is the sophistication of the GY method that has made the problem of counting sites apparent.
APPENDIX A: MUTATIONAL-OPPORTUNITY METHODS
Here we describe the major methods that are used to estimate rates of synonymous and nonsynonymous substitution.
Nei and Gojobori (1986): Nei and Gojobori suggested two methods, which differ in the way they compute the number of synonymous and nonsynonymous changes between two codons that differ at more than one site. Their method I appears to be the only one used currently. In this method the different pathways between two codons that differ at more than one site are weighted equally. The correction for multiple hits is achieved using the Jukes-Cantor (Jukes and Cantor 1969) model of evolution in which all nucleotide changes are assumed to be equally likely (i.e., ts/tv = 1). The methods assume that a twofold degenerate site is one-third synonymous, which is as one expects under the mutational-opportunity philosophy and the model of nucleotide change that is assumed.
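The site counting and multiple-hit correction described above can be sketched as follows. This is a simplified illustration rather than a full implementation of the Nei-Gojobori method: it counts synonymous sites for a single codon under equal mutation rates and applies the Jukes-Cantor formula, but does not handle pathways between codons with multiple differences. Changes to stop codons are simply counted as nonsynonymous, which is an assumption of this sketch. Function and variable names are ours.

```python
import math
from fractions import Fraction

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def ng_synonymous_sites(codon):
    """Number of synonymous sites in a codon when all nine single-nucleotide
    changes are equally likely (each change counts as one-third of a site)."""
    total = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                total += Fraction(1, 3)
    return total


def jukes_cantor(p):
    """Jukes-Cantor distance from the proportion p of sites that differ."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(ng_synonymous_sites("GAT"))  # 1/3 (twofold degenerate, Asp)
print(ng_synonymous_sites("GGT"))  # 1   (fourfold degenerate, Gly)
```

The twofold codon is credited with one-third of a synonymous site regardless of the ts/tv ratio in the data, which is the sense in which the count follows the assumed model of nucleotide change.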
Li et al. (1985), Li (1993), Pamilo and Bianchi (1993): The method of Li et al. (1985) differs from that of Nei and Gojobori (1986) in two respects. First, the method does not assume that pathways between codons, with multiple differences, are equally likely. And second, the correction for multiple hits is achieved using Kimura’s two-parameter method in which transitions can have a different substitution rate to transversions. However, the model assumes that a twofold degenerate site is one-third synonymous. The method is therefore not strictly a mutational-opportunity method because the model of nucleotide change allows transitions and transversions to occur at different rates while the number of sites is calculated assuming that transitions and transversions are equally likely. This discrepancy was removed in a later development of the method (Li 1993). The method of Li (1993) is very similar to the two methods of Pamilo and Bianchi (1993). These methods differ only in the way they treat the different pathways between codons with multiple differences: the method of Li (1993) follows that of Li et al. (1985) and weights pathways according to their likelihood, while the methods of Pamilo and Bianchi (1993) either weight pathways equally or choose the pathways that maximize the number of synonymous relative to nonsynonymous changes. Both methods estimate the nonsynonymous and synonymous substitution rates per site as
In essence the methods are attempting to estimate the rate of substitution at zerofold and fourfold degenerate sites taking into account rates of evolution at twofold degenerate sites. This method is a mutational-opportunity method but this is not obvious. To demonstrate this let us assume that a fraction γ of codons are fourfold degenerate with a fraction (1 − γ) being twofold degenerate; for simplicity we assume that there are no threefold and sixfold degenerate codons. As in the simple model above we assume that the transversion rate is x and that the ts/tv ratio is α. Under this model we can write Equations A1 as
Comeron (1995): The method of Comeron is essentially that of Li (1993) and Pamilo and Bianchi (1993) but with one small alteration. The methods of Li (1993) and Pamilo and Bianchi (1993) treat all synonymous changes at twofold sites as transitions whereas some of them are transversions. Comeron (1995) suggests a method to deal with this bias.
Ina (1995): Ina suggests two methods. In each of his methods the ts/tv ratio is estimated and this is used to compute the number of synonymous and nonsynonymous sites; i.e., the method is a mutational-opportunity method. The two methods differ in how the ts/tv ratio is estimated. In the first approximate method the ts/tv ratio estimated at the third codon position is used to calculate the number of sites; however, this will tend to bias the ts/tv ratio upward because some of the third codon-position sites are twofold degenerate. The second method uses an iterative procedure to estimate the ts/tv ratio. Pathways between codons with multiple substitutions are weighted equally.
Goldman and Yang (1994): The method of Goldman and Yang (1994) is somewhat different from those considered so far in that it considers the substitution process between codons, not nucleotides. The rate of substitution between two codons, i and j, is assumed to be
APPENDIX B: PHYSICAL-SITES APPROACH
Nucleotide site methods: Physical-site methods can be divided into two categories: those that estimate rates per nucleotide site and those that estimate rates per codon. Methods to estimate rates per nucleotide site have been largely concentrated on estimating the rate of synonymous substitution at fourfold degenerate sites, a measure usually given the symbol K_{4} or d_{4}. The approach taken is the one we have used above, i.e., restricting the analysis to fourfold degenerate sites in codons that have not undergone any apparent amino acid substitution, and then using one of the many nucleotide substitution models to correct for multiple hits. The most widely used models, in order of complexity (i.e., number of parameters), are the models of Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), and Tamura and Nei (1993). See Wolfe et al. (1989) and Bulmer et al. (1991) for examples of this approach. Bulmer (1991) and Bulmer et al. (1991) also suggested a similar method to estimate the synonymous substitution rate at twofold degenerate sites.
Codon methods: We are not aware of any method aimed at estimating the rate of nonsynonymous substitution per physical nucleotide site, but there are physical-site methods that estimate the rate per codon. In these methods the nonsynonymous, or amino acid, substitution rate per codon is estimated by calculating the proportion of amino acid sites that differ between two sequences, p, and then using a correction for multiple hits. The simplest correction is
Muse and Gaut (1994): One physical-site method is designed to estimate both the synonymous and nonsynonymous substitution rates per codon. This is the method of Muse and Gaut (1994). They have developed a model that uses the codon, as opposed to the nucleotide, as the unit of evolution. The parameterization of the model of Muse and Gaut (1994) is very similar to that of Goldman and Yang (1994), with the exception that the ts/tv ratio parameter is removed. The rate of substitution between two codons, i and j, is assumed to be
Acknowledgments
We are very grateful to Andrea Betancourt, Katherine Dunn, Joe Bielawski, Junko Kusumi, and Nick Smith for sharing their data and results and to Nicolas Galtier and Laurence Hurst for useful discussions and comments on an earlier draft. The authors are supported by the Biotechnology and Biological Sciences Research Council and the Royal Society.
Footnotes

Communicating editor: J. Hey
 Received March 3, 2003.
 Accepted July 25, 2003.
 Copyright © 2003 by the Genetics Society of America