Genetics, Vol. 156, 1299-1308, November 2000, Copyright © 2000

Rates of Nucleotide Substitution and Mammalian Nuclear Gene Evolution: Approximate and Maximum-Likelihood Methods Lead to Different Conclusions

Joseph P. Bielawski, Katherine A. Dunn, and Ziheng Yang
Department of Biology, University College London, London NW1 2HE, United Kingdom

Corresponding author: Joseph P. Bielawski, Department of Biology, University College London, 4 Stephenson Way, London NW1 2HE, United Kingdom. E-mail: j.bielawski@ucl.ac.uk

Communicating editor: M. K. UYENOYAMA


*  ABSTRACT

Rates and patterns of synonymous and nonsynonymous substitutions have important implications for the origin and maintenance of mammalian isochores and the effectiveness of selection at synonymous sites. Previous studies of mammalian nuclear genes largely employed approximate methods to estimate rates of nonsynonymous and synonymous substitutions. Because these methods did not account for major features of DNA sequence evolution such as transition/transversion rate bias and unequal codon usage, they might not have produced reliable results. To evaluate the impact of the estimation method, we analyzed a sample of 82 nuclear genes from the mammalian orders Artiodactyla, Primates, and Rodentia using both approximate and maximum-likelihood methods. Maximum-likelihood analysis indicated that synonymous substitution rates were positively correlated with GC content at the third codon positions, but independent of nonsynonymous substitution rates. Approximate methods, however, indicated that synonymous substitution rates were independent of GC content at the third codon positions, but were positively correlated with nonsynonymous rates. Failure to properly account for transition/transversion rate bias and unequal codon usage appears to have caused substantial biases in approximate estimates of substitution rates.


It is well known that synonymous substitution rates vary among mammalian nuclear genes (BERNARDI et al. 1993; WOLFE and SHARP 1993; MOUCHIROUD et al. 1995). Investigations of this variation, however, are complicated by nonuniform patterns of base composition among different regions of the mammalian genome. Mammalian genomes are structured into large regions (>300 kb) of distinct and homogeneous nucleotide composition known as isochores (BERNARDI 1993). Both natural selection (BERNARDI and BERNARDI 1986; GAUTIER and MOUCHIROUD 1998; EYRE-WALKER 1999) and mutation pressure (FILIPSKI 1988; WOLFE and SHARP 1993; FRANCINO and OCHMAN 1999) have been hypothesized as important mechanisms for the origin and maintenance of isochores. Consequently, the relationship between synonymous rate and nucleotide composition has been the subject of debate (e.g., BERNARDI et al. 1993).

Most studies report that genes with high GC content have lower silent substitution rates than genes with intermediate GC content (FILIPSKI 1988; TICHER and GRAUR 1989; WOLFE et al. 1989; WOLFE and SHARP 1993; EYRE-WALKER 1994). However, others (MIYATA et al. 1982; BERNARDI et al. 1993; MATASSI et al. 1999) concluded that synonymous substitution rates are independent of nucleotide composition. Recently, SMITH and HURST (1999) analyzed a large sample of mouse and rat genes and found a significant positive correlation when maximum likelihood (ML) was used and no correlation when approximate methods were used. SMITH and HURST (1999) suggested that this methodological bias hindered further investigation of the relationship between synonymous rate variation and GC content.

A number of authors have reported that synonymous and nonsynonymous rates are positively correlated in mammalian genes (GRAUR 1985; LI et al. 1985; WOLFE and SHARP 1993; MOUCHIROUD et al. 1995; OHTA and INA 1995; MAKALOWSKI and BOGUSKI 1998; SMITH and HURST 1999). This observation, taken together with patterns of within-gene rate variation, recently led ALVAREZ-VALIN et al. (1998) to hypothesize that selection is acting to enhance translational accuracy in mammals. However, this interpretation of the correlation between synonymous and nonsynonymous substitution rates is also controversial (EYRE-WALKER 1991; SMITH and HURST 1999). SMITH and HURST (1999) hypothesized that selection for RNA structure and tandem substitutions, rather than translational accuracy, dominates the evolution of silent sites of rodent genes. Further investigations of selection at synonymous sites will require more reliable estimates of substitution rates.

To date, most studies of mammalian genes have employed approximate methods of estimating substitution rates. Although such studies intended to examine the effect of nucleotide content, their estimation procedures ignored unequal nucleotide frequencies. Most approximate methods also ignored the transition/transversion rate bias. Recent studies suggest that ignoring the transition/transversion rate bias or codon usage bias could lead to systematically biased estimates of substitution rates (INA 1995; YANG and NIELSEN 1998, 2000). A method that accounts for those features of DNA sequence evolution is ML. By employing a codon model of substitution, the ML method also uses probability theory to correct for multiple hits and weight evolutionary pathways between codons (GOLDMAN and YANG 1994; MUSE and GAUT 1994).

The objective of this study was to evaluate differences between ML and approximate methods and to evaluate their impacts on hypothesis testing. We compiled a sample of 82 homologous genes from three mammalian orders and estimated the rates of synonymous and nonsynonymous substitution for each gene using the ML method and two popular approximate methods (NEI and GOJOBORI 1986; INA 1995). These data were used to evaluate the sensitivity of testing the following two null hypotheses: (i) the rate of synonymous substitution is independent of nucleotide composition, and (ii) the rate of synonymous substitution is independent of the rate of nonsynonymous substitution. ML analysis indicated that synonymous substitution rates were positively correlated with GC content at third codon positions but were independent of the nonsynonymous rate. Approximate methods, however, indicated opposite relationships, i.e., synonymous substitution rates were independent of GC content at third codon positions but were positively correlated with the nonsynonymous rate. The differences were found to be due to the failure of approximate methods to properly account for the transition/transversion rate bias and unequal codon frequencies.


*  MATERIALS AND METHODS

Sequence data:
We analyzed the aligned sequences of 82 nuclear genes from the mammalian orders Artiodactyla, Primates, and Rodentia. The data set is a composite of 49 genes analyzed by OHTA (1995) and 48 genes analyzed by ALVAREZ-VALIN et al. (1998). The total number of genes in our analysis is 82 because 7 genes were used by both studies and because 8 genes were excluded due to regions of ambiguous alignment. Small differences between studies in number of codons analyzed are due to removal of initiation codons and minor adjustments to alignments.

Nucleotide composition and synonymous codon usage:
G + C content at third codon positions (GC3) and codon usage bias, measured using the effective number of codons (ENC; WRIGHT 1990), were calculated for each gene. ENC ranges from 20 to 61, with a smaller value indicating a greater bias. GC3 and ENC were computed using the program CodonW written by John Peden. Tests of compositional homogeneity among mammalian orders were conducted for each gene using chi-square tests of contingency tables of nucleotide counts.
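As an illustration of the GC3 statistic only, the following minimal sketch computes the proportion of G and C among third codon positions. It is not the CodonW implementation; the function name `gc3`, the handling of ambiguous bases, and the example sequence are assumptions made for this sketch.

```python
def gc3(cds):
    """Proportion of G or C at third codon positions of an in-frame coding sequence.

    Assumptions of this sketch: the sequence starts in frame, trailing bases
    that do not complete a codon are ignored, and third-position characters
    other than A, C, G, T are skipped.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3              # drop an incomplete final codon
    thirds = [seq[i + 2] for i in range(0, usable, 3)]
    counted = [b for b in thirds if b in "ACGT"]  # skip ambiguity codes and gaps
    if not counted:
        raise ValueError("no unambiguous third-position bases")
    return sum(b in "GC" for b in counted) / len(counted)


# Hypothetical four-codon sequence: GCA AAG TTT CTG (third positions A, G, T, G)
print(gc3("GCAAAGTTTCTG"))
```

Multiplying the returned proportion by 100 gives GC3 as a percentage, the scale used in the text.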

Estimation of the numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site:
ML analysis was performed using the PAML package (YANG 1999). The models account for transition/transversion rate bias (κ) and codon usage bias (see YANG and NIELSEN 1998 for details). We used two models to determine equilibrium codon frequencies. The first model used the nucleotide frequencies at the three positions of the codon and had 3 × (4 − 1) = 9 parameters. The second model used empirical estimates of 61 codon frequencies and had 60 parameters. Likelihood ratio tests comparing those two models (d.f. = 60 − 9 = 51) were significant for 81 of the 82 genes (data not shown). Analyses of substitution rates using both models were similar and hence only results obtained using empirical estimates of codon frequencies are presented.

Likelihood ratio tests of the assumption that the nonsynonymous/synonymous rate ratio (ω = dN/dS) is homogeneous for all three mammalian lineages were performed by comparing two models of dN/dS ratios (YANG and NIELSEN 1998). Model 0 assumed the same ratio (ω0) for all three branches of the artiodactyl, primate, and rodent tree, whereas model 1 allowed independent dN/dS ratios (ωA, ωP, ωR) for the three branches. Twice the log-likelihood difference under these two models was compared to a χ² distribution with d.f. = 2. This constitutes a likelihood ratio test of the strict neutral hypothesis. Model 1 also was used to obtain lineage-specific estimates of dS and dN for each gene.
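The arithmetic of this test can be sketched as follows. The sketch assumes that the maximized log-likelihoods of model 0 and model 1 have already been obtained from a codon-model program; the function name `lrt_df2` and the log-likelihood values are hypothetical. For 2 degrees of freedom, the upper-tail probability of the χ² distribution has the closed form exp(−x/2), so no statistics library is needed.

```python
import math


def lrt_df2(lnl_model0, lnl_model1):
    """Likelihood ratio test of nested models that differ by 2 free parameters.

    Returns the test statistic 2 * (lnL1 - lnL0) and its P value from the
    chi-square distribution with 2 d.f., whose survival function is exp(-x / 2).
    """
    stat = 2.0 * (lnl_model1 - lnl_model0)
    stat = max(stat, 0.0)          # guard against tiny negative values from optimization error
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods for a single gene (not values from this study)
stat, p = lrt_df2(-1000.0, -997.0)
print(stat, p)
```

The closed form applies only when d.f. = 2; the comparison of codon-frequency models with d.f. = 51 would require a general χ² survival function.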

Estimates of dS and dN also were computed pairwise between sequences using the approximate methods of NEI and GOJOBORI (1986) and INA (1995). The PAML package (YANG 1999) was used to implement the method of NEI and GOJOBORI (1986) and Ina's program (dists1, available at ftp.nig.ac.jp) was used to implement method 1 of INA (1995). To facilitate comparison of approximate and ML methods, we also estimated dS and dN in a pairwise fashion between the three orders of mammals using ML (GOLDMAN and YANG 1994).

ML estimation can be performed under different model assumptions. We thus changed the models to investigate the effects of nucleotide (codon) frequencies and transition/transversion rate bias on estimation of dS and dN. If one compares a model in which κ is fixed to 1 (the rate of transition is set equal to the rate of transversion) to a model without such a constraint, the difference in dS and dN indicates the bias that arises from failure to account for the transition/transversion ratio. Likewise, if one compares a model in which codon frequencies are assumed to be equal (1/61) to a model where codon frequencies are free parameters, the difference in dS and dN indicates the bias that arises from failure to account for unequal codon usage.


*  RESULTS

Nucleotide (codon) usage bias and transition/transversion bias are common features of mammalian DNA sequence evolution:
GC content at third codon positions (GC3) varied greatly among genes, ranging from 29 to 96%. Consistent with the suggestion that most mammalian genes are located in GC-rich isochores (BERNARDI 1993), we observed that the majority of genes (60%) were GC rich (GC3 > 60%) at third codon positions. Only a small proportion of genes (5%) were AT rich (AT3 > 60%) at third codon positions. Mean values of GC3 were 65, 62, and 62% for artiodactyl, primate, and rodent genes, respectively.

Consistent with patterns of nucleotide bias, codon usage also varied greatly among genes, with ENC ranging from small values indicating highly biased codon usage (e.g., primate neurophysin 1 = 30.8) to large values indicating unbiased codon usage (e.g., rodent transforming growth factor β1 = 60.4). Mean values of ENC were 46.8, 47.6, and 49.6 in artiodactyls, primates, and rodents, respectively. ML estimates of the transition/transversion rate ratio, κ, indicated that a transition bias was also present in all the sampled genes (Table 1). Collectively, these data show that transition/transversion bias and biased nucleotide (codon) frequencies are common features of DNA sequence evolution in mammalian genes.


 
Table 1. Maximum-likelihood estimates of synonymous and nonsynonymous rates

Lineage-specific estimation of substitution rates by maximum likelihood:
Results of ML analyses using model 0 (one dN/dS ratio) and model 1 (lineage-specific dN/dS ratios) are presented in Table 1. Using a likelihood ratio test, homogeneity of dN/dS ratio was rejected for 33 (40%) of the sampled genes (Table 1). Furthermore, there were 6 genes in the primate lineage (CD3 ε antigen, growth hormone receptor, insulin-like growth factor 1, interleukin 6 receptor, interleukin 7, osteopontin) and one gene in the artiodactyl lineage (interleukin 2) for which dN/dS ratios were >1.0. Because positive selection could adversely affect our investigation (MAKALOWSKI and BOGUSKI 1998), gene and lineage combinations for which the dN/dS ratio was >1 were excluded from further analysis.

Values of dN and dS were estimated separately for the artiodactyl, primate, and rodent lineages using model 1 (Table 1). Estimates of dS for these lineages were positively correlated (artiodactyl vs. primate, r² = 0.1343, P = 0.0013; artiodactyl vs. rodent, r² = 0.2993, P < 0.0001; primate vs. rodent, r² = 0.2632, P < 0.0001). Similarly, estimates of dN were correlated between lineages (artiodactyl vs. primate, r² = 0.5758, P < 0.0001; artiodactyl vs. rodent, r² = 0.6401, P < 0.0001; primate vs. rodent, r² = 0.5763, P < 0.0001). These findings are consistent with previous reports that substitution rates were variable among genes, and genes with higher substitution rates in one lineage tended to have higher rates in other lineages as well (BULMER et al. 1991; MOUCHIROUD et al. 1995).

Hypothesis testing using maximum-likelihood estimates of substitution rates:
The null hypothesis that the rate of synonymous substitution is independent of nucleotide composition was evaluated by linear regression of lineage-specific estimates of dS and GC3. There was a significant positive correlation between dS and GC3, with r² = 0.45, 0.27, and 0.26 in artiodactyls, primates, and rodents, respectively. Because results were similar for all three lineages, only results for artiodactyl genes are presented in Fig 1.
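For readers who wish to reproduce this kind of regression, a minimal sketch of the least-squares slope and r² is given below. The function name `slope_and_r2` and the example values are hypothetical and are not data from Table 1; the significance test on the slope is not included.

```python
def slope_and_r2(x, y):
    """Least-squares slope of y on x and the coefficient of determination r^2."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("x and y must each vary")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Hypothetical per-gene values: GC3 as a proportion and dS per site
gc3_values = [0.35, 0.50, 0.65, 0.80, 0.95]
ds_values = [0.10, 0.14, 0.15, 0.22, 0.24]
slope, r2 = slope_and_r2(gc3_values, ds_values)
print(slope, r2)
```

A positive slope together with an r² significantly greater than zero corresponds to rejection of the null hypothesis of independence.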



Figure 1. The relationship between ML estimates of dS and GC3 in artiodactyl genes.

Because nonstationary genes could have negative impacts on analyses of substitution rates (LANAVE et al. 1984; SACCONE et al. 1989; MOUCHIROUD and GAUTIER 1990), each gene was tested for homogeneity of nucleotide frequencies. Chi-square tests at third positions of the codon indicated significant heterogeneity among lineages in 27 (33%) of the genes (Table 1). Reanalysis of the subset of genes defined by homogeneity of nucleotide frequencies also yielded a significant positive relationship between dS and GC3 (artiodactyls, r² = 0.5053, P < 0.0001; primates, r² = 0.2351, P = 0.0004; rodents, r² = 0.4225, P < 0.0001). This finding indicated that a positive correlation between dS and GC3 was not a consequence of including genes that were nonstationary for nucleotide frequencies.
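A chi-square test of homogeneity of this kind can be sketched as follows. The layout of the contingency table (one row per lineage, one column per nucleotide at third codon positions) is an assumption of this sketch, as are the function name and the counts; the exact tabulation used for Table 1 is not specified above.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of counts, e.g. one row per lineage and one
    column per nucleotide (T, C, A, G) at third codon positions.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0.0:
                raise ValueError("a row or column has zero total count")
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical third-position counts for one gene in three lineages
counts = [
    [40, 10, 25, 25],   # artiodactyl
    [10, 40, 25, 25],   # primate
    [25, 25, 25, 25],   # rodent
]
CRITICAL_5PCT_DF6 = 12.592  # upper 5% point of chi-square with 6 d.f.
stat, df = chi_square_homogeneity(counts)
print(stat, df, stat > CRITICAL_5PCT_DF6)
```

With three lineages and four nucleotides the test has (3 − 1) × (4 − 1) = 6 degrees of freedom under this layout.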

The null hypothesis that synonymous and nonsynonymous substitution rates are independent was evaluated by linear regression of lineage-specific estimates of dS and dN. In the artiodactyl and rodent lineages, the correlation between dS and dN did not differ significantly from zero (Fig 2, a and b). Primate genes, however, exhibited a significant positive correlation between dS and dN (Fig 2c). This plot has an outlier gene (growth hormone), and MAKALOWSKI and BOGUSKI (1998) demonstrated that outliers could have adverse effects on linear regression of dS and dN. When growth hormone was removed, the correlation between dS and dN did not differ significantly from zero (Fig 2d). Reanalysis of artiodactyl and rodent lineages to the exclusion of other outlier genes had no effect on the inferred relationship between dS and dN (data not shown). Given that previous analyses of growth hormone indicated episodes of positive selection (OHTA 1993; WALLIS 1996), we excluded it from further analyses.



Figure 2. The relationship between ML estimates of dS and dN in artiodactyl (a), rodent (b), and primate (c and d) genes. G.H. indicates the growth hormone gene.

The null hypothesis that synonymous and nonsynonymous substitution rates are independent was retested by using dS and dN estimated from the subset of genes defined by homogeneous dN/dS ratios. None of the comparisons exhibited a significant correlation (artiodactyls, r² = 0.0297, P = 0.2367; primates, r² = 0.0304, P = 0.2413; rodents, r² = 0.0025, P = 0.7318). Similar results also were obtained from reanalysis of the subset of genes defined by stationary nucleotide frequencies (artiodactyls, r² = 0.0003, P = 0.9074; primates, r² = 0.0284, P = 0.2525; rodents, r² = 0.0013, P = 0.7919). These results indicate that lack of a correlation between dS and dN was not a consequence of including genes with nonstationary nucleotide frequencies or with variable dN/dS ratios among lineages.

Hypothesis testing using approximate estimates of substitution rates:
The two null hypotheses were tested using two approximate methods (NEI and GOJOBORI 1986; INA 1995). Consistent with some previous analyses that used approximate methods (MIYATA et al. 1982; BERNARDI et al. 1993; MATASSI et al. 1999; SMITH and HURST 1999), the correlation between dS estimated between a pair of lineages and the average GC3 between the same pair of lineages did not differ significantly from zero. Also consistent with previous analyses based on approximate methods (GRAUR 1985; LI et al. 1985; WOLFE and SHARP 1993; MOUCHIROUD et al. 1995; OHTA and INA 1995; MAKALOWSKI and BOGUSKI 1998; SMITH and HURST 1999), there was a significant positive correlation between dS and dN. Because results were similar for all three comparisons, only comparisons between artiodactyl and primate genes are presented in Fig 3. These findings indicate that approximate and ML methods led to exactly opposite conclusions.



Figure 3. The relationship between pairwise estimates of dS and mean GC3 (a–c) and the relationship between pairwise estimates of dS and dN (d–f). All plots represent a pairwise comparison between artiodactyl and primate genes. Pairwise estimates of substitution rates were computed by using the approximate methods of NEI and GOJOBORI (1986) and INA (1995) and also by using ML (GOLDMAN and YANG 1994).

Pairwise estimation of dS and dN using maximum likelihood is consistent with lineage-specific estimation of dS and dN:
Approximate methods are applicable only to pairwise sequence comparisons, whereas ML results discussed above were obtained from joint analysis of all sequences on a phylogeny. To facilitate direct comparison of approximate and ML methods, dS and dN were re-estimated in a pairwise fashion between the sampled lineages using ML (GOLDMAN and YANG 1994). In all three pairwise comparisons, estimation of substitution rates via ML yielded results similar to those obtained by using lineage-specific estimates of substitution rates; i.e., a significant positive correlation was observed between dS and GC3, and a nonsignificant correlation was observed between dN and dS (Fig 3, c and f). These findings indicate that comparisons could be made between approximate and ML methods by utilizing ML to estimate dN and dS in a pairwise fashion between lineages.

Reconciling differences between methods:
We have shown that transition/transversion bias is a common feature of DNA sequence evolution in these genes. The approximate method of NEI and GOJOBORI (1986) ignores the transition/transversion bias by assuming rate equality. We changed the parameters of the codon model to investigate the effects of this assumption on the estimation of dS and dN (see MATERIALS AND METHODS). The effect of ignoring the transition/transversion rate bias was consistent underestimation of the numbers of synonymous sites (S; Fig 4a). Because transitions at third codon positions are more likely to be synonymous than transversions, ignoring the transition/transversion bias leads to underestimation of S and overestimation of dS (LI et al. 1985; PAMILO and BIANCHI 1993; INA 1995; YANG and NIELSEN 1998).
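The direction of this bias can be illustrated for a single codon with a simplified site-counting scheme. The sketch below is neither the codon model implemented in PAML nor the exact procedure of any method cited above: it weights each single-base change by κ if it is a transition and by 1 otherwise, counts changes to stop codons as nonsynonymous, and assumes the standard genetic code. All names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def synonymous_sites(codon, kappa=1.0):
    """Kappa-weighted number of synonymous sites (out of 3) for one sense codon.

    Each of the nine single-base changes is weighted by kappa (transition)
    or 1 (transversion); the synonymous share of the total weight is scaled
    to the three sites of the codon. Changes to stop codons count as
    nonsynonymous, which is a simplifying assumption of this sketch.
    """
    codon = codon.upper()
    if CODE[codon] == "*":
        raise ValueError("stop codons have no synonymous sites in this sketch")
    syn_weight = total_weight = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            weight = kappa if (codon[pos], base) in TRANSITIONS else 1.0
            total_weight += weight
            if CODE[mutant] == CODE[codon]:
                syn_weight += weight
    return 3.0 * syn_weight / total_weight


# TTT (Phe): the only synonymous change, TTT -> TTC, is a transition.
print(synonymous_sites("TTT", kappa=1.0))  # equal rates assumed
print(synonymous_sites("TTT", kappa=2.0))  # transitions twice as frequent
```

For a twofold degenerate codon such as TTT, setting κ = 1 yields fewer synonymous sites than κ > 1; dividing the same number of synonymous differences by the smaller S inflates dS. Fourfold degenerate codons such as GGT are unaffected by κ under this scheme.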



Figure 4. Bias in the estimated number of synonymous sites (S) when (a) transition/transversion ratio (κ) is ignored and (b) when unequal codon frequencies are ignored. Data presented in (a) were estimated using two models with equal codon frequencies (1/61), and in one model κ was a free parameter and in the other model κ = 1 (transition and transversion rates assumed to be equal). Data presented in (b) were estimated using two models with κ = 1, where one model used empirical codon frequencies and the other model assumed equal codon frequencies (1/61).

We also have shown that biased nucleotide (codon) frequencies were characteristic of the sampled genes. Both the methods of NEI and GOJOBORI (1986) and INA (1995) ignore this feature of DNA sequence evolution. We changed the parameters of the codon model to investigate the effect of this assumption on estimation of dS and dN (see MATERIALS AND METHODS). Ignoring codon bias had the opposite effect to ignoring the transition/transversion bias, in that S was consistently overestimated (Fig 4b). These results indicate that the number of synonymous sites (S) available to mutation was restricted to varying degrees by biased codon usage. Because approximate methods (NEI and GOJOBORI 1986; INA 1995) assume unbiased codon usage, counts of the number of synonymous substitutions will be measured against too large a number of synonymous sites, and therefore dS will be underestimated. Because the total number of sites is fixed in a gene, the bias in estimation of nonsynonymous sites (N) is opposite to that of S.

To understand why different methods produced different results concerning the correlation of dS with GC3 or dN, we examined the following two summary statistics: (i) the ratio of the approximate estimate of dS to the ML estimate of dS (dS ratio) and (ii) the ratio of the approximate estimate of dN to the ML estimate of dN (dN ratio). Plots of the dS ratio and dN ratio against GC3 illustrate the complexity of the biases involved in approximate estimation of dS and dN (Fig 5). For genes with highly biased nucleotide (codon) usage (GC3 > 60%), both approximate methods were consistent with our earlier analysis of codon models that ignored nucleotide (codon) frequencies (Fig 4b) in that dS was underestimated and dN was overestimated (Fig 5). However, when nucleotide (codon) bias was weak (GC3 < 60%), the two approximate methods differed in the direction of bias, with the method of NEI and GOJOBORI (1986) overestimating dS and underestimating dN (Fig 5a) and the method of INA (1995) underestimating dS and overestimating dN (Fig 5b).
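The two summary statistics are simple ratios, and the comparison across GC3 classes can be sketched as below. The function names, the treatment of genes at exactly the 60% threshold, and the example values are hypothetical; a ratio below 1 indicates underestimation relative to ML and a ratio above 1 indicates overestimation.

```python
def bias_ratio(approx_estimate, ml_estimate):
    """Ratio of an approximate estimate (of dS or dN) to the ML estimate."""
    if ml_estimate <= 0:
        raise ValueError("ML estimate must be positive")
    return approx_estimate / ml_estimate


def mean_ratio_by_gc3(genes, threshold=0.60):
    """Mean bias ratio for genes with GC3 <= threshold and GC3 > threshold.

    `genes` is an iterable of (gc3, approx_estimate, ml_estimate) tuples,
    with GC3 expressed as a proportion. Returns None for an empty class.
    """
    low = [bias_ratio(a, m) for g, a, m in genes if g <= threshold]
    high = [bias_ratio(a, m) for g, a, m in genes if g > threshold]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(low), mean(high)


# Hypothetical (GC3, approximate dS, ML dS) values for two genes
example = [(0.45, 0.60, 0.50), (0.80, 0.30, 0.60)]
print(mean_ratio_by_gc3(example))
```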



Figure 5. Bias in (a) the method of NEI and GOJOBORI (1986) and (b) the method of INA (1995) as compared to ML. Data represent pairwise comparisons between artiodactyl and primate genes. Bias was measured using the ratio of the approximate estimate of dS to the ML estimate of dS (dS ratio) and the ratio of the approximate estimate of dN to the ML estimate of dN (dN ratio). "NG" indicates the method of NEI and GOJOBORI (1986), "Ina" indicates method 1 of INA (1995), and "ML" indicates the maximum-likelihood method of GOLDMAN and YANG (1994).

Estimates of dS and dN by the method of NEI and GOJOBORI (1986) were affected differently in genes with high and low codon bias (Fig 5a) because this method ignores both the transition/transversion rate bias and codon usage bias, and these two features of DNA sequence evolution have opposite effects on estimation of dS and dN (Fig 4). The method of INA (1995) underestimated dS and overestimated dN in genes with weak as well as strong codon usage bias (Fig 5b) because this method overcorrects for the transition/transversion rate bias (YANG and NIELSEN 1998), thereby producing bias in the same direction as when codon usage is highly biased. For both methods, codon usage bias had the largest effect on approximate estimation of dS and dN (Fig 5).

To understand the difference between methods concerning the dS and dN correlation, we examined the relationship between dS ratios and ML estimates of dN and the relationship between dN ratios and ML estimates of dS. Although approximate methods produced highly biased estimates of dS (Fig 5), there was no significant correlation between this bias (dS ratio) and dN (e.g., artiodactyl vs. primate: NEI and GOJOBORI 1986, r² = 0.025, P = 0.1539; INA 1995, r² = 0.017, P = 0.2505). However, there was a significant positive correlation between the dN ratio and dS (e.g., artiodactyl vs. primate: NEI and GOJOBORI 1986, r² = 0.227, P < 0.0001; INA 1995, r² = 0.258, P < 0.0001). These findings suggest that approximate estimation of dN could induce a positive correlation between estimates of nonsynonymous and synonymous substitution rates.

The preceding analyses suggested that failure of the approximate methods to properly account for the transition/transversion rate bias and unequal codon usage has resulted in seriously biased estimates of substitution rates. These biases appear to be the source of conflict between the methods. To test this prediction, we retested the two null hypotheses using substitution rates estimated from a codon model that was modified to ignore biased nucleotide (codon) frequencies and transition/transversion ratio. Linear regression of substitution rates estimated using this codon model yielded results that fit the prediction, i.e., there was no significant correlation between dS and GC3 (e.g., artiodactyl vs. primate: r² = 0.027, P = 0.137), and there was a significant positive correlation between dS and dN (e.g., artiodactyl vs. primate: r² = 0.124, P = 0.001).


*  DISCUSSION

Synonymous substitution rate is positively correlated with nucleotide composition:
Mammalian genomes exhibit a degree of structure in the form of long (>300 kb) compositionally homogeneous regions of DNA known as isochores (BERNARDI 1993). The well-known correlation between GC content at third codon positions of a gene and GC content of the isochore in which that gene resides permits us to study substitution rates at the level of the isochore (MOUCHIROUD et al. 1991; BERNARDI 1995; CLAY et al. 1996). Our results indicate that synonymous substitution rates differ among isochores and therefore among different regions of the mammalian genome. Furthermore, the most GC-rich isochores appear to have the highest synonymous substitution rate. These results are significant because arguments against a mutation-based hypothesis for the origin and maintenance of isochores have relied, in part, upon the assumption that synonymous substitution rates do not differ among regions of the mammalian genome (BERNARDI et al. 1993; MOUCHIROUD et al. 1995).

The hypothesis that synonymous substitution rates vary among different isochores was originally proposed by WOLFE et al. (1989). Moreover, WOLFE et al. (1989) found remarkably similar rates of silent substitution in six physically linked genes in mouse and rat. Support for the hypothesis of WOLFE et al. (1989) can be found in other studies. MATASSI et al. (1999) investigated synonymous substitution rates among genes lying within one centimorgan of each other in mouse and human. Synonymous substitution rates among these neighboring genes were more similar than among genes that were farther apart on the chromosome (MATASSI et al. 1999). The results of our study, taken together with those of WOLFE et al. (1989) and MATASSI et al. (1999), suggest that the perceived gene specificity of synonymous substitution rate reflects, at least in part, region-specific effects on the rate of synonymous substitution.

MATASSI et al. (1999) also investigated GC3 content of genes within one centimorgan of each other and found that the same sets of neighboring genes were more similar to each other in GC content than genes found farther apart on the chromosome. However, in contrast to our study, MATASSI et al. (1999) did not find a significant correlation between dS and GC3 and hypothesized that regional similarities in both synonymous substitution rates and nucleotide composition were evolving independently of each other. Values of dS used in their correlation analysis were estimated using the approximate method of LI (1993). Because this method is similar to the method of INA (1995) in that it does not account for biased nucleotide (codon) frequencies, their estimates might be biased.

Our results have important implications for the hypothesis of ALVAREZ-VALIN et al. (1998) that selection is acting to enhance translational accuracy in mammals. If selection is acting to enhance translational accuracy, then we should observe a negative correlation between nucleotide (codon) bias and synonymous substitution rate (AKASHI 1994). Our finding of a positive correlation between dS and GC3 suggests that synonymous codon usage in mammals is not subject to this type of selective constraint. In support of AKASHI (1994), a negative correlation between synonymous substitution rate and codon bias has been observed in Drosophila, bacteria, and yeast (SHARP and LI 1987, 1989; SHIELDS et al. 1988; MORIYAMA and GOJOBORI 1992; POWELL and MORIYAMA 1997), and in these taxa codon usage also matches tRNA abundance.

The results of this study do not preclude a role for selection in the maintenance of mammalian isochores. It has been suggested that selection might be acting regionally to elevate GC content (BERNARDI et al. 1985, 1988). In this hypothesis, selection acts to elevate GC content in regions of the genomes of warm-blooded vertebrates as a means of protecting DNA from heat degradation (BERNARDI et al. 1985, 1988). In support of this hypothesis, EYRE-WALKER (1999) reported that patterns of silent site variation in major histocompatibility genes of mammals were not consistent with neutral expectations, but were consistent with the influence of selection on nucleotide composition. However, FRANCINO and OCHMAN (1999) recently reported that interspecific variation in two globin pseudogenes that reside in different isochores was consistent with the effect of differential GC mutation pressure. Although data presented here are not sufficient to resolve this long-standing controversy, our conclusion that synonymous substitution rates vary among different isochores, taken together with the recent findings of FRANCINO and OCHMAN (1999), suggests at least a partial role for mutation in the maintenance of mammalian isochores.

Synonymous substitution rate is independent of nonsynonymous substitution rate:
SMITH and HURST (1999) estimated substitution rates between pairs of rat and mouse genes and found that the correlation between dS and dN obtained from ML was less than neutral expectations (OHTA and INA 1995), whereas the correlation obtained from approximate methods was greater than neutral expectations. In this regard the results of their study are compatible with ours. However, the findings of SMITH and HURST (1999) differ from ours in that a positive correlation between ML estimates of dS and dN, although less than neutral expectations, was significant. The reason for this difference is unclear.

A potential source of correlation between dS and dN is variation among loci in codon usage and base frequencies. The significant correlation between dS and dN indicated by the approximate methods, which ignore codon usage bias, disappeared after we corrected for codon usage and base frequencies. Results of simulation studies (YANG and NIELSEN 1998, 2000) support the view that the differences among methods observed in the present study may be attributed to biases in estimation.

What is clear from both this study and the study of SMITH and HURST (1999) is the sensitivity of such analyses to the estimation method and to assumptions concerning the transition/transversion rate bias and nonrandom codon usage. Unbiased estimation of substitution rates is a critical aspect of reliably measuring the effectiveness of selection at synonymous sites.
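
One way to see why the transition/transversion rate bias matters is to count how the share of synonymous changes shifts with that bias. The following Python sketch is an illustration under simplifying assumptions and is not the ML or approximate procedure compared in this study: it assumes the standard genetic code and equal codon frequencies, and the function synonymous_fraction and its parameter kappa (the weight given to transitions relative to transversions) are hypothetical names introduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Ordered base pairs that differ by a transition (purine<->purine, pyrimidine<->pyrimidine).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def synonymous_fraction(kappa=1.0):
    """Rate-weighted share of single-nucleotide changes that are synonymous.

    Sums over all sense codons with equal weight (an assumption; real genes
    have unequal codon usage). Changes that create a stop codon are skipped.
    Transitions are weighted by kappa, transversions by 1.
    """
    syn = 0.0
    total = 0.0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == "*":
                    continue  # ignore changes to stop codons
                weight = kappa if (codon[pos], base) in TRANSITIONS else 1.0
                total += weight
                if CODE[mutant] == aa:
                    syn += weight
    return syn / total


equal_rates = synonymous_fraction(kappa=1.0)
with_bias = synonymous_fraction(kappa=2.0)
```

Under these assumptions the synonymous share rises with kappa, because transitions at third codon positions are mostly synonymous. A method that assumes kappa = 1 when transitions are in fact favored would therefore tend to undercount synonymous sites and overestimate dS, which is one route by which the estimation biases discussed above can arise.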


*  ACKNOWLEDGMENTS

This study was supported by a Biotechnology and Biological Sciences Research Council grant (31/G10434) to Z.Y.

Manuscript received December 9, 1999; Accepted for publication July 20, 2000.


*  LITERATURE CITED

AKASHI, H., 1994  Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927-935.

ALVAREZ-VALIN, F., K. JABBARI, and G. BERNARDI, 1998  Synonymous and nonsynonymous substitutions in mammalian genes: intragenic correlations. J. Mol. Evol. 46:37-44.

BERNARDI, G., 1993  The vertebrate genome: isochores and evolution. Mol. Biol. Evol. 10:186-204.

BERNARDI, G., 1995  The human genome: organization and evolutionary history. Annu. Rev. Genet. 29:445-476.

BERNARDI, G. and G. BERNARDI, 1986  Compositional constraints and genome evolution. J. Mol. Evol. 24:1-11.

BERNARDI, G., B. OLOFSSON, J. FILIPSKI, M. ZERIAL, and J. SALINAS et al., 1985  The mosaic genome of warm-blooded vertebrates. Science 228:953-958.

BERNARDI, G., D. MOUCHIROUD, C. GAUTIER, and G. BERNARDI, 1988  Compositional patterns in vertebrate genomes: conservation and change in evolution. J. Mol. Evol. 28:7-18.

BERNARDI, G., D. MOUCHIROUD, and C. GAUTIER, 1993  Silent substitutions in mammalian genomes and their evolutionary implications. J. Mol. Evol. 37:583-589.

BULMER, M., K. H. WOLFE, and P. M. SHARP, 1991  Synonymous nucleotide substitution rates in mammalian genes: implications for the molecular clock and the relationships of mammalian orders. Proc. Natl. Acad. Sci. USA 88:5974-5978.

CLAY, O., S. CACCIÖ, S. ZOUBAK, D. MOUCHIROUD, and G. BERNARDI, 1996  Human coding and noncoding DNA: compositional correlations. Mol. Phylogenet. Evol. 5:2-12.

EYRE-WALKER, A., 1991  An analysis of codon usage in mammals: selection or mutation bias. J. Mol. Evol. 33:442-449.

EYRE-WALKER, A., 1994  DNA mismatch repair and synonymous codon evolution in mammals. Mol. Biol. Evol. 11:88-98.

EYRE-WALKER, A., 1999  Evidence for selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152:657-683.

FILIPSKI, J., 1988  Why the rate of silent codon substitution is variable within a vertebrate's genome. J. Theor. Biol. 134:159-164.

FRANCINO, M. P. and H. OCHMAN, 1999  Isochores result from mutation not selection. Nature 400:30-31.

GAUTIER, N. and D. MOUCHIROUD, 1998  Isochore evolution in mammals: a human-like ancestral sequence. Genetics 150:1577-1584.

GOLDMAN, N. and Z. YANG, 1994  A codon based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.

GRAUR, D., 1985  Amino acid composition and the evolutionary rates of protein coding genes. J. Mol. Evol. 22:53-62.

INA, Y., 1995  New methods for estimating the numbers of synonymous and nonsynonymous substitutions. J. Mol. Evol. 40:190-226.

LANAVE, C., G. PREPARATA, C. SACCONE, and G. SERIO, 1984  A new method for calculating evolutionary substitution rates. J. Mol. Evol. 20:86-93.

LI, W.-H., 1993  Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.

LI, W.-H., C.-I. WU, and C.-C. LUO, 1985  A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.

MAKALOWSKI, W. and M. S. BOGUSKI, 1998  Synonymous and nonsynonymous substitution distances are correlated in mouse and rat genes. J. Mol. Evol. 47:119-121.

MATASSI, G., P. M. SHARP, and C. GAUTIER, 1999  Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol. 9:786-791.

MIYATA, T., H. HAYASHIDA, R. KIKUNO, M. HASEGAWA, and M. KOBAYASHI et al., 1982  Molecular clock of silent substitution: at least six fold preponderance of silent changes in mitochondrial genes over those of nuclear genes. J. Mol. Evol. 19:28-35.

MORIYAMA, E. N. and T. GOJOBORI, 1992  Rates of synonymous substitution and base composition of nuclear genes in Drosophila. Genetics 143:847-858.

MOUCHIROUD, D. and C. GAUTIER, 1990  Codon usage changes and sequence dissimilarity between human and rat. J. Mol. Evol. 31:81-91.

MOUCHIROUD, D., G. D'ONOFRIO, G. AISSANI, G. MACAYA, and C. GAUTIER et al., 1991  The distribution of genes in the human genome. Gene 100:181-187.

MOUCHIROUD, D., C. GAUTIER, and G. BERNARDI, 1995  Frequencies of synonymous substitutions in mammals are gene-specific and correlated with frequencies of nonsynonymous substitutions. J. Mol. Evol. 40:107-113.

MUSE, S. V. and B. S. GAUT, 1994  A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol. Biol. Evol. 11:715-724.

NEI, M. and T. GOJOBORI, 1986  Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.

OHTA, T., 1993  Pattern of nucleotide substitution in growth hormone-prolactin gene family: a paradigm for evolution by gene duplication. Genetics 134:1271-1276.

OHTA, T., 1995  Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40:56-63.

OHTA, T. and Y. INA, 1995  Variation in synonymous substitution rates among mammalian genes and the correlation between synonymous and nonsynonymous divergences. J. Mol. Evol. 41:717-720.

PAMILO, P. and N. O. BIANCHI, 1993  Evolution of the Zfx and Zfy genes—rates and interdependence between genes. Mol. Biol. Evol. 10:271-281.

POWELL, J. R. and E. N. MORIYAMA, 1997  Evolution of codon usage bias in Drosophila. Proc. Natl. Acad. Sci. USA 94:7784-7790.

SACCONE, C., G. PESOLE, and G. PREPARATA, 1989  DNA microenvironments and the molecular clock. J. Mol. Evol. 29:407-411.

SHARP, P. M. and W.-H. LI, 1987  The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol. Biol. Evol. 4:222-230.

SHARP, P. M. and W.-H. LI, 1989  On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398-402.

SHIELDS, D. C., P. M. SHARP, D. G. HIGGINS, and F. WRIGHT, 1988  "Silent" sites in Drosophila genes are not neutral: evidence for selection among synonymous codons. Mol. Biol. Evol. 5:704-716.

SMITH, N. G. C. and L. D. HURST, 1999  The effect of tandem substitutions on the correlation of synonymous and nonsynonymous rates in rodents. Genetics 153:1395-1402.

TICHER, A. and D. GRAUR, 1989  Nucleic acid composition, codon usage, and the rate of synonymous substitution in protein-coding genes. J. Mol. Evol. 28:286-298.

WALLIS, M., 1996  The molecular evolution of vertebrate growth hormones: a pattern of near-stasis interrupted by sustained bursts of rapid change. J. Mol. Evol. 43:93-100.

WOLFE, K. H. and P. M. SHARP, 1993  Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J. Mol. Evol. 37:441-456.

WOLFE, K. H., P. M. SHARP, and W.-H. LI, 1989  Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.

WRIGHT, F., 1990  The `effective number of codons' used in a gene. Gene 87:23-29.

YANG, Z., 1999  Phylogenetic Analysis by Maximum Likelihood (PAML), Version 2. University College London, England.

YANG, Z. and R. NIELSEN, 1998  Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J. Mol. Evol. 46:409-418.

YANG, Z. and R. NIELSEN, 2000  Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17:32-43.



