Genetics, Vol. 153, 1395-1402, November 1999, Copyright © 1999

The Effect of Tandem Substitutions on the Correlation Between Synonymous and Nonsynonymous Rates in Rodents

Nick G. C. Smith and Laurence D. Hurst
Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom

Corresponding author: Nick G. C. Smith, School of Biological Sciences, University of Sussex, Brighton BN1 9QG, United Kingdom. E-mail: n.g.c.smith@sussex.ac.uk

Communicating editor: G. B. GOLDING


*  ABSTRACT

Nonsynonymous substitutions in DNA cause amino acid substitutions while synonymous substitutions in DNA leave amino acids unchanged. The cause of the correlation between the substitution rates at nonsynonymous (KA) and synonymous (KS) sites in mammals is a contentious issue, and one that impacts on many aspects of molecular evolution. Here we use a large set of orthologous mammalian genes to investigate the causes of the KA-KS correlation in rodents. The strength of the KA-KS correlation exceeds the neutral theory expectation when substitution rates are estimated using algorithmic methods, but not when substitution rates are estimated by maximum likelihood. Irrespective of this methodological uncertainty the strength of the KA-KS correlation appears mostly due to tandem substitutions, an excess of which is generated by substitutional nonindependence. Doublet mutations cannot explain the excess of tandem synonymous-nonsynonymous substitutions, and substitution patterns indicate that selection on silent sites is the likely cause. We find no evidence for selection on codon usage. The nature of the relationship between synonymous divergence and base composition is unclear because we find a significant correlation if we use maximum-likelihood methods but not if we use algorithmic methods. Finally, we find that KS is reduced at the start of genes, which suggests that selection for RNA structure may affect silent sites in mammalian protein-coding genes.


THE nature of the relationship between nonsynonymous and synonymous substitution rates pertains to many aspects of molecular evolution in mammals (INA 1996b). A link between the processes of evolution at synonymous and nonsynonymous sites may be due to selection on synonymous sites (see below). Selection on silent sites would affect the selectionist-neutralist debate, for example, providing a potential explanation for the overdispersion of synonymous substitution rates (as shown by OHTA 1995), and would call into question the practice of using silent site comparison to study the evolution of mutation rates (as in MCVEAN and HURST 1997a).

Several studies have reported a highly significant positive correlation between the synonymous substitution rates (KS) and the nonsynonymous substitution rates (KA) of mammalian genes (WOLFE and SHARP 1993; MOUCHIROUD et al. 1995; MAKALOWSKI and BOGUSKI 1998b). The KA-KS correlation also appears to hold within some mammalian genes (ALVAREZ-VALIN et al. 1998).

In this article we investigate a variety of explanations for the intergenic KA-KS correlation in mammals, specifically in the comparison between mouse and rat. A number of hypotheses for the KA-KS correlation exist (for example see LI 1997). An attractive null hypothesis for the KA-KS correlation is the neutral theory explanation, which supposes that genes differ in mutation rates, that all synonymous changes are neutral, and that a variable proportion of nonsynonymous changes are neutral (OHTA and INA 1995). This null hypothesis can explain the existence of a KA-KS correlation in mammals but appears unable to explain why the correlation is so strong (OHTA and INA 1995). We confirm this result using an improved data set and algorithmic rate estimation methods, but we also show that the KA-KS correlation is consistent with the neutral prediction if one uses maximum likelihood (ML) to estimate substitution rates (see RESULTS). Thus methodological bias may have led to previous overestimates of the strength of the KA-KS correlation.

Despite the fact that the strength of the KA-KS correlation may be consistent with silent site neutrality, patterns of substitutions indicate that selection may well be acting on silent sites. In particular, the strength of the KA-KS correlation appears in large part to be due to an excess of tandem substitutions caused by substitutional nonindependence.

Synergy between synonymous and nonsynonymous substitutions, such that one type of substitution increases the likelihood of the other, would increase the KA-KS correlation. Such substitutional nonindependence could be the result of either selection or mutation. Purifying selection might act on both nonsynonymous and synonymous sites (INA 1996a), or nonsynonymous substitutions might cause positive selection on subsequent synonymous substitutions (LIPMAN and WILBUR 1985). Alternatively, a single mutational event might affect both synonymous and nonsynonymous sites simultaneously as with doublet mutations (WOLFE and SHARP 1993). (Note on terminology: we use "doublet" to refer to a supposed mutational event affecting adjacent bases and "tandem" to apply to observed adjacent substitutions.)

It is also possible to envisage a hybrid selection-mutation model in which a correlation between the mutation rate and nonsynonymous constraints causes an increase in the KA-KS correlation (INA 1996a). Such a hybrid explanation is supported by theoretical (KONDRASHOV 1995) and empirical (MCVEAN and HURST 1997b; SMITH and HURST 1999) studies of the evolution of mutation rates.


*  MATERIALS AND METHODS

Selection of protein coding sequences:
A list of 470 genes in mouse, rat, and human, with orthology confirmed using HOVERGEN 19 (DURET et al. 1994), was obtained from MAKALOWSKI and BOGUSKI (1998a). Only genes with complete protein-coding sequence available in a single GenBank/EMBL record were used, leaving 432 three-species comparisons.

Preparation of alignments:
Alignments were performed using the GCG (1994) and EGCG (RICE 1997) packages at HGMP (http://www.hgmp.mrc.ac.uk/). FETCH was used to extract sequences from databases, and GENETRANS was used to extract and combine exons automatically. Protein alignments were performed using CLUSTALW (THOMPSON et al. 1994). Then the DNA alignments were recreated from the protein alignments and the original DNA sequences using the program MRTRANS (written by W. Pearson and available at HGMP).

ML analysis:
The ML package PAML (YANG 1997) was used to reconstruct ancestral sequences and to estimate substitution rates. We used the program BASEML to reconstruct ancestral sites, with the gene tree defined as [(mouse, rat), human], with no rate variation between sites, and with the REV model of evolution. Ancestral sequence reconstruction was carried out by ML, rather than parsimony, for two reasons: ML allows the reconstruction of all sites, and parsimony is biased when base composition is skewed (EYRE-WALKER 1998).

The program CODEML, under a codon-based model of evolution (GOLDMAN and YANG 1994), was used to estimate KA and KS. Using PAML version 2.0 the following parameter settings were used: seqtype = 1, codon-based model; runmode = -2, estimate KA and KS rates; CodonFreq = 3, codon frequencies used as free parameters; additionally, no rate variation was allowed.

Algorithmic rate estimation:
Substitution rates were also estimated from sequence alignments using algorithmic methods developed by MORIYAMA and POWELL (1997). The multiple-hits correction method of TAMURA (1992) was used in conjunction with the method of LI (1993) to calculate KA and KS. The substitution rates at fourfold synonymous sites, K4, were also estimated using the algorithmic method of TAMURA and NEI (1993). Estimates of K4 are expected to be more reliable than estimates of KS, which have to combine the rates of sites of different degeneracies.

With regard to the differences between the algorithmic and PAML rate estimation methods, the algorithmic methods gave similar results to PAML using CodonFreq = 1, codon frequencies calculated from average nucleotide frequencies. But with PAML using CodonFreq = 2, codon frequencies calculated from average nucleotide frequencies at the three codon positions, and PAML using CodonFreq = 3, codon frequencies as free parameters, the PAML and algorithmic estimates differed with regard to the strengths of the KA-KS and the KS-composition correlations (data not shown, but see RESULTS for a comparison of the algorithmic estimates and PAML estimates using CodonFreq = 3).

Measurement of substitutional nonindependence:
To analyze lineage-specific substitution patterns, we used mouse, rat, and human orthologs to reconstruct ancestral sequences (see above) and compared present-day sequences to their most recent ancestral node. The mouse and rat lineage-specific substitution patterns were combined.

The measurement of substitution patterns proceeded as follows. Substitutions between two sequences were designated as either fully synonymous (syn) or fully nonsynonymous (nonsyn) or mixed (part syn and part nonsyn), following the method of LI et al. (1985). All substitutions within 100 bp of every other substitution were investigated, and the totals of all substitution pairs a certain distance apart were noted (if one or both of the substitutions was mixed the necessary weightings were applied, and indels were ignored). Three classes of substitution pairs were investigated: syn-syn, syn-nonsyn, and nonsyn-nonsyn.

Simulated substitution sequences were generated under the assumption of independent substitutions. Simulated sequences were the same length as the real sequences and were generated according to the codon position-specific synonymous and nonsynonymous substitution rates of the real sequences so that the substitution rates of the simulations were the same as those of the real sequences. The same substitution pattern analysis was performed on the simulated sequences as on the real sequences. For each sequence considered the substitution patterns of the real sequence were compared against those of 500 simulated sequences.
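The pair-counting and simulation steps described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the function names and the (position, kind) representation are hypothetical, mixed substitutions and their weightings are left out, and simulated positions are drawn uniformly rather than according to codon position-specific rates.

```python
import random
from collections import Counter
from itertools import combinations


def pair_counts(subs, max_dist=100):
    """Count substitution pairs by (pair class, separation in bp).

    subs: list of (position, kind) tuples, kind being "syn" or "nonsyn".
    Assumption: mixed substitutions are not represented here.
    """
    counts = Counter()
    for (p1, k1), (p2, k2) in combinations(sorted(subs), 2):
        d = p2 - p1
        if 1 <= d <= max_dist:
            # Sort the two kinds so that the mixed class is always "syn-nonsyn".
            pair_class = "-".join(sorted((k1, k2), reverse=True))
            counts[(pair_class, d)] += 1
    return counts


def simulate_independent(length, n_syn, n_nonsyn, rng):
    """Place the same numbers of substitutions at independent random sites.

    Simplification: sites are sampled uniformly along the sequence.
    """
    positions = rng.sample(range(length), n_syn + n_nonsyn)
    kinds = ["syn"] * n_syn + ["nonsyn"] * n_nonsyn
    return list(zip(positions, kinds))


real = [(10, "syn"), (11, "nonsyn"), (13, "syn"), (200, "nonsyn")]
print(pair_counts(real))
print(pair_counts(simulate_independent(300, 2, 2, random.Random(1))))
```

Running the same counting function on the real and the simulated substitutions keeps the two sets of totals directly comparable.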

Statistics describing the difference between the real and simulated substitution patterns were calculated for all three substitution pair classes. The greater the difference between the real and simulated substitution patterns the greater the nonindependence between real substitutions, and thus we term our statistic substitutional nonindependence (SNI). The numbers of real cases (r) were summed for all N sequences, and for each simulation run the numbers of simulated occurrences (s) were summed for all sequences. SNI is given by the number of simulation runs for which the real total was greater than the simulated total, so for 500 simulations per sequence we have the formula

$$\mathrm{SNI} = \sum_{j=1}^{500} I\!\left( \sum_{i=1}^{N} r_i > \sum_{i=1}^{N} s_{ij} \right),$$

where r_i is the number of real cases in sequence i, s_ij is the number of simulated occurrences in sequence i for simulation run j, and I(.) equals 1 when its argument is true and 0 otherwise.

Under the null assumption of no difference between real and simulated substitution patterns, the expected value of SNI is 250. Using the normal distribution as an approximation to the binomial, we find the one-tailed 95% upper confidence limit to be 268. If we apply the Bonferroni correction for considering 100 different substitution pair distances (as described on page 240 in SOKAL and ROHLF 1995), the upper confidence limit is 286.
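The SNI count and the two confidence limits quoted above can be reproduced with a short sketch. The function names are hypothetical, and rounding the limit down to an integer is an assumption made here to match the quoted values.

```python
from math import floor, sqrt
from statistics import NormalDist


def sni(real_total, simulated_totals):
    """Number of simulation runs in which the real total exceeds the simulated total."""
    return sum(1 for s in simulated_totals if real_total > s)


def sni_upper_limit(n_runs=500, alpha=0.05, n_tests=1):
    """One-tailed upper confidence limit for SNI under the null (p = 0.5).

    Uses the normal approximation to the binomial; alpha is divided by
    n_tests as a Bonferroni correction. Assumption: the limit is rounded down.
    """
    mean = n_runs * 0.5
    sd = sqrt(n_runs * 0.5 * 0.5)
    z = NormalDist().inv_cdf(1 - alpha / n_tests)
    return floor(mean + z * sd)


print(sni_upper_limit())             # 268
print(sni_upper_limit(n_tests=100))  # 286
```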

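As an aid to visualization of substitutional nonindependence, we have also provided plots of substitution pair class separation against the statistic real over simulated (ROS), which increases from unity upward as substitutional nonindependence increases, and which is defined as the ratio of the real total to the mean simulated total over the 500 simulation runs,

$$\mathrm{ROS} = \frac{500 \sum_{i=1}^{N} r_i}{\sum_{j=1}^{500} \sum_{i=1}^{N} s_{ij}},$$

where r_i and s_ij are the real and simulated counts for sequence i (and simulation run j).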


*  RESULTS

The KA-KS correlation is consistent with neutral theory:
Using ML rate estimation and a large set of orthologous mouse-rat genes (see MATERIALS AND METHODS and Figure 1), we estimated the KA-KS correlation coefficient by rank correlation followed by the z-transformation (SOKAL and ROHLF 1995). Contrary to the suggestion that a significant KA-KS correlation results from the inclusion of paralogs (HUGHES and YEAGER 1997), and in agreement with previous findings (MOUCHIROUD et al. 1995; MAKALOWSKI and BOGUSKI 1998b), we find a highly significant positive correlation between KA and KS (P < 0.0001) using both algorithmic and ML rate estimation methods.
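As an illustration of the kind of calculation involved, the sketch below computes a rank (Spearman) correlation and then applies Fisher's z-transformation to obtain an approximate confidence interval. It assumes the standard form of both procedures; the exact recipe followed from SOKAL and ROHLF may differ in detail, and the function names are hypothetical.

```python
from math import atanh, sqrt, tanh


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_correlation(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def fisher_z_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for r via z = atanh(r), SE = 1/sqrt(n - 3)."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Interval for a coefficient of 0.275 estimated from 432 genes.
print(fisher_z_interval(0.275, 432))
```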



Figure 1. KA plotted against KS for 432 mouse-rat genes. Substitution rates were estimated using maximum-likelihood methods. The linear regression line is shown.

In addition we calculated the KA-KS correlation coefficients predicted by the neutral theory explanation as given by OHTA and INA (1995). In agreement with their results, we find that the neutral theory is unable to explain the strength of the observed KA-KS correlation if algorithmic rate estimation methods are used, with the observed correlation coefficient R greater than the expected correlation coefficient ρ (R = 0.411 and ρ = 0.270). Statistical testing is difficult because the variance of ρ is not theoretically tractable (OHTA and INA 1995), but simulations have shown that findings of R >> ρ can be explained by pure chance (INA 1996a).

However, in contrast to the results of OHTA and INA (1995), we find that the neutral theory is consistent with the strength of the observed KA-KS correlation if ML rate estimation methods are used, with R less than but similar to ρ (R = 0.275 and ρ = 0.343). The evolutionary model specified in PAML is more general than that of the algorithmic method, which might lead one to conclude that the PAML rate estimates are probably more reliable. However, the PAML rate estimates should not be considered perfect: standard errors, required to predict ρ, are estimated using the normal approximation to the likelihood curve; and the model of evolution makes no allowance for rate variation between sites. There is also the question of whether pairwise sequence comparisons provide enough data for the ML approach to provide unbiased estimates. We conclude that it is unclear which of the algorithmic or ML approaches is more reliable and thus can only note the methodological sensitivity of the strength of the KA-KS correlation relative to the neutral theory prediction.

The importance of tandem substitutions:
The influence of tandem substitutions was investigated using ML rate estimation (similar results were obtained using algorithmic methods). If tandem substitutions were ignored, the expected correlation coefficient considerably exceeded the observed correlation coefficient (R = 0.046 and ρ = 0.349); thus tandem substitutions appear to make a large contribution to the strength of the KA-KS correlation. Upon removal of tandem substitutions the ratio of the expected correlation coefficient to the observed correlation coefficient changes from 1.25 to 7.59, a sixfold increase.
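The ratios quoted above follow directly from the reported ML coefficients, as the short check below shows. The variable names are illustrative only; the four input values are those given in the text.

```python
# Expected (rho) and observed (R) correlation coefficients, ML estimates.
rho_all, r_all = 0.343, 0.275              # all substitutions
rho_no_tandem, r_no_tandem = 0.349, 0.046  # tandem substitutions removed

ratio_all = rho_all / r_all
ratio_no_tandem = rho_no_tandem / r_no_tandem
fold_change = ratio_no_tandem / ratio_all

print(round(ratio_all, 2))        # 1.25
print(round(ratio_no_tandem, 2))  # 7.59
print(round(fold_change, 1))      # 6.1
```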

If only those genes with no tandem substitutions were considered (N = 67), the KA-KS correlation was zero, considerably below the neutral expectation (R = 0 and ρ = 0.344). This result suggests that the KA-KS correlation is generated almost exclusively by tandem substitutions, although this interpretation should be treated with caution as the genes with no tandem substitutions were atypically short and slowly evolving (data not shown).

Substitutional nonindependence mainly affects adjacent bases:
The KA-KS correlation is strengthened if there is substitutional nonindependence between synonymous and nonsynonymous sites (see Introduction). The effect of tandem substitutions on the KA-KS correlation implies nonindependence between adjacent substitutions; but does substitutional nonindependence occur at other distances? We measured the nonindependence between syn-nonsyn pairs of substitutions at all pair separation distances from 1 to 100 bases (see MATERIALS AND METHODS). If all substitutions are considered, then substitutional nonindependence appears to operate at a variety of distances: 80 of the 100 syn-nonsyn pairs have highly significant SNI values (P < 0.05 with Bonferroni correction). The ROS plot (Figure 2) shows high levels of substitutional nonindependence for the syn-nonsyn pairs, with ROS values tending to decrease as the distance between the two substitutions increases (note that tandem syn-nonsyn substitutions give the highest ROS value).



Figure 2. The ROS plots of substitutional nonindependence for the rodent lineages. Values are given for all three classes of substitution and for all pair separation distances from 1 to 100 bp (see MATERIALS AND METHODS).

To check whether substitutional nonindependence really exists beyond effects between adjacent bases, we investigated the effect of the removal of tandem substitutions on substitution patterns. The resultant change in patterns of substitutional nonindependence is striking (compare Figure 2 and Figure 3). Not a single syn-nonsyn pair yielded a significantly high SNI value (P > 0.05 without Bonferroni correction). These results imply that whatever process (selection or mutation) is responsible for the nonindependence of syn-nonsyn substitutions acts mainly on adjacent bases, causing an excess of tandem substitutions.



Figure 3. The ROS plots of substitutional nonindependence for the rodent lineages after removal of tandem substitutions. Compare with Figure 2.

The excess of tandem substitutions is not due to doublet mutations:
If mutational processes are sufficient to explain the excess of tandem substitutions without recourse to selection, then synonymous changes are neutral and doublet mutations are responsible for the excess of tandem substitutions. From these assumptions we can predict an excess of neighboring syn-syn pairs. But the SNI value for neighboring syn-syn pairs is 143, which is lower than the null expectation of 250. This means that either doublet mutations do not occur or that synonymous doublet mutations are subject to purifying selection. Either way, we can conclude that mutation alone is unable to explain the excess of tandem substitutions. Hence by elimination we are left with a selective explanation for the excess of tandem substitutions.

Selection on silent sites is demonstrated by patterns of substitutional nonindependence:
We have shown that synonymous-nonsynonymous substitutional nonindependence does not appear to exist beyond the interactions of adjacent bases. Given that we have also provided evidence against doublet mutations, we have no reason to believe in any form of mutational nonindependence. If we make the assumption that mutation does not differentiate between synonymous and nonsynonymous sites, then we can conclude that any differences in substitutional nonindependence between the three classes of substitution pairs (syn-syn, syn-nonsyn, and nonsyn-nonsyn) must be due to selection.

The different types of substitution pairs do indeed show significantly different levels of substitutional nonindependence. For each class of pairs, 100 different measures of SNI were obtained, corresponding to all the pair separation distances from 1 to 100 bp. Out of a possible maximum of 100, 96 of the nonsyn-nonsyn pair classes, 80 of the syn-nonsyn pair classes, and 69 of the syn-syn pair classes have highly significant SNI values (P < 0.05 with Bonferroni correction). The ROS plots (Figure 2) show the same pattern of substitutional nonindependence decreasing in the order of nonsyn-nonsyn, syn-nonsyn, and syn-syn. Both the nonsyn-nonsyn (Mann-Whitney U-test, P < 0.0001) and syn-nonsyn (Mann-Whitney U-test, P = 0.04) pair classes show significantly greater substitutional nonindependence than the syn-syn pair class.

These results do not appear to be the result of unreliable ancestral sequence reconstruction, because qualitatively identical results are obtained from the mouse-rat interspecies comparison as from the lineage-specific comparisons (data not shown). Therefore selection appears to be operating on silent sites, though we accept that our conclusion is based on an assumption concerning the nature of the mutational process. We now attempt to discern the precise nature of the selection on silent sites.

Selection for major codon usage:
If selection acts to favor major codon usage (AKASHI and EYRE-WALKER 1998), then substitutional nonindependence should be greater for pairs of substitutions within codons than pairs of substitutions between codons.

The syn-nonsyn substitution pairs at distances of 1 and 2 bp were both divided into three classes according to the codon positions of the substitutions. The pairs 1 bp apart were classified as 1-2, 2-3, and 3-1. Both 1-2 and 2-3 represent a pair of substitutions within a codon, while 3-1 involves substitutions in adjacent codons. Similarly, the pairs 2 bp apart were classified as 1-3, 2-1, and 3-2. In this case only 1-3 comprises substitutions within a codon, while both 2-1 and 3-2 involve substitutions in adjacent codons.
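The six classes can be derived mechanically from site coordinates. The sketch below assumes 0-based positions in an in-frame coding sequence; the function names are hypothetical.

```python
def codon_position(site):
    """Codon position (1, 2, or 3) of a 0-based site in an in-frame coding sequence."""
    return site % 3 + 1


def classify_pair(site_a, site_b):
    """Return the class label (e.g. "3-1") and whether both sites share a codon."""
    first, second = sorted((site_a, site_b))
    label = f"{codon_position(first)}-{codon_position(second)}"
    same_codon = first // 3 == second // 3
    return label, same_codon


# Pairs 1 bp apart, then pairs 2 bp apart.
for a, b in [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (2, 4)]:
    print(classify_pair(a, b))
```

Only the 1-2, 2-3, and 1-3 labels come back with the same-codon flag set, matching the classification given in the text.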

All six substitution pair classes show highly significant SNI values (P < 0.05 with Bonferroni correction), and thus the SNI data are equivocal on the issue of selection for codon usage. The ROS data are contrary to predictions based on selection for codon usage: ROS is greater in the 3-1 class than in the 1-2 and 2-3 classes, and ROS in the 1-3 class is intermediate between that in the 2-1 and 3-2 classes (see Figure 4). Our finding of no evidence in favor of selection for major codon usage in mammals supports previous studies (EYRE-WALKER 1991; SMITH and HURST 1999).



Figure 4. The ROS measures of substitutional nonindependence for six classes of syn-nonsyn pairs as defined by the codon positions of the substitutions. If selection acts on codon usage, then those pairs contained within a single codon (1-2, 2-3, and 1-3) should have higher ROS values than the other pairs (3-1, 2-1, and 3-2).

Selection for base composition:
The relationships between synonymous substitution rates and a number of compositional characters were examined to test predictions of specific selective pressures. Significant correlations would be consistent with selection acting directly on base composition or a link between selection and other characters that correlate with composition (such as recombination; EYRE-WALKER 1993). However, this test is not capable of providing strong evidence in favor of selection, because KS-composition correlations could be the result of mutation rather than selection.

As with the KA-KS correlation, the alternative methods of rate estimation yield different results. With the algorithmic method KS does not correlate strongly with either GC4 (G plus C content at fourfold degenerate sites; R = 0.008), A4 (R = -0.03), C4 (R = -0.025), G4 (R = 0.071), or T4 (R = -0.007). Using the more reliable algorithmic measure of K4 we also find no correlation between synonymous divergence and base composition (GC4 and K4; R = 0.002; see Figure 5). However, with PAML we find significant correlations (P < 0.0001) for all compositional parameters: GC4 (R = 0.258; see Figure 5), A4 (R = -0.264), C4 (R = 0.187), G4 (R = 0.247), and T4 (R = -0.206).
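For readers unfamiliar with the compositional measures, the sketch below shows one way to compute GC4, A4, C4, G4, and T4 from a single coding sequence. It assumes the standard genetic code, in which eight two-base prefixes define the fourfold degenerate codon families, and it works on one sequence rather than on an alignment; how fourfold sites were identified in the study itself is not specified here, and the function names are hypothetical.

```python
# First two bases of the eight fourfold degenerate codon families
# (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN)
# in the standard genetic code.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_bases(cds):
    """Third-position bases of fourfold degenerate codons in an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    bases = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            bases.append(codon[2])
    return bases


def composition4(cds):
    """Frequencies of A, C, G, T and G+C at fourfold degenerate sites."""
    bases = fourfold_bases(cds)
    if not bases:
        raise ValueError("no fourfold degenerate sites found")
    freq = {b: bases.count(b) / len(bases) for b in "ACGT"}
    freq["GC"] = freq["G"] + freq["C"]
    return freq


print(composition4("CTGGCCACAATGGGT"))
```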



Figure 5. Base composition at fourfold degenerate sites (GC4) plotted against synonymous divergence for 432 mouse-rat genes. Two measures of synonymous divergence are shown: PAML KS is a maximum-likelihood estimate while K4 is an algorithmic estimate (see MATERIALS AND METHODS). The linear regression lines show a significant relationship between PAML KS and GC4 but not between K4 and GC4.

These differences between the methods are all the more surprising when one considers that, as one would expect, the alternative measures of synonymous divergence are highly significantly correlated (R ~ 0.9). Given that we are unable to choose between algorithmic and ML methods (see above), these data are equivocal on the issue of selection on silent sites (for evidence of selection on the base composition of mammalian silent sites, see EYRE-WALKER 1999). However, our results are pertinent to the debate as to whether there is a relationship between KS and base composition. The existence of a significant correlation was originally suggested by WOLFE et al. (1989) on the basis of a fairly small sample. BERNARDI et al. (1997) subsequently showed that the inverted V distribution obtained by WOLFE et al. (1989) was at least partially due to rate estimate biases (see PESOLE et al. 1995). However, our ML results suggest a linear relationship between GC4 and KS (see Figure 5), which cannot be so easily explained by methodological biases.

Selection for RNA structure:
Selection on RNA structure has been proposed as an explanation for the reduced KS at the start of protein-coding enterobacterial genes, with an open structure thought to favor ribosome binding (EYRE-WALKER and BULMER 1993). We have found a similar pattern in our set of mammalian genes (see Figure 6). For all 354 genes with mouse-rat alignments longer than 600 bp, KS was estimated using algorithmic methods for five regions of the gene: the whole gene and the first four nonoverlapping sections of 50 codons. The first 50 codons at the start of the gene have a significantly low KS in comparison to both the whole gene (Mann-Whitney U-test, P < 0.0001) and three subsequent 50-codon blocks (Mann-Whitney U-tests, P = 0.0019, P = 0.0081, P = 0.0047). These findings provide us with suggestive, although by no means conclusive, evidence that silent sites in mammals are affected by selection.
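The regional comparison can be outlined as follows. The sketch only cuts an aligned coding sequence into the first four 50-codon blocks and computes a Mann-Whitney U statistic for two samples of KS values; KS estimation itself and the P value are not shown, and the function names are hypothetical.

```python
def first_blocks(aligned_cds, block_codons=50, n_blocks=4):
    """First n_blocks nonoverlapping sections of block_codons codons each."""
    size = block_codons * 3
    if len(aligned_cds) < size * n_blocks:
        raise ValueError("alignment too short for the requested blocks")
    return [aligned_cds[k * size:(k + 1) * size] for k in range(n_blocks)]


def mann_whitney_u(x, y):
    """U statistic: number of (x, y) pairs with x < y, ties counted as one half."""
    return sum((a < b) + 0.5 * (a == b) for a in x for b in y)


blocks = first_blocks("ATG" * 250)
print([len(b) for b in blocks])          # [150, 150, 150, 150]
print(mann_whitney_u([1, 3], [2, 3]))    # 2.5
```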



Figure 6. The first 50 codons of a gene (region 1) have a low KS relative to the whole gene and the three subsequent 50-codon nonoverlapping sections (regions 2 to 4). The error bars indicate the SEs of the means. Rates were estimated using algorithmic methods.

It is thought that longer mRNAs have a lower density of longer stem loops, and so selection on RNA structure is predicted to decrease with increasing gene length (COMERON and AGUADE 1996). We find no correlation between gene length and KS for either rate estimation method, though we note that this appears to be a weak test of selection.


*  DISCUSSION

With a ML approach to rate estimation, the rodent KA-KS correlation coefficient is consistent with the neutral theory, but using an algorithmic approach the correlation is stronger than expected. Despite such methodological uncertainty we have found strong evidence to suggest that the excess of tandem substitutions generated by substitutional nonindependence contributes to the strength of the rodent KA-KS correlation coefficient. The removal of tandem substitutions reduces the KA-KS correlation coefficient by a factor of six, and there exists no KA-KS correlation for those genes that do not contain tandem substitutions. Substitutional nonindependence between adjacent bases, the process that generates the excess of tandem substitutions, appears to be the dominant form of substitutional nonindependence.

What causes the excess of tandem substitutions that contribute to the KA-KS correlation? Is it selection or mutation? We demonstrate that the mutational explanation fails due to a lack of evidence for doublet mutations, which means that selection must be responsible for the excess of synonymous-nonsynonymous tandem substitutions. Our analysis of the substitution patterns of the different pair classes also supports the notion of silent site selection, and encourages us to investigate the form of selection acting on silent sites. It might be argued that our finding of substitutional nonindependence caused by selection is inconsistent with our finding using ML methods that the KA-KS correlation is consistent with neutrality, but the neutral prediction should remain reasonably accurate as long as the proportion of silent sites affected by selection is low. Although tandem substitutions are contributing greatly to the KA-KS correlation, selection may generate a relatively small excess of tandems above those predicted on the basis of neutrality.

By examining substitution patterns we have provided evidence against selection acting on codon usage. We have found that the existence of correlations between KS and base composition depends on rate estimation methodology and offers no clue as to whether selection via base composition acts on silent sites. There is no correlation between KS and gene length, but selection on RNA structure is consistent with our finding that KS is reduced at the start of mammalian genes. Although further work is clearly required to examine this supposition, we suggest that selection on RNA structure is a possible explanation for the strong syn-syn substitutional nonindependence at distances of 71 and 91 bp (see Figure 2).

What are the implications of our results with respect to mammalian molecular evolution? We have found three reasons to believe that silent sites in mammals are subject to selection: (i) mutation cannot explain the excess of syn-nonsyn tandem substitutions, therefore selection is responsible by elimination; (ii) a comparison of the levels of substitutional nonindependence of the syn-syn, syn-nonsyn, and nonsyn-nonsyn classes of substitution pairs appears to indicate the effects of selection; and (iii) low KS at the start of genes is consistent with selection on RNA structure. Although arguments (ii) and (iii) are by no means certain, we consider reason (i) to provide strong evidence for silent site selection.

Selection on silent sites can explain the overdispersion of silent sites in mammals (as in OHTA 1995). But does silent site selection necessarily invalidate those studies of the evolution of the mutation rate in mammals, which assume that silent sites are neutral and hence that KS can be used as an unbiased estimator of the mutation rate (as in MCVEAN and HURST 1997a)? Although we have found evidence of selection on silent sites we still believe that KS provides the best available estimate of the mutation rate. First, KS values before and after the removal of tandem substitutions are highly significantly correlated (using PAML, R = 0.927 and P < 0.00001). Second, tests of adaptive mutation rates hold both before and after the removal of tandem substitutions (SMITH and HURST 1999). Third, there is a practical argument in favor of using KS, which is that the alternative way to estimate mutation rates is to use noncoding DNA sequence data, the alignment of which is problematic (SMITH and HURST 1998).


*  ACKNOWLEDGMENTS

The authors thank Ziheng Yang, Yasuo Ina, Adam Eyre-Walker, Paul Higgs, and Jonathan Slack. L.D.H. is funded by the Royal Society.

Manuscript received March 31, 1999; Accepted for publication August 2, 1999.


*  LITERATURE CITED

AKASHI, H. and A. EYRE-WALKER, 1998  Translational selection and molecular evolution. Curr. Opin. Genet. Dev. 8:688-693[Medline].

ALVAREZ-VALIN, F., K. JABBARI, and G. BERNARDI, 1998  Synonymous and nonsynonymous substitutions in mammalian genes: intragenic correlations. J. Mol. Evol. 46:37-44[Medline].

BERNARDI, G., D. MOUCHIROUD and C. GAUTIER, 1997 Isochores and synonymous substitutions in mammalian genes, pp. 137–168 in DNA and Protein Sequence Analysis, edited by M. J. BISHOP and C. J. RAWLINGS. Oxford University Press, Oxford.

COMERON, J. M. and M. AGUADE, 1996 Synonymous substitutions in the Xdh gene of Drosophila: heterogeneous distribution along the coding region. Genetics 144:1053-1062.

DURET, L., D. MOUCHIROUD, and M. GOUY, 1994 Hovergen: a database of homologous vertebrate genes. Nucleic Acids Res. 22:2360-2365.

EYRE-WALKER, A., 1991 An analysis of codon usage in mammals: selection or mutation bias? J. Mol. Evol. 33:442-449.

EYRE-WALKER, A., 1993 Recombination and mammalian genome evolution. Proc. R. Soc. Lond. Ser. B 252:237-243.

EYRE-WALKER, A., 1998 Problems with parsimony in sequences of biased base composition. J. Mol. Evol. 47:686-690.

EYRE-WALKER, A., 1999 Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152:675-683.

EYRE-WALKER, A. and M. BULMER, 1993 Reduced synonymous substitution rate at the start of enterobacterial genes. Nucleic Acids Res. 21:4599-4603.

GCG, 1994 Program Manual for the Wisconsin Package, Version 8. Genetics Computer Group, Madison, WI.

GOLDMAN, N. and Z. YANG, 1994 A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.

HUGHES, A. L. and M. YEAGER, 1997 Comparative evolutionary rates of introns and exons in murine rodents. J. Mol. Evol. 45:125-130.

INA, Y., 1996a Correlation between synonymous and nonsynonymous substitutions and variation in synonymous substitution numbers, pp. 105–113 in Current Topics on Molecular Evolution, edited by M. NEI and N. TAKAHATA. Institute of Molecular Evolutionary Genetics, Penn State University, University Park, PA and The Graduate University for Advanced Studies, Hayama, Japan.

INA, Y., 1996b  Pattern of synonymous and nonsynonymous substitutions: an indicator of mechanisms of molecular evolution. J. Genet. 75:91-115.

KONDRASHOV, A. S., 1995  Modifiers of mutation-selection balance: general-approach and the evolution of mutation-rates. Genet. Res. 66:53-69.

LI, W. H., 1993 Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.

LI, W. H., 1997 Molecular Evolution. Sinauer Associates, Sunderland, MA.

LI, W. H., C. I. WU, and C. C. LUO, 1985 A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.

LIPMAN, D. J. and W. J. WILBUR, 1985  Interaction of silent and replacement changes in eukaryotic coding sequences. J. Mol. Evol. 21:161-167.

MAKALOWSKI, W. and M. S. BOGUSKI, 1998a Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc. Natl. Acad. Sci. USA 95:9407-9412.

MAKALOWSKI, W. and M. S. BOGUSKI, 1998b Synonymous and nonsynonymous substitution distances are correlated in mouse and rat genes. J. Mol. Evol. 47:119-121.

MCVEAN, G. T. and L. D. HURST, 1997a Evidence for a selectively favourable reduction in the mutation rate of the X chromosome. Nature 386:388-392.

MCVEAN, G. T. and L. D. HURST, 1997b Molecular evolution of imprinted genes: no evidence for antagonistic coevolution. Proc. R. Soc. Lond. Ser. B 264:739-746.

MORIYAMA, E. N. and J. R. POWELL, 1997 Synonymous substitution rates in Drosophila: mitochondrial versus nuclear genes. J. Mol. Evol. 45:378-391.

MOUCHIROUD, D., C. GAUTIER, and G. BERNARDI, 1995 Frequencies of synonymous substitutions in mammals are gene-specific and correlated with frequencies of nonsynonymous substitutions. J. Mol. Evol. 40:107-113.

OHTA, T., 1995 Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40:56-63.

OHTA, T. and Y. INA, 1995 Variation in synonymous substitution rates among mammalian genes and the correlation between synonymous and nonsynonymous divergences. J. Mol. Evol. 41:717-720.

PESOLE, G., G. DELLISANTI, G. PREPARATA, and C. SACCONE, 1995  The importance of base composition in the correct assessment of genetic distance. J. Mol. Evol. 41:1124-1127.

RICE, P., 1997 Program Manual for the EGCG Package. The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, England.

SMITH, N. G. C. and L. D. HURST, 1998 Sensitivity of patterns of molecular evolution to alterations in methodology: a critique of Hughes and Yeager. J. Mol. Evol. 47:493-500.

SMITH, N. G. C. and L. D. HURST, 1999 The causes of synonymous rate variation in the rodent genome: can substitution rates be used to estimate the sex bias in mutation rate? Genetics 152:661-673.

SOKAL, R. R., and F. J. ROHLF, 1995 Biometry. W. H. Freeman and Company, New York.

TAMURA, K., 1992 Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C content biases. Mol. Biol. Evol. 9:678-687.

TAMURA, K. and M. NEI, 1993  Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512-526.

THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON, 1994 ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

WOLFE, K. H. and P. M. SHARP, 1993 Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J. Mol. Evol. 37:441-456.

WOLFE, K. H., P. M. SHARP, and W. H. LI, 1989 Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.

YANG, Z., 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.



