- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Data Supplement
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Subramanian, S.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Subramanian, S.
Genetics, Vol. 178, 2429-2432, April 2008, Copyright © 2008
doi:10.1534/genetics.107.086405
Nearly Neutrality and the Evolution of Codon Usage Bias in Eukaryotic Genomes
Sankar Subramanian1
Allan Wilson Centre for Molecular Ecology and Evolution, Institute of Molecular BioSciences, Massey University, Auckland, 0632, New Zealand
1 Address for correspondence: Allan Wilson Centre for Molecular Ecology and Evolution, Institute of Molecular BioSciences, Massey University, Private Bag 102 904, North Shore Mail Centre, Auckland, 0632, New Zealand.
E-mail: s.sankarasubramanian{at}massey.ac.nz
Here I show that the mean codon usage bias of a genome, and of the lowly expressed genes in a genome, is largely similar across eukaryotes ranging from unicellular protists to vertebrates. Conversely, this bias in housekeeping genes and in highly expressed genes has a remarkable inverse relationship with species generation time that varies by more than four orders of magnitude. The relevance of these results to the nearly neutral theory of molecular evolution is discussed.
THE nearly neutral theory of evolution proposed by OHTA (1992) predicts that the fate of a substantial fraction of mutations in a population is determined by both natural selection and random genetic drift. According to this theory, the eventual fixation of these mutations is determined by the product of effective population size (Ne) and selection coefficient (s). Therefore, the future of mutations with marginal fitness effects is largely governed by population size. A number of investigations including important studies such as the influence of population size on deleterious mutational load and on the evolution of genome complexity have confirmed this prediction (KEIGHTLEY and EYRE-WALKER 2000; LYNCH and CONERY 2003).
Similarly, selection on synonymous positions leading to bias in codon usage is also known to be weak (AKASHI 1995, 1997; LLOPART and AGUADE 2000). Therefore codon usage bias is an ideal candidate to test the nearly neutral expectation of the population-size effects on weakly selected mutations. However, such studies are scarce, except for a few on a limited data set from closely related Drosophila species (AKASHI 1995; TAMURA et al. 2004). Although a recent study on a large number of prokaryotic genomes has been reported (ROCHA 2004), there has been no previous attempt to examine the magnitude of codon usage bias among various eukaryotes ranging from unicellular protists to vertebrates and its relationship with their population size. The major reason for this limitation is because the magnitude of selection on synonymous codon usage could vary not only among the genomes of eukaryotes, but also among genes of a genome (LI 1997; OHTA 2002).
To examine this, I have assembled a data set of protein coding sequences from 20 eukaryotic species presumably having a wide range of population sizes (Figure 1 legend). First, I examined the variation in the codon usage bias of the whole genome as well as that of the housekeeping genes that are common to all these eukaryotes. In addition, using the gene expression data I investigated these patterns among the genes with very high and very low expression levels. I then examined whether such variations in the codon usage bias could largely be explained by the differences in population size. Since estimates of effective population sizes are hard to obtain, I have used species generation times as a proxy for population size because these two measures are well-known to correlate (CHAO and CARR 1993; OHTA 1993; KEIGHTLEY and EYRE-WALKER 2000). Furthermore, studies in prokaryotes imply that the strength of selection for translational efficiency is itself correlated to generation time (DONG et al. 1996; ROCHA 2004).
|
Codon usage bias was estimated using the modified effective codon number (ENC') method, which accounts for base compositional difference caused by unequal rates of forward and reverse mutations (NOVEMBRE 2002). The genomic mean ENC' was estimated using all the genes of a genome. Also the mean ENC' was computed using the genes involved in translation, which largely consist of ribosomal genes, tRNA synthetases, initiation and elongation factors. These genes have been chosen due to their essential functionality and ubiquitous presence in all eukaryotes as well as their expression in all tissues. Genomic ENC' is mostly similar across eukaryotes and does not show any significant relationship with species generation times (
= –0.15, P = 0.52) (Figure 1A). Conversely, translational genes show an excellent correspondence with the generation time (
= 0.77, P = 0.0008). Since selection on synonymous sites is also known to be modulated by the level of gene expression, the average ENC' was computed for genes with the highest level of expression (the top 1% of genes excluding translational genes) and for genes with the least expression level (1 EST). The relationships shown in Figure 1, A and B are qualitatively similar. The similarity between the mean ENC's of the genomes and the lowly expressed genes suggests that the majority of the genes of the genomes have low codon usage bias. On the other hand the mean ENC' of translational and highly expressed genes suggests that the magnitudes of selection on these two sets of genes are largely the same.
Since the high ENC'values of the genome or the lowly expressed genes suggest a minimal bias in codon usage, this could be used as a baseline to quantify the extent of codon usage bias in translational and highly expressed genes. Therefore the difference in the bias was estimated using the formula
ENC' = (ENC'L – ENC'TH)/ENC'L (ROCHA 2004), where ENC'TH is the average codon usage bias of translational + highly expressed genes and ENC'L is that of low-expressed genes (using the genomic ENC' instead of that of low-expressed genes also produced similar results). Figure 1C shows a highly significant negative relationship between
ENC' and generation time (
= –0.87, P = 0.0002). This also held true when only vertebrates were considered (
= –0.89, P = 0.029). The average
ENC' of vertebrates, (invertebrates + plants) and protists was 0.038, 0.12, and 0.19, respectively, which suggests that the relative estimates of (invertebrates + plants) and of protists are approximately three and five times higher than that of vertebrates.
To examine the phylogenetic nonindependence of this result, independence contrast analysis (FELSENSTEIN 1985) was performed using the CONTRAST package of PHYLIP (FELSENSTEIN 2005). The concatenated alignment of 32 orthologous proteins (obtained through a reciprocal BLAST search) that are common for all the 20 species were used to construct a neighbor-joining tree (see supplemental Figure 1) and the branch lengths were used for the contrast analysis. I have also used a widely accepted eukaryotic tree topology (see supplemental Figure 2) and conducted this analysis using the software CAIC (PURVIS and RAMBAUT 1995). The results from both analyses showed a highly significant relationship (r = –0.72, P < 0.0005) between the standardized contrasts of generation time and
ENC' (see supplemental Table 2). Furthermore, no significant relationship (r = 0.23, P > 0.35) between these contrasts and their variances was observed (see supplemental Table 2) (GARLAND et al. 1992; PURVIS and RAMBAUT 1995). These results suggest that the correlation between generation time and codon usage bias observed in this study is independent and not influenced by the phylogenetic relationship of the species used.
The negative relationship between the
ENC' of translational/highly expressed genes and generation time implies that the selection coefficient (s) is much higher than the fixation probability of a neutral mutation (s > 1/2Ne) but it is small enough to be modulated by population size to fixation (OHTA 1992). This pattern could be explained in two ways on the basis of the variable(s) that correlates with generation time. If population size correlates with generation time, then this relationship could be explained by the differences in population sizes alone by assuming a similarity of s across all species. This seems unlikely because for nearly neutral mutations the absolute value |Nes| has to be between 1 and 2 (OHTA 2002) and thus the intermediate codon usage bias (as opposed to zero or complete codon usage bias) could vary only within this small window. Since the population sizes of the species used in this study vary by several orders of magnitude (
108 unicellular eukaryotes and 104 for human), the observed wide range of
ENC' with respect to the species generation times could not explain the differences in Ne alone by assuming a constant s. An alternate and more likely possibility is that the strength of selection for translational efficiency also correlates with generation time and thus generation time here seems to represent Nes rather than Ne. For example, the effects of mildly deleterious mutations could delay the process of translation and this would affect species with short generation times drastically, as the mutants would be quickly outgrown by the wild types. Similarly, a slightly beneficial mutant would swiftly spread through the population of these species. However, fixation (or elimination) of such mutations is less effective in species with long generation times. Studies on the relationship between growth rates and codon usage bias in prokaryotes support this prediction (DONG et al. 1996; ROCHA 2004).
Recent studies on mammals suggest selection on synonymous sites is caused by factors other than translational selection such as mRNA stability, alternative splicing, and micro RNA binding or the presence of exonic enhancers (PARMLEY and HURST 2007), which might underestimate the absolute values of ENC' for mammals. However, as these factors influence the low-expressed genes as well as translational and highly expressed genes, the relative ratio
ENC' is not affected due to the cancellation of their effects. Furthermore, some of these studies also suggest that only a very small proportion of synonymous positions are affected by these factors as the resultant reduction of divergence in these sites is marginal (1–8%) (HURST 2006; PARMLEY et al. 2006).
The results of this study reveal the relative magnitude of codon usage bias in eukaryotes modulated by their population sizes and also explain the reduction of this bias in species such as vertebrates.
AKASHI, H., 1995 Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139: 1067–1076.[Abstract]
AKASHI, H., 1997 Codon bias evolution in Drosophila. Population genetics of mutation-selection drift. Gene 205: 269–278.[CrossRef][Medline]
CHAO, L., and D. E. CARR, 1993 The molecular clock and the relationship between population-size and generation time. Evolution 47: 688–690.[CrossRef]
DONG, H., L. NILSSON and C. G. KURLAND, 1996 Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J. Mol. Biol. 260: 649–663.[CrossRef][Medline]
DURET, L., and D. MOUCHIROUD, 2000 Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17: 68–74.
FELSENSTEIN, J., 1985 Phylogenies and the comparative method. Am. Nat. 125: 1–15.[CrossRef]
FELSENSTEIN, J., 2005 PHYLIP (Phylogeny Inference Package) Version 3.6. University of Washington, Seattle.
FUGLSANG, A., 2006 Accounting for background nucleotide composition when measuring codon usage bias: brilliant idea, difficult in practice. Mol. Biol. Evol. 23: 1345–1347.
GARLAND, T., P. H. HARVEY and A. R. IVES, 1992 Procedures for the analysis of comparative data using phylogenetically independent contrasts. Syst. Biol. 41: 18–32.[CrossRef]
HURST, L. D., 2006 Preliminary assessment of the impact of microRNA-mediated regulation on coding sequence evolution in mammals. J. Mol. Evol. 63: 174–182.[CrossRef][Medline]
KEIGHTLEY, P. D., and A. EYRE-WALKER, 2000 Deleterious mutations and the evolution of sex. Science 290: 331–333.
LI, W.-H., 1997 Molecular Evolution. Sinauer Associates, Sunderland, MA.
LLOPART, A., and M. AGUADE, 2000 Nucleotide polymorphism at the RpII215 gene in Drosophila subobscura: weak selection on synonymous mutations. Genetics 155: 1245–1252.
LYNCH, M., and J. S. CONERY, 2003 The origins of genome complexity. Science 302: 1401–1404.
NOVEMBRE, J. A., 2002 Accounting for background nucleotide composition when measuring codon usage bias. Mol. Biol. Evol. 19: 1390–1394.
OHTA, T., 1992 The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23: 263–286.[CrossRef]
OHTA, T., 1993 An examination of the generation-time effect on molecular evolution. Proc. Natl. Acad. Sci. USA 90: 10676–10680.
OHTA, T., 2002 Near-neutrality in evolution of genes and gene regulation. Proc. Natl. Acad. Sci. USA 99: 16134–16137.
PARMLEY, J. L., and L. D. HURST, 2007 How do synonymous mutations affect fitness? BioEssays 29: 515–519.[CrossRef][Medline]
PARMLEY, J. L., J. V. CHAMARY and L. D. HURST, 2006 Evidence for purifying selection against synonymous mutations in mammalian exonic splicing enhancers. Mol. Biol. Evol. 23: 301–309.
PURVIS, A., and A. RAMBAUT, 1995 Comparative analysis by independent contrasts (CAIC): an Apple Macintosh application for analysing comparative data. Comput. Appl. Biosci. 11: 247–251.
ROCHA, E. P. C., 2004 Codon usage bias from tRNA's point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res. 14: 2279–2286.
TAMURA, K., S. SUBRAMANIAN and S. KUMAR, 2004 Temporal patterns of fruit fly (Drosophila) evolution revealed by mutation clocks. Mol. Biol. Evol. 21: 36–44.
Communicating editor: S. G. HAMISH
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Data Supplement
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Subramanian, S.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Subramanian, S.
