Genomic Background Predicts the Fate of Duplicated Genes: Evidence From the Yeast Genome
Ze Zhang, Hirohisa Kishino


Gene duplication with subsequent divergence plays a central role in the acquisition of genes with novel function and complexity during the course of evolution. With reduced functional constraints or through positive selection, these duplicated genes may experience accelerated evolution. Under the model of subfunctionalization, loss of subfunctions leads to complementary acceleration at sites with two copies, and the difference in average rate between the sequences may not be obvious. On the other hand, the classical model of neofunctionalization predicts that the evolutionary rate in one of the two duplicates is accelerated. However, the classical model does not tell which of the duplicates experiences the acceleration in evolutionary rate. Here, we present evidence from the Saccharomyces cerevisiae genome that a duplicate located in a genomic region with a low-recombination rate is likely to evolve faster than a duplicate in an area of high recombination. This observation is consistent with population genetics theory that predicts that purifying selection is less effective in genomic regions of low recombination (Hill-Robertson effect). Together with previous studies, our results suggest the genomic background (e.g., local recombination rate) as a potential force to drive the divergence between nontandemly duplicated genes. This implies the importance of structure and complexity of genomes in the diversification of organisms via gene duplications.

Gene duplication has long been thought to be one of the principal engines powering the evolution of protein function, allowing for increases in genomic complexity (Ohno 1970). Because a duplication event creates two fully functional overlapping copies, one of the two paralogs may evolve in a neutral manner right after duplication due to functional redundancy. In most cases, this duplicated gene accumulates deleterious loss-of-function mutations and becomes a pseudogene. There is a small chance that directional selection will lead it to acquire some novel function (Ohno 1970). However, this classical model predicts only an equal chance of evolutionary rate acceleration for each duplicate. Unless there is a selection pressure for overexpression of the genes, the two genes with identical functions cannot be stably retained in the genome (Nowak et al. 1997). Force et al. (1999) explained the selective maintenance of both paralogs by subfunctionalization. The model suggests that each daughter gene adopts part of the function of the parental gene. That is, each developing member of the gene family acquires specificity in gene expression after duplication through the complementary loss of cis-regulatory elements in each of the copies (Force et al. 1999). Subfunctionalization can also occur at the protein function level and can lead to functional specialization when one of the duplicate genes becomes better at performing one of the original functions of the parental gene (Hughes 1999; Zhang 2003).

Increasing evidence from genomic data indicates that asymmetric sequence divergence of duplicate genes is quite common. Kondrashov et al. (2002) analyzed 39 genomes from eubacteria, archaea, and eukaryotes and found asymmetric divergence in ∼5% of the 101 duplicated gene pairs analyzed, whereas Van de Peer et al. (2001) found that half of the 26 duplicated gene pairs in zebrafish showed evidence of asymmetric divergence. More recently, a study by Zhang et al. (2003) examined the evolutionary patterns for 250 duplicated gene pairs in the human genome by using the corresponding orthologs from the mouse genome as the outgroups and found that nearly 60% of duplicated pairs have evolved in an asymmetric divergent manner at the amino acid level. Moreover, Conant and Wagner (2003) found that 20–30% of duplicated gene pairs from four completely sequenced eukaryotic genomes showed asymmetric evolution at the amino acid level. Although the classical model of neofunctionalization predicts acceleration in the evolutionary rate in one of the two duplicates, the model does not indicate which of the duplicates experiences the acceleration. In fact, it assumes implicitly an equal chance of acceleration between the two copies. Mounting evidence indicates that most duplicated genes are not redundant from the start because of selection for increased dosage (Force et al. 1999; Graur and Li 1999; Kondrashov et al. 2002). If this is a rule rather than an exception, both copies are subject to purifying selection after duplication (Kondrashov et al. 2002). As purifying selection intensity is stronger in a high-recombination environment (Hill and Robertson 1966; Carvalho and Clark 1999; Comeron et al. 1999), we would expect that the rate of protein evolution should be lower in regions of high recombination.

In this article, we compare evolutionary rates for the duplicated genes with different local recombination rates by the analysis of the Saccharomyces cerevisiae genome. It is well known that the arrayed structure of duplicated genes in the genome also affects their evolutionary pattern. Tandemly duplicated genes usually evolve in a concerted manner due mainly to unequal crossover and gene conversion mechanisms (Li 1997). In contrast, nontandemly duplicated genes are less affected by gene conversion (Drouin 2002) and evolve in a divergent manner (Rooney et al. 2002). Thus, this study into the evolution of nontandemly duplicated genes may provide us with some clues for understanding the evolution of new function through gene duplication and subsequent divergence. Our results examining nontandemly duplicated genes clearly show that the acceleration of evolutionary rate in one copy depends on the local recombination rate. This also implies that genomic background may drive the fate of the nontandemly duplicated genes.

The data set used in this article consists of the S. cerevisiae gene duplication database ( The Smith-Waterman algorithm was used to search for paralogs. As most duplications occurred a long time ago [∼100 million years ago (MYA)], resulting from whole-genome duplication in S. cerevisiae (Wolfe and Shields 1997), it is important to focus on duplicated genes with a high signal-to-noise ratio. To minimize the potential effect of different amino acid compositions, gene lengths, or selective constraints on the relation between recombination and evolutionary rate, a very stringent threshold with an expectation value of zero was chosen (Pearson 1991; J. A. Birdsell, personal communication). Of the detected paralogs, we selected 47 sets of paralogous genes in which one copy lies in a high-recombination locus and another in a low-recombination locus (Gerton et al. 2000; Birdsell 2002). Most genes had only one paralog; however, a few had multiple paralogs. When there was more than one high- or low-recombination locus within a gene family, one was randomly selected for further analysis.

We then searched for the corresponding orthologous sequence for each of the selected paralogs from the Candida albicans genome data as a reference for the relative rate test between nontandem paralogs after gene duplication (Figure 1; Wu and Li 1985). Because S. cerevisiae and C. albicans diverged ∼140–330 MYA (Seoighe et al. 2000), it should be reasonable to use the ortholog from C. albicans as a reference for the relative rate test. The C. albicans genomic data (contig version 6) were downloaded from We used the two paralog sequences of each pair as queries to carry out FASTA amino acid sequence similarity searches against the C. albicans genomic data. The C. albicans sequence that was the best hit for both copies and that showed >40% amino acid identity to each of them was selected as the outgroup, because an outgroup with shorter branch lengths yields more trustworthy divergence estimates (Conant and Wagner 2003). For a few pairs, the two copies had different best hits; for these, we consulted the InParanoid database, which provides orthologous sequence cluster information between S. cerevisiae and each of Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Mus musculus, and Homo sapiens (Remm et al. 2001), to find appropriate outgroups. Finally, we found 30 duplicate pairs that have proper outgroups.
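The outgroup selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pick_outgroup`, the hit-list layout (best hit first, with percent identity), and the identifiers in the usage lines are hypothetical and are not part of the FASTA package or of the authors' pipeline.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of a similarity-search result:
# a list of (subject_id, percent_identity), sorted with the best hit first.
Hits = List[Tuple[str, float]]


def pick_outgroup(hits_high: Hits, hits_low: Hits,
                  min_identity: float = 40.0) -> Optional[str]:
    """Return the shared best hit if it passes the identity filter, else None.

    hits_high: hits for the copy in the high-recombination region.
    hits_low:  hits for the copy in the low-recombination region.
    """
    if not hits_high or not hits_low:
        return None
    best_high_id, identity_high = hits_high[0]
    best_low_id, identity_low = hits_low[0]
    # Both copies must have the same best hit in the outgroup genome.
    if best_high_id != best_low_id:
        return None
    # The shared hit must show >40% amino acid identity to both copies.
    if identity_high > min_identity and identity_low > min_identity:
        return best_high_id
    return None


# Made-up identifiers and identities, for illustration only.
print(pick_outgroup([("orfA", 62.0), ("orfB", 41.0)], [("orfA", 55.5)]))
print(pick_outgroup([("orfA", 62.0)], [("orfB", 58.0)]))
```

Pairs for which the function returns `None` correspond to the cases that the text handles separately through the InParanoid database.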

All the protein alignments were produced using CLUSTAL X (Thompson et al. 1994). Sequence alignments were edited with BioEdit (Hall 1999) and only unambiguously aligned regions were used for further analysis. Finally, RRTree was applied to all the edited alignments that contain the appropriate outgroups for the relative rate tests (Robinson-Rechavi and Huchon 2000). All the alignments are available from the authors upon request.
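To make the quantities compared in the relative rate test concrete, the following minimal Python sketch derives the branch lengths of the two copies from three pairwise distances on the three-taxon tree of Figure 1. It uses the standard three-point formula and is not the RRTree implementation, which also estimates variances for the test. The function name and the distances in the usage line are invented for illustration.

```python
def branch_lengths(d_hl: float, d_ho: float, d_lo: float):
    """Three-point branch lengths for two paralogs and one outgroup.

    d_hl: distance between the high- and low-recombination copies.
    d_ho: distance between the high-recombination copy and the outgroup.
    d_lo: distance between the low-recombination copy and the outgroup.
    Returns (k_dh, k_dl, k_out): lengths of the branches leading from the
    duplication node to the high copy, to the low copy, and to the outgroup.
    """
    k_dh = (d_hl + d_ho - d_lo) / 2
    k_dl = (d_hl + d_lo - d_ho) / 2
    k_out = (d_ho + d_lo - d_hl) / 2
    return k_dh, k_dl, k_out


# Invented distances, not values from the paper.
k_dh, k_dl, k_out = branch_lengths(d_hl=0.9, d_ho=1.0, d_lo=1.3)
print(k_dh, k_dl, k_dh - k_dl)
```

Note that the difference `k_dh - k_dl` equals `d_ho - d_lo`, so the test reduces to asking whether the two copies are equally distant from the outgroup.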

Because most duplicate gene pairs are very old, synonymous substitutions between the two copies are almost saturated. Therefore, we focused on the difference in the evolutionary rate of amino acid sequence between the two copies. Table 1 compares KDH, the branch length of a copy at a high-recombination-rate region, with KDL, that of another copy at a low-recombination-rate region (Figure 1). The average of the difference (KDH - KDL) was -0.129 and the t value was 2.325 (d.f. = 29), suggesting a significantly lower rate of the copies at higher-recombination-rate regions (P = 0.014). We found that 14 of the 30 (46.6%) duplicate gene pairs showed asymmetry in the evolutionary rate of amino acid sequence (Table 1). Among the 14 duplicated gene pairs showing asymmetric evolution, 11 experienced an acceleration in the evolutionary rate of the copy in the region of low recombination (Table 1). This proportion is significantly higher than one-half (χ2 = 4.28, P < 0.05). This evidence that the paralog in a region of low recombination rate is more likely to evolve faster than the paralog in an area of high-local-recombination rate is consistent with population genetics theory, which predicts that purifying selection is less effective in genomic regions of low recombination (Hill and Robertson 1966; Carvalho and Clark 1999; Comeron et al. 1999). As most amino acid changes tend to be deleterious (Li 1997), relaxed purifying selection on the copy in a region of low local recombination allows such changes to accumulate there faster than in the copy located in a region of high local recombination.
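The summary statistics above can be cross-checked with the Python standard library alone. The sketch below, under stated assumptions, computes a one-sided tail probability for t = 2.325 with 29 degrees of freedom by numerically integrating the Student t density, and an exact one-sided binomial tail for 11 of 14 asymmetric pairs. The one-sidedness of the t-test is an assumption, and the binomial tail is a different test from the χ2 statistic reported in the text, so it is offered only as a plausibility check. All function names are illustrative.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """One-sided P(T >= t) for t >= 0, using Simpson's rule on [0, t].

    steps must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the probability mass lies above 0; subtract the part in [0, t].
    return 0.5 - total * h / 3


def binomial_upper_tail(k: int, n: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Values taken from the text: t = 2.325 with d.f. = 29, and 11 of 14 pairs.
p_t = t_upper_tail(2.325, 29)
p_sign = binomial_upper_tail(11, 14)
print(f"one-sided t-test P = {p_t:.4f}")
print(f"one-sided binomial P = {p_sign:.4f}")  # 470/16384, about 0.0287
```

Under these assumptions the one-sided t tail rounds to the reported P = 0.014, and the binomial tail is below 0.05, in line with the conclusion that the accelerated copy is more often the one in the low-recombination region.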

Comparison of the rates of evolution between the copies in high-recombination regions and those in low-recombination regions

We did not detect a significant difference in the evolutionary rate between the two copies for 16 of the 30 (53.4%) duplicate gene pairs. The averages of KDH for the groups for which two copies show evolutionary asymmetry and symmetry after duplication are 0.522 and 0.622, respectively (Figure 1 and Table 1). Although there is no significant difference in average evolutionary rate between the two groups (P = 0.18 for t-test), the group showing evolutionary symmetry has a higher average of KDH than the group with the pattern of evolutionary asymmetry. The GC3 contents of copies with higher recombination rates were examined for the two groups. They had very similar average GC3 contents (0.461 and 0.465, respectively). Therefore, the different evolutionary patterns of the two groups do not appear to be due to base composition. The lack of detectable asymmetry in the symmetric group is probably because the duplication events and the divergence between S. cerevisiae and C. albicans are too old to guarantee sufficient power of the relative rate test. Alternatively, some of these pairs may have been in the process of subfunctionalization, during which the sites of the two copies experienced complementary acceleration. As a result, the different rates were cancelled out over the sites.

Wolfe and Shields (1997) found ∼55 duplicate blocks in the S. cerevisiae genome. To contrast the above result with a negative control, we randomly selected 40 duplicate gene pairs of which two copies of each pair have similar recombination rates and come from different paralogous blocks in the S. cerevisiae gene duplication database ( Of 40 duplicate pairs, 8 were found to show asymmetric evolution (results not shown). This proportion (20%) is much lower than that (46.6%) for duplicate pairs whose two copies have very different recombination rates. This suggests that recombination rate may be a potential force to drive the divergence of duplicate genes.

Figure 1. A tree for the relative rate test to detect the difference in evolutionary rate between the high- and low-recombination duplicates in the S. cerevisiae genome using the ortholog from C. albicans as a reference. The approximate duplication time, 100 MYA, is from Wolfe and Shields (1997), and the splitting time of S. cerevisiae and C. albicans, 140–330 MYA, is from Seoighe et al. (2000).

During the past decade, much attention has been focused on the effect of recombination on evolutionary rates and patterns of genes (Stephan and Langley 1989; Takano-Shimizu 1999, 2001; Munte et al. 2001; Birdsell 2002). However, we are still in a state of relative ignorance about how this affects the divergence between the nontandemly duplicated genes. Recently, Birdsell (2002) found a significant effect of recombination on divergence of GC (GC3) content at third codon positions between the high and low recombinational duplicates in the S. cerevisiae genome. He proposed a “constraint hypothesis” (a modified biased gene conversion hypothesis) that explains the observations. Selection is not directly invoked in this hypothesis.

Recently, Thornton and Long (2002) found that the average ratio of nonsynonymous-to-synonymous substitutions between the duplicated genes on the X chromosome is significantly higher than the genome average in D. melanogaster. On the basis of the survey for new retrogenes and their functionality and evolution, they further found a significant excess of retrogenes from the X chromosome that retropose to autosomes. Moreover, most X-derived autosomal retrogenes have evolved a testicular expression pattern (Betran et al. 2002). These observations may be explained by natural selection favoring those new retrogenes that moved to autosomes and thus avoided X inactivation in spermatocytes and suggest the importance of genome position for the origin of new genes. More recently, Zhang and Kishino (2004) presented an example in which changes in local recombination rate drove the divergence at synonymous sites between the duplicated amylase genes in Drosophila.

In this study, we found a significant effect of recombination on divergence in protein evolution between the two duplicates in different recombination rate regions. Differences in purifying selection intensity triggered by differences in local recombination rate can explain these observations. When genome data from many closely related species become available, it will be interesting to examine the site-specific asymmetry of duplicate genes (Gu 1999; Knudsen and Miyamoto 2001; Wang and Gu 2001; Knudsen et al. 2003). Together with previous studies, the results suggest that genomic background (e.g., local recombination rate) has a potential to drive divergence between nontandemly duplicated genes, although it may act through different mechanisms. This also implies that the genomic background should be taken into account to better understand the evolution of duplicated genes.


We thank J. A. Birdsell for kindly sending us sequence data of S. cerevisiae duplicated genes and for helpful discussions. This study has been supported by the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency.


  • Communicating editor: S. Yokoyama

  • Received September 24, 2003.
  • Accepted December 31, 2003.

