As wild organisms adapt to the laboratory environment, they become less relevant as biological models. It has been suggested that a commonly used S. cerevisiae strain has rapidly accumulated mutations in the lab. We report a low-to-intermediate rate of protein evolution in this strain relative to wild isolates.
When introduced into the lab, wild organisms often undergo selection for easier growth. This adaptation, and loss of selective pressures normally present in the wild, may have wide-ranging effects, such that the biology of a lab organism may no longer reflect that of wild populations. This concern has arisen in the recent literature for Saccharomyces cerevisiae (Liu et al. 1996; Bonhivers et al. 1998; Yvert et al. 2003; Deutschbauer and Davis 2005; Dunn et al. 2005; Gu et al. 2005; Qin and Lu 2006). The S288C yeast strain was bred in the 1950s from wild and commercial strains (Mortimer and Johnston 1986) and passed into use as a common lab strain. In a recent comparison of S288C with the clinical strain YJM789 (Gu et al. 2005), isolated in 1989 (Tawfik et al. 1989), the phylogenetic lineage to the lab strain exhibited faster protein evolution. One interpretation of this result is that S288C accumulated more mutations during its longer tenure in the lab. Here we revisit this hypothesis in the context of a third strain, the vineyard isolate RM11-1a, which was introduced into the lab in 1996 (Torok et al. 1996; Brem et al. 2002).
To parallel previous calculations (Gu et al. 2005), we first analyzed protein evolution in yeast strain pairs. We obtained ORF alignments of S288C, RM11-1a (hereafter RM), YJM789 (hereafter YJM), and S. paradoxus orthologs (Ronald et al. 2005) and eliminated frameshifts, for a total of 4162 genes. We reasoned that most assumptions of molecular evolution methods would be as valid here as when species are compared, and the approximation of constant generation time may be better in this case. As such, for each pair of S. cerevisiae strains plus S. paradoxus, we used PAML (Yang 1997) to infer maximum-likelihood branch lengths for the star tree describing each gene, assuming an independent evolutionary rate for each lineage. We then used nonsynonymous and synonymous changes in inferred trees to estimate genomewide evolutionary rates for each lineage as described (Chimpanzee Sequencing and Analysis Consortium 2005; Gu et al. 2005). The results are shown in Table 1. As expected, in a comparison of S288C and YJM, the lineage to the lab strain had a faster evolutionary rate. However, in other comparisons, the rate for the RM lineage was faster still (Table 1). Thus, the vineyard strain bears the strongest signature of rapid protein evolution.
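The pooling step of the genomewide rate estimate can be illustrated with a minimal sketch. It assumes that per-gene counts of nonsynonymous and synonymous changes and sites for one lineage have already been extracted from the inferred trees. The `GeneCounts` record, its field names, and the pooling formula (summed changes per summed sites, nonsynonymous over synonymous) are illustrative assumptions, not necessarily the exact procedure of the cited works.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class GeneCounts:
    """Hypothetical per-gene summary for one lineage (field names are illustrative)."""
    nonsyn_changes: float  # inferred nonsynonymous changes on the branch
    nonsyn_sites: float    # number of nonsynonymous sites in the gene
    syn_changes: float     # inferred synonymous changes on the branch
    syn_sites: float       # number of synonymous sites in the gene


def genomewide_ratio(genes: Iterable[GeneCounts]) -> float:
    """Pool counts across genes, then take (nonsyn per site) / (syn per site).

    Assumption: pooling before dividing, so that short genes with few
    changes do not produce unstable per-gene ratios.
    """
    genes = list(genes)
    n_changes = sum(g.nonsyn_changes for g in genes)
    n_sites = sum(g.nonsyn_sites for g in genes)
    s_changes = sum(g.syn_changes for g in genes)
    s_sites = sum(g.syn_sites for g in genes)
    if n_sites == 0 or s_sites == 0 or s_changes == 0:
        raise ValueError("not enough sites or synonymous changes to form a ratio")
    return (n_changes / n_sites) / (s_changes / s_sites)


# Toy numbers, not data from this study.
example = [GeneCounts(2, 700, 5, 300), GeneCounts(1, 300, 5, 200)]
ratio = genomewide_ratio(example)
```

Running the same function once per lineage gives one pooled ratio per branch, which is the kind of quantity compared across lineages in Table 1.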
To confirm this, we sought to analyze the three S. cerevisiae strains and S. paradoxus simultaneously. As the genealogy of these genomes varies between loci (Ruderfer et al. 2006), we modeled each gene separately. For each gene, we inferred branch lengths, assuming independent evolutionary rates on all branches, fixing in turn each of the first three topologies in Figure 1. We then identified the maximum-likelihood tree from among the three inferred trees. If this best tree had an internal branch of length zero, we considered the gene to follow the star topology. When multiple runs of phylogenetic inference did not converge to the same topology, we discarded the gene. The remaining data set comprised 3682 genes. Their maximum-likelihood topologies, shown in Figure 1, indicate that for the majority of genes, RM diverged after the divergence of YJM and S288C. Consistent with this, the estimated time since the internal branch is longer at loci where YJM and S288C are the most closely related strain pair, relative to the rest of the genes (data not shown). We conclude that longer waiting times were required for migration and coalescent events between the lab and clinical strain lineages.
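The per-gene topology call described above can be sketched as follows. This is a minimal illustration that assumes each inference run yields, for every fixed topology, a log-likelihood and an internal branch length. The topology labels, the function names, and the numerical tolerance used to treat a branch length as zero are all hypothetical choices made for the example.

```python
from typing import Dict, List, Optional, Tuple

STAR = "star"

# One run: topology label -> (log-likelihood, internal branch length).
Fit = Dict[str, Tuple[float, float]]


def choose_topology(fits: Fit, tol: float = 1e-6) -> str:
    """Pick the maximum-likelihood topology; call it a star tree if the
    internal branch of the best tree is (numerically) zero.

    `tol` is an assumed tolerance; the text specifies a length of zero.
    """
    best = max(fits, key=lambda label: fits[label][0])
    _, internal_length = fits[best]
    return STAR if internal_length <= tol else best


def consensus_topology(runs: List[Fit]) -> Optional[str]:
    """Return the topology if all runs agree, else None (gene is discarded)."""
    calls = {choose_topology(run) for run in runs}
    return calls.pop() if len(calls) == 1 else None


# Toy values, not results from this study.
run_a = {"S288C+YJM": (-1000.0, 0.002), "S288C+RM": (-1003.0, 0.0), "YJM+RM": (-1004.0, 0.0)}
run_b = {"S288C+YJM": (-1002.0, 0.0), "S288C+RM": (-1001.0, 0.0), "YJM+RM": (-1004.0, 0.0)}

agree = consensus_topology([run_a, run_a])     # both runs support the same tree
disagree = consensus_topology([run_a, run_b])  # runs conflict, so the gene is dropped
```

Returning `None` for conflicting runs mirrors the filtering step in the text, where genes whose runs did not converge to the same topology were excluded.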
We next grouped all genes with the same topology in Figure 1 and, for each group, estimated evolutionary rates as above. The results are given in Table 2. In each case the lineage to RM, rather than to the lab strain, has the fastest evolutionary rate of all branches. Interestingly, accelerated protein evolution does not seem unique to RM. At loci where the lab or clinical strain shares recent ancestry with RM, both lineages below the internal branch are enriched for nonsynonymous changes, in contrast to loci where S288C and YJM are most closely related to each other (Table 2A). Certain complex demographic scenarios are consistent with this pattern. However, we favor the hypothesis that variation in the ages of branches drives apparent differences in evolutionary rates. Assuming that some existing nonsynonymous changes are mildly deleterious, these alleles are expected to be in excess on recent branches relative to ancient ones, because purifying selection has had a longer time to act on the latter (Williamson and Orive 2002). As the evolutionary time since the internal branch is longest for loci where S288C and YJM are most closely related (see above), branches below the internal node represent a longer period of purifying selection and bear fewer coding changes; at loci where such branches involve the more recently diverged RM, they bear more coding changes. Recent changes in effective population size and adaptation to viticulture niches (Mortimer 2000; Townsend et al. 2003; Aa et al. 2006), and to laboratory and pathogenic niches, likely account for additional variation in protein evolutionary rates.
We have shown that the lab strain S288C evolved at a slow-to-intermediate rate relative to two natural isolates, suggesting that growth in the lab has not engendered deleterious mutation on a wide scale. Other reports indicate that lab strains are not exceptionally diverged from other subpopulations (de Barros Lopes et al. 1999; Winzeler et al. 2003; Ben-Ari et al. 2005; Fay and Benavides 2005; Aa et al. 2006), although certain S288C alleles were almost certainly selected in the lab (Liu et al. 1996; Bonhivers et al. 1998). By contrast, the vineyard isolate RM has the fastest rate of protein evolution genomewide, and nonsynonymous changes are enriched in other strains where these strains share recent ancestry with RM. Future work will be needed to test our hypothesis that apparently accelerated protein evolution in recently diverged individuals reflects slow selection against mildly deleterious alleles destined to be lost.
The authors thank A. P. J. de Koning for discussions; Z. Gu and L. Steinmetz for careful review of the data; S. Sawyer for comments on the manuscript; and R. Gentleman for his generosity with space and resources. This work was supported by a Burroughs-Wellcome Career Award at the Scientific Interface and was performed in part while R.B. was in the Program for Computational Biology at the Fred Hutchinson Cancer Research Center.
Communicating editor: G. Gibson
Received May 16, 2006.
Accepted June 29, 2006.
Copyright © 2006 by the Genetics Society of America