The ancient duplication of the Saccharomyces cerevisiae genome and subsequent massive loss of duplicated genes is apparent when it is compared to the genomes of related species that diverged before the duplication event. To learn more about the evolutionary effects of the duplication event, we compared the S. cerevisiae genome to other Saccharomyces genomes. We demonstrate that the whole genome duplication occurred before S. castellii diverged from S. cerevisiae. In addition to more accurately dating the duplication event, this finding allowed us to study the effects of the duplication on two separate lineages. Analyses of the duplication regions of the genomes indicate that most of the duplicated genes (∼85%) were lost before the speciation. Only a small amount of paralogous gene loss (4–6%) occurred after speciation. On the other hand, S. castellii appears to have lost several hundred genes that were not retained as duplicated paralogs. These losses could be related to genomic rearrangements that reduced the number of chromosomes from 16 to 9. In addition to S. castellii, other Saccharomyces sensu lato species likely diverged from S. cerevisiae after the duplication. A thorough analysis of these species will likely reveal other important outcomes of the whole genome duplication.
GENE redundancy is common. It is produced by duplication of individual genes, by duplication of large chromosomal segments (segmental duplication), by duplication of entire chromosomes (aneuploidy), and by duplication of whole genomes. Gene duplications play a major role in evolution by providing paralogous genes that can acquire specialized functions over time (Ohno 1970). Although rare, whole genome duplications have played a major role in the evolution of species. For instance, whole genome duplications are postulated to have had a major impact on the vertebrate lineage (Ohno 1970, 1998). The whole genome duplication in the Saccharomyces lineage is thought to have shaped the fermentative lifestyle of these yeasts (Wolfe and Shields 1997; Piskur 2001).
Remnants of the whole genome duplication of Saccharomyces cerevisiae are apparent in its genome sequence (Wolfe and Shields 1997). There are 52 “probable” and 32 “possible” blocks of duplicated genes that include ∼500 duplicated gene pairs spanning at least 70% of the genome (Seoighe and Wolfe 1999; Wong et al. 2002). The 16 centromeric regions map into eight duplicated pairs (Wong et al. 2002). Three lines of evidence suggest that these duplicated blocks of genes arose from a whole genome duplication event rather than by successive segmental duplications (Wolfe and Shields 1997). First, most of the duplicated blocks have the same orientation with respect to the telomere, a situation that would not be expected for segmental duplications. Second, if the duplications are due to successive segmental duplications, several triplicated blocks would be expected to have occurred on the basis of Poisson probability, but none are found. Finally, the order of genes in relatives of S. cerevisiae that did not undergo a whole genome duplication, such as Kluyveromyces lactis, K. waltii, and Ashbya gossypii, is what would be expected before a genome duplication event (Keogh et al. 1998; Dietrich et al. 2004; Kellis et al. 2004).
To learn more about the evolutionary consequences of the whole genome duplication, we investigated the fate of the duplicated genes during evolution of Saccharomyces species. We analyzed the genome sequence of species from each of the three major Saccharomyces subgroups (sensu stricto, sensu lato, and petite-negative). The genome of S. bayanus, a member of the sensu stricto group of Saccharomyces species, is highly similar to that of S. cerevisiae and has a high degree of synteny, indicating that it speciated after the genome duplication. S. castellii, a member of the sensu lato group of Saccharomyces species that are more distantly related to S. cerevisiae, also contains a duplicated genome similar to that seen in S. cerevisiae. The fate of many of the duplicated genes is different in these two Saccharomyces species, providing a view of genome evolution after a genome duplication.
MATERIALS AND METHODS
Strains and sequences:
The yeast species whose genome sequences we determined were previously described (Cliften et al. 2003). The genomes analyzed in this report are those of S. bayanus (623-6c), S. castellii (NRRL Y-12630), and S. kluyveri (NRRL Y-12651). The draft sequence assemblies are available at GenBank (project accession nos.: S. bayanus, AACG02000000; S. castellii, AACF00000000; and S. kluyveri, AACE02000000) and from the Saccharomyces Genome Database (http://yeastgenome.org).
The sequencing strategy was described previously (Cliften et al. 2003). Briefly, 3–4× shotgun sequence data were generated for each of the three species. The reads were assembled with Phrap (http://www.phrap.org). Autofinish (Gordon et al. 2001), a semiautomated sequence prefinishing tool, was used to identify clones that spanned sequence gaps between linked contigs and to design primers for filling the gaps.
The S. kluyveri assembly used in this work has undergone two additional rounds of prefinishing since the previously reported assembly. The assembly consists of 79,312 sequencing reads that assembled into 1344 contigs (vs. 2446 in the previous assembly). The total length of the assembly is 11 Mb, with an average contig length of 8.2 kb. The estimated sequence coverage is 3.5× with 95% of the bases having a Phred quality score of ≥40 (estimated error rate of <1/10,000) and 99% of the bases having a Phred score of ≥20 (estimated error rate of <1/100).
The S. bayanus assembly used in this analysis consists of 78,780 sequence reads generated by us and 146,796 reads generated at the Broad Institute (Kellis et al. 2003). The statistics of the Phrap assembly are: 11.7 Mb in total, 8.6-fold sequence coverage with 99.0 and 99.8% of the bases being P40 and P20, respectively, 678 contigs (335 of them >1 kb), and average contig length 17.3 kb.
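The Phred thresholds quoted for both assemblies correspond to error rates through the standard Phred definition, Q = -10 log10(P). A minimal Python sketch of that conversion follows; the function name is ours and is used only for illustration.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to an estimated per-base error probability.

    Uses the standard Phred definition Q = -10 * log10(P), i.e. P = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


if __name__ == "__main__":
    for q in (20, 40):
        p = phred_to_error_probability(q)
        print(f"Q{q}: about 1 error in {round(1 / p):,} bases")
```

A base with a score of at least 40 therefore has an estimated error probability of at most 1 in 10,000, and a base with a score of at least 20 has at most 1 in 100.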
Identification and analysis of duplicated blocks:
Sequence contigs of the Saccharomyces species' genomes were annotated on the basis of similarity to S. cerevisiae proteins detected by WU-BLASTX. We omitted ORFs labeled as dubious by the Saccharomyces Genome Database (SGD) that overlap other genes since any sequence similarity could be attributed to the overlapping genes. Matches to S. cerevisiae proteins with a P-value <1e-5 and a WU-BLAST score of at least 200 were considered significant. The annotated contigs were compared to contigs of the same species to identify contig pairs containing multiple paralogs. Duplicated S. cerevisiae genes that are part of identified duplicated blocks of genes were treated as identical to increase the sensitivity of detecting paralogous genes in the different genome assemblies. S. castellii duplicated blocks are listed in supplemental Table 1 at http://www.genetics.org/supplemental/. Identification and analysis of the duplicated blocks were carried out with ad hoc Perl scripts.
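The contig comparison described above can be sketched in Python as follows. This is not the Perl code used in the study; the input format, the function name, and the toy gene names are assumptions made for illustration, and the sketch ignores gene order and duplicates that lie within a single contig.

```python
from collections import defaultdict
from itertools import combinations


def find_duplicated_contig_pairs(contig_genes, paralog_of, min_shared=3):
    """Return {(contig_a, contig_b): shared_genes} for contig pairs that
    share at least `min_shared` S. cerevisiae homologs.

    contig_genes: dict mapping contig id -> iterable of S. cerevisiae gene
        names matched on that contig (hypothetical input format).
    paralog_of: dict mapping each member of a known S. cerevisiae duplicate
        pair to one representative name, so both paralogs count as one gene.
    """
    # Index contigs by the canonical gene names they carry.
    contigs_with_gene = defaultdict(set)
    for contig, genes in contig_genes.items():
        for gene in genes:
            contigs_with_gene[paralog_of.get(gene, gene)].add(contig)

    # Collect the genes shared by every pair of distinct contigs.
    shared = defaultdict(set)
    for gene, contigs in contigs_with_gene.items():
        for a, b in combinations(sorted(contigs), 2):
            shared[(a, b)].add(gene)

    return {pair: genes for pair, genes in shared.items() if len(genes) >= min_shared}


# Toy data with placeholder gene names (not real annotations).
toy_contigs = {
    "contig1": ["g1", "g2", "g3", "g4"],
    "contig2": ["g1b", "g2", "g3"],
    "contig3": ["g4", "g5"],
}
toy_paralogs = {"g1": "g1", "g1b": "g1"}  # g1 and g1b form a known duplicate pair
toy_result = find_duplicated_contig_pairs(toy_contigs, toy_paralogs)
```

In the toy data, only contig1 and contig2 share three homologs once g1 and g1b are treated as the same gene, so only that pair is reported.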
The 130 duplicated sequence blocks of S. castellii (from 65 duplicated pairs) were compared to the 104 duplicated blocks (from 52 duplicated pairs) of S. cerevisiae to identify orthologous blocks. S. cerevisiae blocks that contain at least three homologs of each S. castellii block were identified and compared. Some large S. castellii blocks spanned two or more S. cerevisiae duplication blocks. The matching blocks were compared to determine which block was the most similar to the S. castellii block on the basis of the number of orthologous genes shared between the blocks. Only one S. cerevisiae block was assigned as orthologous to each S. castellii block, but since the S. castellii assembly is fragmented, several S. castellii blocks could be orthologous to nonoverlapping regions of a S. cerevisiae duplication block. Supplemental Table 2 (http://www.genetics.org/supplemental/) lists each of the 108 unambiguous orthologous blocks that we identified between the two species.
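The assignment of one orthologous S. cerevisiae block to each S. castellii block can be illustrated with the short Python sketch below. The function name and input format are hypothetical, and returning no assignment when the best score is tied is our assumption about how an ambiguous case might be handled, not a rule stated in the text.

```python
def assign_orthologous_block(castellii_block, cerevisiae_blocks, min_homologs=3):
    """Pick the S. cerevisiae block sharing the most genes with one S. castellii block.

    castellii_block: set of gene names (S. cerevisiae homologs) in the block.
    cerevisiae_blocks: dict mapping block id -> set of gene names.
    Returns the best block id, or None if no block reaches `min_homologs`
    or if the best score is tied (treated here as ambiguous; an assumption).
    """
    scores = {
        block_id: len(castellii_block & genes)
        for block_id, genes in cerevisiae_blocks.items()
    }
    candidates = {b: s for b, s in scores.items() if s >= min_homologs}
    if not candidates:
        return None
    best = max(candidates.values())
    winners = [b for b, s in candidates.items() if s == best]
    return winners[0] if len(winners) == 1 else None


# Toy blocks with placeholder gene names (not real annotations).
toy_blocks = {"A": {"g1", "g2", "g3", "g4"}, "B": {"g1", "g2", "g3"}, "C": {"g9"}}
best_block = assign_orthologous_block({"g1", "g2", "g3", "g4", "g5"}, toy_blocks)
```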
Identification of genes not present in S. castellii:
Annotated S. castellii contigs were compared to the complete list of S. cerevisiae genes obtained from the SGD. We took the complete list of S. cerevisiae genes not present in the automated annotations of S. castellii contigs (1852 genes) and determined whether each was annotated as a Ty coding sequence, a dubious ORF, or a duplicated gene unique to S. cerevisiae, or whether a weakly homologous sequence was present in the genome but fell below our threshold for annotation.
Identification of genes not present in S. cerevisiae:
We identified all ORFs of at least 100 codons that do not overlap ORFs that are similar to S. cerevisiae genes. The 532 identified ORFs were compared to a nonredundant set of proteins in GenBank using BLASTP and matches of P-values <1e-20 were considered significant.
Identification of centromeric sequences:
Using Perl scripts we searched for centromere sequences in each species on the basis of known conserved sequence elements of S. cerevisiae centromeres. We searched for conserved DNA element (CDE)I (RTCACRTG), then for CDEII (an AT-rich sequence of at least 75 bp), and finally for CDEIII (TCCGA) (Olson 1991). The stringency of the search was also reduced by searching only for two of the three elements or by shortening the length of the conserved elements. Since C. albicans centromeres have a different structure than the simple S. cerevisiae centromeres, we also searched S. castellii for sequences homologous to C. albicans centromeres, using BLASTN. No significant matches were found.
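The three-element search can be sketched in Python as shown below. This is an illustration, not the Perl implementation used in the study. The CDEI and CDEIII motifs and the 75-bp minimum for CDEII come from the text; the maximum CDEII length (200 bp) and the AT-content cutoff (85%) are assumptions chosen for the example, and only the forward strand is scanned.

```python
import re

# IUPAC code R means A or G, so RTCACRTG becomes [AG]TCAC[AG]TG.
# A lookahead is used so that overlapping CDEI matches are not skipped.
CDEI_LOOKAHEAD = re.compile(r"(?=[AG]TCAC[AG]TG)")
CDEI_LENGTH = 8
CDEIII = re.compile("TCCGA")


def find_candidate_centromeres(seq, min_cdeii=75, max_cdeii=200, min_at=0.85):
    """Return (start, end) coordinates of CDEI-CDEII-CDEIII candidates.

    min_cdeii follows the text (AT-rich sequence of at least 75 bp);
    max_cdeii and min_at are illustrative assumptions, not published values.
    """
    seq = seq.upper()
    hits = []
    for cde1 in CDEI_LOOKAHEAD.finditer(seq):
        spacer_start = cde1.start() + CDEI_LENGTH
        # Search for CDEIII within the allowed distance downstream of CDEI.
        window = seq[spacer_start:spacer_start + max_cdeii + 5]
        for cde3 in CDEIII.finditer(window):
            spacer = window[:cde3.start()]
            if len(spacer) < min_cdeii:
                continue
            at_fraction = (spacer.count("A") + spacer.count("T")) / len(spacer)
            if at_fraction >= min_at:
                hits.append((cde1.start(), spacer_start + cde3.end()))
    return hits


# Synthetic sequence: CDEI, an 80-bp AT-only spacer, then CDEIII.
toy_sequence = "GG" + "ATCACGTG" + "AT" * 40 + "TCCGA" + "CC"
toy_hits = find_candidate_centromeres(toy_sequence)
```

Lowering `min_cdeii` or dropping one of the three elements corresponds to the reduced-stringency searches mentioned in the text.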
Comparison of centromere-binding proteins:
Orthologs of known S. cerevisiae centromere-binding proteins (Cbf1, Cbf2, Cep3, Cse4, Ctf13, Mif2, and Skp1) were identified in S. kluyveri, S. castellii, K. waltii, and C. glabrata. The orthologous protein-coding sequences were compared, using WU-BLAST (BLASTP) with a postsearch Smith and Waterman alignment option. The protein sequences were also aligned with CLUSTALW to produce gene tree information.
Comparison of intron positions:
Sequence data from the SGD were used to determine the intron positions within the amino acid sequence of S. cerevisiae spliced genes. The genome sequences of the other Saccharomyces species were compared to S. cerevisiae spliced genes by TBLASTN. The Blast alignment output was parsed and compared to the location of the intron sites in S. cerevisiae proteins to look for intron loss or gain events or for changes in the location of the introns in the orthologs of S. cerevisiae spliced genes. Orthologous sequences with potential intronic differences were examined manually in ACEDB to identify possible splice signals such as 5′ and 3′ splice sites and intron branch junctions [GT(ATGT), YAG, and (T)ACTAAC respectively]. S. kluyveri and S. castellii genome sequences were also compared to S. cerevisiae proteins by TBLASTN to look for spliced genes in these species that are not spliced in S. cerevisiae. The output was parsed to show gaps within the protein-coding alignments of the translated homologous sequences that could be indicative of a spliced gene. Because of the genetic distance between these species, many breaks are present within the alignments, most of which are not due to introns. We manually inspected the parsed data and looked at interesting cases in more detail within ACEDB. However, because of the large number of gaps in the protein-coding alignments, these evaluations were not exhaustive. Therefore, other spliced genes are likely present in these genomes.
RESULTS
S. castellii, but not S. kluyveri, underwent a genome duplication:
To determine the extent of gene duplication in S. kluyveri (a petite-negative Saccharomyces species), S. castellii (a sensu lato species), and S. bayanus (a sensu stricto species), we evaluated the (incomplete) genome sequences of these species. The sequence contigs from each species were compared to each other to identify contig pairs that contain multiple paralogous protein-coding genes. We did not identify any S. kluyveri contig pairs with more than two similar genes (Table 1), suggesting that S. kluyveri has not undergone extensive segmental gene duplication. In contrast, 71 S. bayanus contig pairs and 65 S. castellii contig pairs contain at least three duplicated genes. The numbers of duplication blocks in the genomes of these species are similar to the 52 probable duplicated blocks identified in S. cerevisiae. Most of the extra duplication blocks result from similarity to the possible duplicated blocks in S. cerevisiae and from gaps in the (incomplete) sequences of the genomes of these species (inferred from sequence data linking sequence contigs into supercontigs). It should be noted that the 65 duplicated blocks we identified in S. castellii do not represent the full complement of duplicated regions of the genome, just as the 52 probable duplicated blocks in S. cerevisiae do not represent the full extent of its genomic duplication. The blocks represent merely regions where duplication is most evident.
The duplicated blocks of S. castellii and of S. cerevisiae have a common origin:
Since S. bayanus is so closely related to S. cerevisiae and exhibits such a high degree of synteny to the S. cerevisiae genome (Llorente et al. 2000), its duplicated blocks are undoubtedly derived from the same genome duplication event. We therefore focused on S. castellii and compared its 65 duplicated gene blocks to duplicated blocks in the S. cerevisiae genome to determine if they have a common origin. All of the 65 duplicated blocks of genes in S. castellii correspond to duplicated gene blocks in the S. cerevisiae genome. Conversely, all of the 52 S. cerevisiae blocks have a corresponding duplicated region of the S. castellii genome. Thus, there are no unique duplicated gene blocks in either genome. This supports the idea of a whole genome duplication, since new duplication blocks would be expected to have arisen after the divergence of these two species if they resulted from a series of segmental duplications. We conclude that the duplicated blocks have a common origin, despite the relatively large phylogenetic distance between S. cerevisiae and S. castellii. That is, S. cerevisiae and S. castellii speciated after the genome duplicated in the Saccharomyces lineage.
Orthology of S. cerevisiae and S. castellii duplication blocks:
We compared the gene content and gene order of each duplicated block in S. castellii to its paralogous block and to the most similar of the 52 duplicated blocks in S. cerevisiae. Most of the S. castellii duplicated segments are more similar to their orthologous duplicated sequence block in S. cerevisiae than they are to their paralogous block in S. castellii, on the basis of the number of homologous genes in the blocks (see Figure 1). Of the 130 duplicated S. castellii blocks (65 pairs of duplicated blocks), 108 can be unambiguously assigned to an ortholog among the 52 S. cerevisiae duplication blocks. Of these 108 duplicated blocks, 84 are more similar to their orthologous block in S. cerevisiae than to the paralogous block in S. castellii, 18 blocks are as similar to their orthologous block in S. cerevisiae as they are to the paralogous block in S. castellii, and only 6 S. castellii blocks are more similar to their paralogous block than to the orthologous block in S. cerevisiae. Half of these 6 S. castellii blocks and all but 2 of the 18 blocks would be judged more similar to the orthologous block in S. cerevisiae if genes surrounding the S. castellii or S. cerevisiae block were also considered, but in each case either the S. cerevisiae block is much shorter than the S. castellii block or the two blocks only partially overlap. Thus, it is likely that all of the S. castellii duplicated segments are more similar to segments of the S. cerevisiae genome than to their paralogous block. Therefore, the majority of genome changes (gene loss) following the duplication must have occurred before these two species diverged (Figure 2).
Comparison of duplicated gene pairs:
We identified 310 duplicated gene pairs in the 65 S. castellii duplicated blocks and an additional 239 duplicated gene pairs where at least one of the genes did not fall within the 65 duplicated blocks (compared to ∼500 duplicated gene pairs in S. cerevisiae). Many of this latter set of 239 duplicated genes are likely to be separated from their duplication block simply because of gaps in the genome sequence assembly or because of genome rearrangements that occurred since the divergence of these species. Over half of these 549 duplicated gene pairs (319) are also duplicated in S. cerevisiae; 230 are uniquely duplicated in S. castellii. Similarly, there is no evidence of duplication in S. castellii for 153 of the duplicated gene pairs in S. cerevisiae. Thus, less than half of the duplicated gene pairs are species specific, supporting the idea that speciation occurred well after the whole genome duplication (see Figure 2).
By adding the number of duplicated genes that are present in the two species we can estimate the number of duplicated genes that remained before the speciation event. A minimum of ∼700 (319 + 230 + 153 = 702) duplicated genes were still present prior to speciation. The maximum number of duplicated genes is more difficult to estimate since both lineages could have lost the same duplicated gene, but assuming a random loss of duplicated genes after speciation (i.e., the probability of a duplicated gene being lost in both species is equal to the product of the probabilities of the duplicated gene being lost in either species), we estimate that 812 genes were duplicated before speciation. This again supports the conclusion that most of the gene loss after the whole genome duplication occurred before these two species diverged from one another.
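The two figures in this paragraph can be reproduced with a short calculation. The sketch below is our reading of the random-loss assumption: if the loss of a pair in one lineage is independent of its loss in the other, then (N - 702)/N = [(N - 549)/N] × [(N - 472)/N], which simplifies to N = (549 × 472)/319. The function and variable names are ours.

```python
def estimate_pairs_before_speciation(shared, only_a, only_b):
    """Estimate the number of duplicated gene pairs present at speciation.

    Assumes each lineage lost pairs independently, so that
        shared / N = (n_a / N) * (n_b / N), giving N = n_a * n_b / shared.
    """
    n_a = shared + only_a  # pairs retained in species A
    n_b = shared + only_b  # pairs retained in species B
    return n_a * n_b / shared


# Counts from the text: 319 pairs duplicated in both species, 230 only in
# S. castellii, and 153 only in S. cerevisiae.
minimum_pairs = 319 + 230 + 153
estimated_pairs = estimate_pairs_before_speciation(shared=319, only_a=230, only_b=153)
print(minimum_pairs, round(estimated_pairs))  # 702 812
```

Under this assumption, roughly 110 pairs (812 - 702) would have been lost independently in both lineages after speciation.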
Table 2 shows functional classes of genes in which multiple duplicated gene pairs are more abundant in S. castellii than in S. cerevisiae. These are of interest because they may reflect pathways where new gene functions have arisen in S. castellii. For instance, S. castellii contains four genes encoding G1 cyclins compared to three in S. cerevisiae (for review see Breeden 2003). The extra gene originated from a duplication of CLN3 in S. castellii. In S. cerevisiae, CLN3 is located in a duplicated block, but is not duplicated. One notable set of duplicated genes in S. castellii is the GAL genes, encoding enzymes for galactose utilization (for review see Hittinger et al. 2004). In S. cerevisiae, GAL1 and GAL3 are paralogs derived from the whole genome duplication. GAL1 encodes a galactokinase; its paralog GAL3 encodes a protein that binds ATP and galactose but whose prime role is to interact with Gal80 and relieve inhibition of the Gal4 transcription factor by Gal80 in the presence of galactose (Peng and Hopper 2002). In S. castellii, GAL1 is duplicated, but both copies are more similar to S. cerevisiae GAL1 than to GAL3 (see Figure 3). S. castellii also contains a duplication of GAL7 (galactose-1-phosphate uridyl transferase). One of the duplicated blocks containing GAL7 in S. castellii is missing GAL10. The GAL1 and GAL7 genes in this cassette are most similar to their S. cerevisiae homologs in the GAL1–GAL10–GAL7 cassette and are flanked by SNQ2 in both species, suggesting that the GAL1–GAL7 genes in this S. castellii duplication block are orthologous to the genes in the S. cerevisiae GAL1–GAL10–GAL7 gene cluster.
Duplicated copies of additional galactose utilization genes have also been retained in S. castellii (but not in S. cerevisiae), including GAL4 and GAL80, which encode transcriptional regulators, and GAL11, which encodes a component of the mediator complex that interacts with RNA polymerase II and general transcription factors. The fact that S. castellii has more GAL genes suggests that its use of galactose may be more highly regulated than it is in S. cerevisiae or that it may have gained the ability to process other galactose-like molecules. These possibilities warrant further experimental analysis.
Another notable class of duplicated genes in S. castellii is the endoplasmic reticulum vesicle (ERV) gene set, which encodes proteins involved in ER-to-Golgi protein transport that are localized to COPII-coated vesicles. S. cerevisiae contains 6 ERV genes, 2 of which (ERV14 and ERV15) are a duplicate pair. S. castellii encodes 11 ERV genes, 10 of which are duplicated pairs, suggesting that S. castellii is more versatile than S. cerevisiae with regard to the processes of membrane fusion, vesicle formation, and delivery of specific cargo proteins to vesicles in which the Erv proteins are involved.
There are few gene classes for which duplicated gene pairs are more prevalent in S. cerevisiae than in S. castellii. One interesting case is succinate dehydrogenase (encoded by SDH1, SDH2, SDH3, and SDH4), a multisubunit enzyme that couples succinate oxidation to the transfer of electrons to ubiquinone. In S. cerevisiae, SDH1, SDH3, and SDH4 are duplicated, but SDH2 is not. In S. castellii, the opposite is true: SDH2 is duplicated, but SDH1, SDH3, and SDH4 are not.
Ribosomal protein-coding genes are the largest class of duplicated genes in both S. cerevisiae and S. castellii. S. cerevisiae contains 57 pairs of duplicated ribosomal genes, 52 of which lie in duplicated blocks of the genome. We identified 54 duplicated ribosomal protein gene pairs in S. castellii, 5 of which are not duplicated in S. cerevisiae (RPL25, RPL29, RPL32, RPL39, and RPS5). S. cerevisiae, on the other hand, may contain as many as 8 duplicated ribosomal gene pairs that are singletons in S. castellii (RPL1, RPL4, RPL7, RPL8, RPL11, RPL15, RPL35, and RPL41).
Genes not present in S. castellii:
Since the S. castellii genome is smaller than that of S. cerevisiae, we compared the gene content of the two species to determine if certain gene classes or metabolic functions are absent in the S. castellii genome. We identified 1852 S. cerevisiae genes that appear to be absent in S. castellii, but 792 of them are classified as “dubious” ORFs by the SGD and 84 are related to Ty retrotransposon sequences. Of the remaining 976 genes that appear to be missing in S. castellii, 153 can be explained by genes duplicated in S. cerevisiae that are single copy in S. castellii. Approximately 250 have weak similarity to sequences in S. castellii and could represent rapidly diverging genes or pseudogenes. Another set of ∼230 genes are members of gene families and have similarity to other proteins in S. castellii, so in these cases it appears that S. castellii has fewer members of these gene families. This leaves 340 S. cerevisiae genes that have no orthologs in the S. castellii sequence. S. castellii contains an additional 230 genes derived from the genome duplication that were not retained in S. cerevisiae. S. castellii may contain a few additional genes that are not in S. cerevisiae (see below), but overall S. castellii seems to have fewer genes.
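The gene accounting in this paragraph can be checked with a few lines of Python. The variable names are ours; the counts are taken from the text, and two of them are given there only approximately (∼250 and ∼230), so the final figure is itself approximate.

```python
absent_in_castellii = 1852
dubious_orfs = 792
ty_related = 84
remaining = absent_in_castellii - dubious_orfs - ty_related

single_copy_in_castellii = 153  # duplicated only in S. cerevisiae
weak_similarity = 250           # approximate count in the text
gene_family_members = 230       # approximate count in the text
no_ortholog = (
    remaining - single_copy_in_castellii - weak_similarity - gene_family_members
)

print(remaining, no_ortholog)  # 976 343
```

The result of 343 is consistent with the rounded figure of 340 genes without orthologs.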
Not surprisingly, the largest blocks of missing genes are located in the telomeric and subtelomeric regions of S. cerevisiae chromosomes. In these regions of the genome, it is not uncommon to find 5–10 consecutive genes that are missing in S. castellii. It is unlikely that all the missing telomeric genes are due to a cloning bias against telomeric DNA sequences or to our inability to assemble those sequences of S. castellii into contigs, because many S. cerevisiae telomeric genes are absent in the (complete) genome sequences of the related yeasts A. gossypii and K. lactis. Although telomeric repeat sequences are not common in the S. castellii sequence assembly, we identified 10 copies of the subtelomeric repeat YRF (for review see Louis and Haber 1992). Many of the S. cerevisiae telomeric genes may be absent in these diverged species or may have diverged beyond recognition, since the telomeric regions of the genome show a higher rate of sequence divergence (Kellis et al. 2003).
The S. cerevisiae telomeric gene families missing in the S. castellii genome (Table 3) include genes for biotin and thiamine synthesis and for maltose utilization, which explains known physiological deficiencies of S. castellii (Barnett et al. 2000). S. castellii also has a reduced number of aryl-alcohol dehydrogenase (AAD) genes, ferric iron uptake (FIT) genes, genes encoding ferric reductases (FRE), and COS and PAU (seripauperin) genes of unknown function.
Missing nontelomeric genes include three of four BNA genes, required for the synthesis of nicotinic acid from tryptophan, multiple FYV and KRE genes implicated in resistance to killer toxin, and half of the 10 PRM genes that encode pheromone-regulated transmembrane proteins required for membrane fusion during mating (see Table 3). S. castellii is also missing several alkaline phosphatase genes (PHO3, PHO5, PHO11, and PHO12) and PHO4, encoding a transcriptional regulator of PHO5, and PHO89, a Na+/Pi cotransporter gene (for a review of phosphate metabolism see Lenburg and O'Shea 1996). This is surprising, because duplicate copies of several genes encoding phosphate transporters, PHO84, PHO88, and the PHO87/PHO90 duplicate pair (the only PHO gene duplication in S. cerevisiae), are present in the S. castellii genome. Thus, there appear to be distinct differences in the way these two species utilize phosphate and regulate expression of genes for this process.
S. castellii genes not present in S. cerevisiae:
Despite its smaller genome size, there are a number of S. castellii genes that are not found in S. cerevisiae. We identified 532 ORFs (of at least 100 codons) in this category, most of which are unlikely to be genes [303 (57%) are <150 codons], but there is some reason to believe that at least 41 ORFs encode functional proteins since they match known or hypothetical proteins of other organisms in GenBank (Table 4). Two gene families—one related to quinone reductase, the other related to pirin, a highly conserved nuclear protein of unknown function that is found in animals, plants, fungi, and bacteria—account for eight of the matches to GenBank proteins. Remarkably, no copies of pirin are found in the sensu stricto Saccharomyces species; one copy is found in S. kluyveri.
A total of 147 of the 532 ORFs unique to S. castellii are similar to other hypothetical ORFs in S. castellii. These include a group of subtelomeric repeats (∼16) that are often found near Y′ elements. There are 18 additional S. castellii gene families (of three or more genes) with no similarity to proteins in GenBank. One of the largest of these gene families, consisting of at least 10 members, encodes putative proteins of ∼800 amino acids. One homolog of this family is encoded in the S. kluyveri genome. Another six-member gene family encodes hypothetical proteins between 650 and 700 amino acids in length.
S. castellii chromosomes and centromeres:
Although S. castellii and S. cerevisiae appear to have diverged well after the same whole genome duplication event, S. castellii contains only about half as many chromosomes as S. cerevisiae (9 compared to 16) (Petersen et al. 1999). To investigate the fate of the duplicated centromeres in S. castellii, we searched the S. castellii genome for centromere sequences. First we searched for instances of CDEI (RTCACRTG), then for CDEII, an AT-rich sequence of at least 75 bp, and finally for CDEIII (TCCGA). In this way, we were able to identify all 7 centromeres in S. kluyveri, 7 of 8 centromeres in K. waltii, and 9 of 13 centromeres in C. glabrata, but we found no centromere sequences in S. castellii, even after reducing the stringency of the search. In an attempt to identify the centromeres by synteny we identified the S. castellii homologs of S. cerevisiae genes that flank centromeres. Five of the gene pairs flanking S. cerevisiae centromeres are also adjacent to one another in S. castellii, but no centromeric sequences are apparent between them. In fact, the intergenic regions between the S. castellii orthologs of S. cerevisiae genes that flank centromeres are strikingly short: an average of 313 bp, compared to an average of 1342 bp in S. cerevisiae and 1416 bp in S. kluyveri. This suggests that centromere sequence and location in S. castellii have diverged significantly from that of S. cerevisiae, which is surprising given the conservation of centromeres in the species more distantly related to S. cerevisiae.
In light of this, it is notable that the S. castellii centromere-binding proteins seem to be diverging more rapidly than orthologous sequences in the other related yeast species. For instance, of seven known centromere-binding proteins (Cbf1, Cbf2, Cep3, Cse4, Ctf13, Mif2, and Skp1) only two S. castellii proteins (Ctf13 and Skp1) are more similar to their S. cerevisiae orthologs than the S. cerevisiae proteins are to their orthologs from the more distantly related S. kluyveri. In three cases (CBF1, CBF2, and CSE4), the centromere-binding proteins encoded in the K. waltii genome are more similar to their S. cerevisiae orthologs than are the S. castellii orthologs.
S. castellii contains fewer introns than do S. cerevisiae and S. kluyveri:
S. cerevisiae contains relatively few introns. Evidence suggests that introns and spliceosomal components have been lost in the S. cerevisiae lineage (Rymond and Rosbash 1992). A paucity of introns has also been observed in other hemiascomycetous yeast (Bon et al. 2003), suggesting that the loss of introns is characteristic of this lineage. To learn more about the evolutionary fate of introns in the Saccharomyces yeasts, we looked for intron loss events with respect to the S. cerevisiae genome. As expected, we did not find any differences in intron number or location between S. bayanus and S. cerevisiae, nor did we find evidence of lost introns in S. kluyveri [one intron loss event has previously been reported for S. kluyveri (Bon et al. 2003), but there was not sufficient evidence in our assembly to confirm this]. However, we identified two extra introns in S. kluyveri genes that are orthologs of S. cerevisiae spliced genes (YKL002w and YPL109c). In contrast to what we found in S. bayanus and S. kluyveri, we identified 22 S. castellii genes that appear to have lost introns (see Table 5). Since these introns are in the other Saccharomyces species, the losses are specific to the S. castellii lineage. We also looked for introns in genes that are not spliced in S. cerevisiae. We found good evidence for 13 additional spliced genes in S. kluyveri and 2 additional spliced genes in S. castellii. We discovered another 8 genes that appear to be spliced in S. kluyveri, but where the sequence evidence alone was not conclusive. Thus, additional spliced genes are likely to be present in these genomes. In summary, S. castellii has fewer spliced genes than does S. cerevisiae (and the other sensu stricto species), which has fewer spliced genes than S. kluyveri.
DISCUSSION
The whole genome duplication event that occurred in the evolution of the Saccharomyces genus preceded the divergence of S. cerevisiae and S. castellii, but occurred after S. kluyveri diverged from the other Saccharomyces species (see Figure 2). Perhaps this is not surprising given the likelihood that S. kluyveri is a Kluyveromycete (Johnston et al. 1988; Llorente et al. 2000) and the fact that S. kluyveri has roughly half as many chromosomes as the sensu stricto Saccharomycetes (Petersen et al. 1999) [although S. castellii also has a low chromosome number (eight to nine) and a small genome size, despite having clearly undergone the genome duplication event].
S. castellii's position as the earliest branching species in the Saccharomyces phylogeny suggests that the other Saccharomyces species also contain duplicated genomes. This is consistent with chromosomal numbers for the other species [16 for sensu stricto species and for S. exiguus, 12–14 for S. servazzii and S. unisporus (but only 8–9 for S. dairenensis, the closest known relative of S. castellii)] (Petersen et al. 1999) and with analysis of genomic survey sequences of several Saccharomycete species (Wong et al. 2002).
Comparison of the S. castellii and S. cerevisiae genome sequences reveals the fate of genes after a whole genome duplication (see Figure 2). The majority of gene loss in this case appears to have occurred before the speciation of the two yeasts. This view is supported by the similarity of gene order between the duplication blocks of each species and by the small number of duplicated gene pairs that are present in each species. We estimate that only ∼800 duplicated gene pairs were present at the time of this event (assuming that the loss of duplicated gene pairs was random after speciation). Almost 40% of these are present in both species. Several other notable changes occurred after speciation: genome rearrangements reduced the number of chromosomes from 16 to 8 or 9 in S. castellii, and S. castellii lost a considerable number of genes (400–500) that S. cerevisiae retained. Perhaps these gene losses occurred during the chromosomal rearrangements that condensed the number of chromosomes in S. castellii.
The Saccharomyces species provide an opportunity to investigate the consequences of genome duplication on the several evolutionarily stable clades that resulted from this event. S. cerevisiae retained ∼11% of the duplicated genes, a number that has been postulated to be a normal outcome of whole genome duplication (Lynch and Conery 2000; Wagner 2001). However, S. exiguus may have retained a greater number of duplicated genes, since it seems to have 16 chromosomes and a genome size estimated to be 5 Mb larger than that of S. cerevisiae (Petersen et al. 1999). On the other end of the spectrum, C. glabrata exhibits only a small fraction of the genetic redundancy found in S. cerevisiae (Dujon et al. 2004). Our comparison of S. cerevisiae and S. castellii indicates that there are many important biological differences between these two duplicated species. The phylogenetic distance between these two species and other duplicated sensu lato species suggests that other important evolutionary changes are present in the other members of this group. One important implication of this situation is that new gene functions may be unique to a species. In this respect, species like S. kluyveri that possess less genetic redundancy may be good models for studying basic cellular functions that are present in a wide range of eukaryotic cells.
1 Present address: Department of Biology, Utah State University, 5305 Old Main Hill, Logan, UT 84322.
Communicating editor: B. J. Andrews
- Received July 29, 2005.
- Accepted November 8, 2005.
- Copyright © 2006 by the Genetics Society of America