To find the most rapidly evolving regions in the yeast genome we compared most of chromosome III from three closely related lineages of the wild yeast Saccharomyces paradoxus. Unexpectedly, the centromere appears to be the fastest-evolving part of the chromosome, evolving even faster than DNA sequences unlikely to be under selective constraint (i.e., synonymous sites after correcting for codon usage bias and remnant transposable elements). Centromeres on other chromosomes also show an elevated rate of nucleotide substitution. Rapid centromere evolution has also been reported for some plants and animals and has been attributed to selection for inclusion in the egg or the ovule at female meiosis. But Saccharomyces yeasts have symmetrical meioses with all four products surviving, thus providing no opportunity for meiotic drive. In addition, yeast centromeres show the high levels of polymorphism expected under a neutral model of molecular evolution. We suggest that yeast centromeres suffer an elevated rate of mutation relative to other chromosomal regions and they change through a process of “centromere drift,” not drive.
COMPARISONS of genome sequences among species allow detailed analyses of the mode and tempo of evolution at the molecular level (Schein et al. 2004; Chimpanzee Sequencing and Analysis Consortium 2005; Shapiro et al. 2007). Comparisons of closely related species are especially needed to identify and analyze the fastest-evolving regions of genomes without ambiguities of homology or uncertainties due to multiple substitutions. Here we study the evolution of an entire chromosome (excluding telomeres and subtelomeres) by comparing sequences from three closely related and phylogenetically independent lineages of the wild yeast Saccharomyces paradoxus.
Yeasts provide an excellent model system for comparative studies in genome evolution, as they have small genomes, dense with genes and regulatory elements, and complete genome sequences are now available for a number of species (Goffeau et al. 1996; Cliften et al. 2003; Kellis et al. 2003, 2004; Dujon et al. 2004; Liti and Louis 2005). The closest relatives sequenced thus far, S. paradoxus and S. cerevisiae, however, are 13% divergent at the nucleotide level (Kellis et al. 2003), and many intergenic regions are difficult to align due to extensive insertions/deletions and ambiguities introduced by multiple substitutions. S. paradoxus strains from Europe, Far East Asia, and Brazil (also known as S. cariocanus) represent three genealogically independent populations that show partial hybrid sterility and much lower sequence divergence [1.5% divergence between Europe and the Far East and 5% between either one and S. cariocanus (Greig et al. 2003; Koufopanou et al. 2006; Liti et al. 2006)]. These three populations are ideal for population genomic studies, as they provide independent replicates for testing the repeatability of evolutionary patterns. Moreover, S. paradoxus is sufficiently closely related to S. cerevisiae that its genome can be annotated by homology, allowing full use of the vast amounts of information on S. cerevisiae. Finally, S. paradoxus has never been domesticated, and results will therefore reflect natural rather than artificial processes caused by human interventions.
MATERIALS AND METHODS
To measure divergence we sequenced most of chromosome III from one Far East strain of S. paradoxus (CBS 8442) and the Type strain of S. cariocanus (CBS 8841) and compared these to the published sequence for the European Type strain of S. paradoxus (CBS 432) (Kellis et al. 2003). For polymorphism, we used 11 more European strains from Berkshire in the United Kingdom (T18.2, T26.3, T32.1, T62.1, T68.2, T76.6, Q4.1, Q6.1, Q14.4, Q15.1, and Q43.5) (Johnson et al. 2004) and 7 more Far East strains (CBS 8436, CBS 8437, CBS 8438, CBS 8439, CBS 8440, CBS 8441, and CBS 8444) (Naumov et al. 1997; Koufopanou et al. 2006). All strains were made fully homozygous prior to sequencing by isolating a single spore from a tetrad and allowing it to autodiploidize.
DNA sequencing, assembly, alignment, and annotation:
DNA sequence was obtained by a PCR-based strategy. The published S. paradoxus chromosome III sequence was used as a reference to design primers for PCR amplification and sequencing of the nontelomeric fraction of the chromosome from total genomic DNA. The PCRs generated overlapping 2-kb fragments, which were then sequenced with internal primers. Base-calling of DNA sequence traces was conducted using Phred (Ewing and Green 1998), and sequences were assembled using the Gap4 component of Staden (http://staden.sourceforge.net/). The 14 genes and adjacent intergenes used to estimate polymorphism levels included MRC1, SPB1, YCL045C, ATG22, ILV6, CIT2, PGK1, MAK32, FEN2, PER1, CTR86, HCM1, YCR072C, and KIN82. To ensure a high degree of confidence in the polymorphism data, only bases with a consensus Phred quality score ≥q40 were accepted (probability of miscall <1/10,000), the rest being treated as missing data. DNA sequences have been deposited in GenBank (accession nos. EU444725, EU444726, and EU444121–EU444533).
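The quality filter described above can be illustrated with a short sketch. It assumes the standard Phred definition, in which a score Q corresponds to a miscall probability of 10^(−Q/10), so q40 corresponds to 1 in 10,000. The function names and the use of "N" to mark missing data are illustrative assumptions, not part of the Phred or Staden software.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability of a miscalled base."""
    return 10 ** (-q / 10)


def mask_low_quality(bases: str, quals: list[int], min_q: int = 40) -> str:
    """Replace bases whose quality is below min_q with 'N' (treated as missing)."""
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    return "".join(b if q >= min_q else "N" for b, q in zip(bases, quals))


# Hypothetical consensus bases and quality scores, for illustration only.
masked = mask_low_quality("ACGT", [50, 39, 40, 12])
```

In this toy input only the first and third bases reach q40, so the other two are masked and would be ignored in downstream polymorphism estimates.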
Sequences were aligned against the published sequences for chromosome III of S. cerevisiae (October 1, 2003 version: ftp://genome.cse.ucsc.edu/goldenPath/sacCer1/bigZips/chromFa.zip) and S. paradoxus (http://www.broad.mit.edu/ftp/pub/annotation/fungi/comp_yeasts/S1a.Assembly/), using mlagan, and further improved manually using SeaView and BioEdit (Galtier et al. 1996; Hall 1999; Brudno et al. 2003). Annotations of the S. cerevisiae chromosome (http://hgdownload.cse.ucsc.edu/goldenPath/sacCer1/database/) were transferred to the alignment using custom scripts. Sequence that aligned to the right of position 270,757 on S. cerevisiae chromosome III was excluded due to uncertainty in the orthology of the sequence. We also excluded six PCR fragments generated by pairs of primers that were predicted to amplify paralogous sequences on other chromosomes, using in silico PCR (isPcr) (settings −minPerfect = 1; http://www.cse.ucsc.edu/∼kent/src/) and the published S. paradoxus genome sequence. Long terminal repeat (LTR) regions were identified using RepeatMasker (version open-3.0); the repeat library included all sequences for Saccharomyces yeasts in RepBase 9.11 and the S. cerevisiae Ty4 retrotransposon. Only fixed LTRs were included in the analyses, to exclude recent inserts that would not be comparable to the rest of the chromosome.
Divergence and nucleotide diversity were estimated using polydNdS (http://molpopgen.org/) (Thornton 2003) and VariScan (Vilella et al. 2005). We do not correct for multiple hits; insertions, deletions, and missing or ambiguous data are ignored. To remove the effect of codon usage bias from our estimates of synonymous divergence, we used the measures of codon bias (c) for each gene in Hirsh et al. (2005), calculated from several Saccharomyces species, including S. paradoxus.
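As a rough illustration of an uncorrected divergence estimate of this kind, the sketch below computes the proportion of differing sites between two aligned sequences, skipping any column that contains a gap or a missing or ambiguous base. It is a minimal stand-in written for this example; it does not reproduce the internals of polydNdS or VariScan, and the function name is hypothetical.

```python
VALID_BASES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap, N, or other ambiguity code are
    skipped, and no correction for multiple hits is applied.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


# Toy alignment: 8 columns, 2 skipped (one gap, one N), 1 difference in 6.
d = p_distance("ACGT-ANA", "ACTTTACA")
```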
RESULTS
Divergence along chromosome III:
We sequenced ∼295 kb of the Far East chromosome, representing ∼91% of the complete S. cerevisiae chromosome, and ∼250 kb of the S. cariocanus chromosome (∼76%). The overall nucleotide divergence between the European and the Far East chromosomes is 1.4% (about equal to that between humans and chimpanzees; Chimpanzee Sequencing and Analysis Consortium 2005); the divergence of either one from S. cariocanus is ∼4%. Levels of divergence vary significantly along the length of the chromosome; surprisingly, the greatest divergence is at the centromere (Figure 1).
Divergence of other centromeres:
To test whether the elevated divergence applies to other centromeres, we sequenced the centromeres of four additional chromosomes (CEN5, CEN7, CEN9, and CEN15). The rate of divergence does not differ significantly among centromeres (G-tests: the P-value for the Europe–Far East comparison is PEF = 0.26; that for European S. paradoxus–S. cariocanus is PEC = 0.41, with comparable values for Far East–S. cariocanus here and throughout the article). All five centromeres show high levels of divergence compared to other types of DNA (Table 1, Figure 2; see also supplemental Table 1).
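A G-test of heterogeneity of this kind can be sketched as follows. The sketch assumes the test is applied to a table of differing and identical sites per centromere, which is one natural layout but is not spelled out in the text. The counts are hypothetical and the function name is illustrative.

```python
import math


def g_test(table: list[list[int]]) -> tuple[float, int]:
    """Return the G statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a cell with zero count contributes nothing
                expected = row_totals[i] * col_totals[j] / total
                g += observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return 2 * g, df


# Hypothetical (differing, identical) site counts for five centromeres;
# illustrative numbers only, not data from this study.
counts = [[5, 80], [3, 82], [6, 79], [4, 81], [7, 78]]
g_stat, dof = g_test(counts)
```

The P-value is the upper tail of a chi-square distribution with `dof` degrees of freedom evaluated at `g_stat`; a statistics library can supply it, for example `scipy.stats.chi2.sf(g_stat, dof)`.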
One possible cause of this high divergence is that centromeres on the same chromosome in different lineages are not orthologous, but instead have been transferred by gene conversion from some other chromosome. To test this possibility we aligned all the sequences and constructed a phylogeny. This shows the pattern expected from orthology: centromeres from the same chromosome cluster together, with Europe and the Far East more closely related to each other than either is to S. cariocanus (Figure 3).
Centromeres in Saccharomyces yeast are very short (∼120 bp long) and well defined and consist of three functionally distinct regions: two protein-binding sites [centromere DNA elements (CDE)I and CDEIII, 8 bp and ∼25 bp long] and a highly AT-rich spacer region separating them (CDEII, ∼90 bp long) (Clarke 1998). CDEII wraps around the centromere-specific histone Cse4; this binding is analogous to that between mammalian centromeric repeats and CENPA (the mammalian homolog of Cse4), although in the case of yeast only a single nucleosome is formed for each centromere (Sullivan et al. 2001). CDEIIs from different chromosomes are highly dissimilar (up to 60% differences among those sequenced here) yet functionally interchangeable (Clarke and Carbon 1983), indicating that the binding of CDEII to the centromere-specific histone Cse4 is not sequence specific, although changes in the length, AT content, and pattern of runs of A's and T's can disrupt centromere function, perhaps by altering DNA bendability or flexibility (Baker and Rogers 2005). CDEII diverges more than twice as fast as the two binding sites (Figure 4 and Table 1; G-test, PEF = 0.001, PEC = 2 × 10^−11), which do not differ significantly from each other (P > 0.4). CDEII also diverges about twice as fast as the 85-bp regions immediately flanking the two binding sites (data not shown), suggesting the effect is specific to the centromeres. In the remainder of this article we focus on the fast-evolving CDEII component of centromeres.
Comparison with sequences likely to be evolving neutrally:
An obvious question is whether CDEII regions are evolving faster than selectively neutral sequences. One class of DNA likely to show little selective constraint is synonymous sites in genes with low codon bias (Akashi 2001; Fay and Benavides 2005). Synonymous sites as a whole diverge only a third as fast as CDEII (Wilcoxon tests: PEF = PEC = 0.0002). To estimate the rate of divergence of synonymous sites in the absence of codon bias, we calculated the regression of divergence against (1 − CAI), where CAI is the codon adaptation index of Hirsh et al. (2005) (Figure 5). Regression lines were forced through the origin (i.e., CAI = 1), on the grounds that complete bias should result in no divergence. The estimated divergence in the absence of bias (CAI = 0) is 50% higher than the observed synonymous divergence, but still only half the value for CDEII (Table 1).
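The codon-bias correction can be sketched as a regression through the origin. The sketch assumes ordinary least squares, which the text does not specify. With x = 1 − CAI and no intercept, the fitted slope is Σxy / Σx², and the predicted divergence at CAI = 0 (x = 1) equals that slope. The gene values below are hypothetical.

```python
def slope_through_origin(x: list[float], y: list[float]) -> float:
    """Least-squares slope for the model y = b * x (no intercept)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("all x values are zero; slope is undefined")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def divergence_without_bias(cai: list[float], divergence: list[float]) -> float:
    """Predicted synonymous divergence at CAI = 0, i.e. at x = 1 - CAI = 1."""
    x = [1 - c for c in cai]
    return slope_through_origin(x, divergence)


# Hypothetical per-gene CAI values and synonymous divergences (illustration only).
cai_values = [0.5, 0.75, 1.0]
syn_divergence = [0.02, 0.01, 0.0]
estimate = divergence_without_bias(cai_values, syn_divergence)
```

Forcing the line through the origin encodes the stated assumption that a gene with complete codon bias (CAI = 1) shows no synonymous divergence.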
Other DNA sequences likely to show little selective constraint are the remnant LTR regions of partially deleted transposable elements, as these are no longer functional. These sequences diverge at about the same rate as synonymous sites after codon bias correction, but again less than half as fast as CDEII (Wilcoxon tests: PEF and PEC < 0.003; Table 1). Divergence at other regions (intergenes and nonsynonymous sites) is lower still (Table 1).
Polymorphism, selection, and mutation:
For CDEII to diverge faster than selectively neutral sequences it must experience an elevated mutation rate or recurrent positive selection. To attempt to distinguish these possibilities we measured levels of polymorphism segregating in each of the European and Far East populations. If mutation rates are elevated, then polymorphism at CDEII will also be elevated, proportional to divergence, whereas if there has been recurrent positive selection, then polymorphism will be reduced (Hudson et al. 1987). We analyzed polymorphism at the five centromeres and their flanking intergenic regions, the LTRs, and 14 genes and adjacent intergenes along chromosome III, in 12 European and 8 Far East strains (see supplemental Table 2). To minimize linkage effects the 14 genes were chosen so that they are at least 4 kb apart. We compared average π-values for the two populations between different classes of DNA using a Wilcoxon test and found that CDEIIs are more polymorphic than LTRs or synonymous sites, although with borderline statistical significance (P = 0.08 for both comparisons). These analyses are somewhat limited because the CDEII region is so short and we are unable to accurately assess the extent of heterogeneity in π for centromeres on different chromosomes. π ranges from 0 to 1.6% (Figure 2), but this variation is no more than expected by chance [P > 0.3 for both Europe and Far East; tested by simulating 10,000 data sets with the observed average π (0.0055/bp or 0.0079/bp), number of sequences (12 or 8), and sequence length (85 bp)]. These limitations notwithstanding, there is no evidence for the reduced polymorphism expected in a simple model of recurrent positive selection and the ratio of polymorphism to divergence for CDEII is not significantly different from that of LTRs or synonymous sites, either by a Wilcoxon test (P > 0.4) or in a coalescent-based maximum-likelihood analysis [MLHKA test, P > 0.2 (Wright and Charlesworth 2004); Figure 2].
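For readers unfamiliar with π, the sketch below computes nucleotide diversity as the mean proportion of differing sites over all pairs of aligned sequences in a sample. Skipping missing or ambiguous bases pair by pair is an assumption made for this example; dedicated tools may handle such sites differently. The sequences are toy data.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def pairwise_difference(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, ignoring gaps and ambiguous bases."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in VALID_BASES and b in VALID_BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise difference per site (pi) over all pairs in the sample."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    values = [pairwise_difference(a, b) for a, b in combinations(seqs, 2)]
    return sum(values) / len(values)


# Toy sample of three aligned sequences (illustration only).
pi = nucleotide_diversity(["ACGT", "ACGA", "ACGT"])
```

Under neutrality both π and divergence scale with the mutation rate, which is why the ratio of the two is the informative quantity in the comparison above.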
Analysis of other AT-rich regions:
CDEII is extremely AT rich, averaging 90% for the five centromeres studied here. To test whether other AT-rich regions have elevated divergence, we scanned the published chromosome III sequence for the 10 most AT-rich regions of similar length to the CDEII region [85-bp-long regions, 86–94% AT; in cases where many consecutive windows showed the same proportion of A or T sites the first (left) window was selected; all regions found were >4 kb apart]. These show much lower divergence and polymorphism than CDEII, similar to other intergenic regions (Table 1). Thus AT-rich regions are not fast evolving in general, and AT richness in itself cannot account for the high divergence we observe.
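The scan for AT-rich windows can be sketched as a sliding-window count. The sketch ranks all 85-bp windows by AT content, prefers the leftmost window among ties, and greedily keeps windows whose start positions are more than 4 kb from those already chosen. The greedy selection and the function name are assumptions made for illustration; the text does not describe the exact procedure.

```python
def at_rich_windows(seq: str, width: int = 85, n_top: int = 10,
                    min_gap: int = 4000) -> list[tuple[int, float]]:
    """Return (start, AT fraction) for the most AT-rich, well-separated windows."""
    seq = seq.upper()
    if len(seq) < width:
        return []
    is_at = [base in "AT" for base in seq]
    count = sum(is_at[:width])
    windows = [(count, 0)]
    # Slide one base at a time, updating the AT count incrementally.
    for start in range(1, len(seq) - width + 1):
        count += is_at[start + width - 1] - is_at[start - 1]
        windows.append((count, start))
    # Highest AT count first; among ties, the leftmost window first.
    windows.sort(key=lambda w: (-w[0], w[1]))
    chosen: list[tuple[int, float]] = []
    for at_count, start in windows:
        if all(abs(start - s) > min_gap for s, _ in chosen):
            chosen.append((start, at_count / width))
            if len(chosen) == n_top:
                break
    return chosen


# Toy sequence: a 10-bp run of A's inside G's, scanned with small parameters.
toy = "G" * 100 + "A" * 10 + "G" * 100
hits = at_rich_windows(toy, width=10, n_top=2, min_gap=20)
```

In the toy example the first hit is the window covering the A run; the second is the leftmost window far enough away, which contains no A or T at all.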
DISCUSSION
We have compared near-complete sequences of chromosome III from three closely related lineages of yeast and found that the fastest-evolving sequence is the CDEII region of the centromere. Further analyses indicate that centromeres on other chromosomes also evolve rapidly, and even more rapidly than sequences likely to be evolving neutrally. To our knowledge, ours is the most detailed sequence analysis of centromere evolution thus far, made possible by the small size of yeast centromeres. Our analysis specifically points to centromeres having an elevated rate of nucleotide substitution rather than, say, more frequent repeat expansions and contractions.
Rapid centromere evolution has also been observed in some plants and animals (Haaf and Willard 1997; Henikoff et al. 2001; Lee et al. 2005; Ma et al. 2007; Ventura et al. 2007). In these taxa centromeres are extremely long and complex stretches of highly repetitive AT-rich satellite DNA, usually surrounded by or embedded in heterochromatin (Clarke 1998; Henikoff et al. 2001; Sullivan et al. 2001). Rapid centromere evolution in plants and animals has been attributed to recurrent positive selection for mutant repeats that distort their segregation at female meiosis, somehow orienting toward the spindle pole that will contribute to the egg or ovule and away from the polar bodies (the “centromere drive hypothesis”; Henikoff et al. 2001; Malik and Henikoff 2002). Saccharomyces yeasts have symmetrical meioses, with all four products being viable, and selection for this type of segregation distortion is not possible. Thus some other factor(s) must be responsible.
The two most obvious possible causes for rapid evolution of yeast centromeres are an elevated mutation rate and/or recurrent positive selection (due to some factor other than drive). Distinguishing these two possibilities is not always easy. Centromeres appear to be more polymorphic than neutral regions of the chromosome (with borderline statistical significance) and the ratio of polymorphism to divergence is not different. Thus the data are consistent with centromeres having an elevated rate of mutation, and this seems to us the most likely explanation (although we are not able to exclude some complex forms of selection, such as a combination of directional and balancing selection). Due to the small size of CDEII we were not able to accurately assess the extent of heterogeneity in polymorphism among centromeres on different chromosomes; more data from more isolates and centromeres would be useful in this regard. Interestingly, the two proteins directly interacting with CDEII, Cse4 and Mif2 (homologous to the fast-evolving mammalian CENPC protein), appear to be under strong purifying selection in yeast, with no indication of recurrent directional selection (Henikoff and Dalal 2005; Baker and Rogers 2006).
If yeast centromeres do suffer an elevated mutation rate, it is not clear what might be causing it. Recombination may be mutagenic (Bussell et al. 2006), but genealogical analysis of the five centromeres and flanking intergenes shows no evidence of recombination in any of these regions in either population (data not shown). CDEII is extremely AT rich, and runs of A's and T's can lead to insertion and deletion mutations by replication slippage (Levinson and Gutman 1987), but our analysis includes only base substitutions, not indels. Both polymorphism and divergence are sufficiently low to exclude the possibility of artificially inflated mutation rates due to misalignments (Figure 4 and supplemental Figure 1). In addition, other AT-rich regions do not show elevated divergence or polymorphism. Perhaps the structure of centromeres is such that CDEII is more exposed to damaging agents (e.g., free radicals) in the nuclear environment or less exposed to repair enzymes. It is also possible that there are occasional small-scale gene-conversion events from other centromeres, too short to be detected in our analysis of orthology, and that these contribute to the observed divergence.
An increased mutation rate need not affect all nucleotides equally. If C's and G's were disproportionately affected, then this could account in part for the A/T compositional bias of CDEII—indeed, if C/G is nine times more mutable than A/T, then a neutral sequence would evolve to be 90% A/T. Moreover, a 9-fold increase in the mutability of C/G would also produce a 1.8-fold increase in the rate of divergence at equilibrium (Haddrill et al. 2005). On the other hand, the A/T bias appears to be functionally important (Baker and Rogers 2005), and it could be maintained purely by selection, without a mutation bias. In this case there would be purifying selection against mutations to C or G, and the actual mutation rate at CDEII would be even higher than the divergence rate we observe.
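The two figures in this argument (90% A/T and a 1.8-fold rate increase) can be reproduced with a simple two-state model. The sketch assumes each site is either A/T or C/G, with a per-site mutation rate u from C/G to A/T and v from A/T to C/G, and ignores changes within a class. This is one simple way to obtain the numbers and is not necessarily the derivation used in the cited work.

```python
def equilibrium_at_fraction(gc_to_at: float, at_to_gc: float) -> float:
    """Equilibrium A/T fraction for a neutral two-state model: u / (u + v)."""
    return gc_to_at / (gc_to_at + at_to_gc)


def equilibrium_substitution_rate(gc_to_at: float, at_to_gc: float) -> float:
    """Per-site neutral substitution rate at compositional equilibrium."""
    f_at = equilibrium_at_fraction(gc_to_at, at_to_gc)
    # A/T sites mutate at rate v, C/G sites at rate u.
    return f_at * at_to_gc + (1 - f_at) * gc_to_at


# Baseline: both directions equally mutable (rates in arbitrary units).
baseline = equilibrium_substitution_rate(1.0, 1.0)
# C/G nine times more mutable than A/T.
at_fraction = equilibrium_at_fraction(9.0, 1.0)
fold_increase = equilibrium_substitution_rate(9.0, 1.0) / baseline
```

With u = 9v the equilibrium composition is 9/(9 + 1) = 0.9 A/T, and the rate is 0.9v + 0.1 × 9v = 1.8v, compared with v in the unbiased baseline.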
To conclude, we find that centromeres are the fastest-evolving regions in the yeast genome (possibly excluding telomeres and subtelomeres), despite their essential and conserved role in chromosome segregation. Our results also indicate that rapid centromere evolution can occur in the absence of drive and instead point to elevated mutation rates as a possible explanation. Other Saccharomyces lineages also show relatively high rates of centromere evolution (D. Barton and E. Louis, unpublished results). The centromere divergence we observe may have no functional consequences, although experimental transfers between species would be the best way to test this idea. Elevated mutation rates should also be considered as a possible contributor to rapid centromere evolution in plants and animals. Centromeres in these taxa are determined epigenetically rather than by DNA sequence, and this could be an adaptation to maintain function in the face of unavoidably high mutation rates (Murphy and Karpen 1998).
We thank David Barton and Edward Louis for useful discussions and for showing us their manuscript prior to publication. We also thank Jason Tsai for help with the analysis and Casey Bergman, Brian Charlesworth, Harmit Malik, Mick Crawley, Daniela Delneri, and two anonymous reviewers for helpful comments and discussions. DNA sequencing was done by AGOWA (Berlin). This work was funded by the Biotechnology and Biological Sciences Research Council (to V.K. and A.B.) and a Wellcome Trust VIP award (to D.B.).
1 Present address: Faculty of Life Sciences, University of Manchester, Oxford Rd., Manchester M13 9PT, United Kingdom.
2 Present address: Department of Entomology, The Natural History Museum, Cromwell Rd., London SW7 5BD, United Kingdom.
Communicating editor: D. Begun
- Received October 30, 2007.
- Accepted February 8, 2008.
- Copyright © 2008 by the Genetics Society of America