Saccharomyces paradoxus is the closest known relative of the well-known S. cerevisiae and an attractive model organism for population genetic and genomic studies. Here we characterize a set of 28 wild isolates from a 10-km² sampling area in southern England. All 28 isolates are homothallic (capable of mating-type switching) and wild type with respect to nutrient requirements. Nine wild isolates and two lab strains of S. paradoxus were surveyed for sequence variation at six loci totaling 7 kb, and all 28 wild isolates were then genotyped at seven polymorphic loci. These data were used to calculate nucleotide diversity and number of segregating sites in S. paradoxus and to investigate geographic differentiation, population structure, and linkage disequilibrium. Synonymous site diversity is ∼0.3%. Extensive incompatibilities between gene genealogies indicate frequent recombination between unlinked loci, but there is no evidence of recombination within genes. Some localized clonal growth is apparent. The frequency of outcrossing relative to inbreeding is estimated at 1.1% on the basis of heterozygosity. Thus, all three modes of reproduction known in the lab (clonal replication, inbreeding, and outcrossing) have been important in molding genetic variation in this species.
MANY fields in biology have progressed by the concentrated study of a select group of model systems. In population and evolutionary genetics, only a few species such as Drosophila and humans have been widely adopted, and it might make sense to consider what other taxa might best complement these. The yeast Saccharomyces cerevisiae has a number of characteristics that would seem to make it ideal (Zeyl 2000): (i) It is already a well-studied model system in biochemistry, cell biology, classical genetics, and molecular biology; (ii) genomes can be precisely altered by homologous recombination; and (iii) long-term experiments with large population sizes and sensitive fitness assays are readily possible in the laboratory. These features suggest that one may be more likely to be able to investigate and interpret the functional significance of natural DNA sequence variation in this species than in any other eukaryote. Moreover, it has a relatively small and gene-rich genome, reducing the size of the problem to be solved. However, there is a problem: S. cerevisiae has long been associated with humans, and in collecting strains it is difficult to determine to what extent they are escaped domestics or otherwise greatly affected by human activity (Vaughan-Martini and Martini 1995; Naumov et al. 1992a). This could greatly affect their population genetics, severely complicating interpretations and reducing the extent to which lessons learned with this species are likely to be widely applicable. For example, one survey of S. cerevisiae in wineries revealed some surprising findings, including 31% of strains heterozygous for a lethal mutation and 23% heterozygous or homozygous for heterothallism, i.e., an inability to undergo mating-type switching (Mortimer 2000). The association between Drosophila and humans has posed similar problems (Andolfatto and Przeworski 2000; Wall et al. 2002).
One way to circumvent this problem would be to study a close relative that has the same advantages, but not the disadvantage. S. paradoxus is (along with S. cariocanus) the closest known relative of S. cerevisiae (Goddard and Burt 1999). The two species appear to be biochemically indistinguishable (Barnett et al. 1990), have the same chromosome number, and appear to be largely syntenic (Naumov et al. 1992b). Growth preferences in the lab are the same as for S. cerevisiae, and genetic engineering by the same homologous gene replacement methods used in S. cerevisiae is possible (E. Louis, personal communication). Thus, many of the advantages still apply. Moreover, it has been isolated from many natural locations worldwide (e.g., Sniegowski et al. 2002) and apparently has not been widely domesticated. Gene flow between S. cerevisiae and S. paradoxus is also unlikely; hybrids can be formed, but are almost completely sterile (Naumov et al. 1997a). Overall DNA sequence divergence between the two species is thought to be ∼20% (Herbert et al. 1988), and synonymous site divergence at the loci studied here is ∼30%.
In the laboratory, the life cycle of S. paradoxus is the same as that of S. cerevisiae (Herskowitz 1988). It normally reproduces mitotically as a diploid, but when starved of nitrogen undergoes meiosis and produces four haploid spores encapsulated in an ascus. There are two mating types, and the spores usually mate within the ascus upon germination, but if this does not happen, they are able to reproduce mitotically as haploids. Haploid cells are constitutively ready to mate and can out-cross. However, haploid mitoses are associated with a sophisticated mechanism of mating-type switching, with the result that cells can also mate with their clonemates, producing an entirely homozygous diploid (“autodiploidization”). Thus, S. paradoxus may undergo two types of self-fertilization: intra-ascus mating and autodiploidization. For a review of ascomycete mating systems, see Nelson (1996).
In this article we describe a preliminary investigation into the genetics of a single population of S. paradoxus, focusing on quantifying levels of nucleotide variation and analyzing the pattern of variation to infer mating system (and, to a lesser extent, dispersal).
MATERIALS AND METHODS
Collections: S. paradoxus was isolated from the bark of oak trees (Quercus, mainly Quercus robur; Naumov et al. 1998) in Silwood Park and Windsor Great Park. Bark scrapings (∼1 g) were collected from 86 oak trees on each of two dates, with two scrapings on opposite sides of the tree on each date. Scrapings were aseptically transferred to acidified malt medium [5% malt extract (Sigma, Dorset, UK), 0.4% lactic acid (Sigma) w/v] in loosely capped vials and shaken for 2 days at 30°. Many types of microbe were present in the medium so a selection procedure was incorporated to isolate S. paradoxus. Dilutions of the 48-hr culture were plated on acidified malt and incubated for 24 hr at 30°. The resulting colony-forming units were visually inspected and colonies looking like S. paradoxus were picked, placed on YPD [1% yeast extract (Merck, Dorset, UK), 2% peptone (Merck), 2% glucose (BDH, Leicestershire, UK)], and then subsamples were tested for their ability to form tetrads when placed upon nitrogen-starving medium (2% potassium acetate; BDH). Heterozygosity was maintained in the original samples because they were not stimulated to sporulate. For those that formed tetrads, the internal transcribed spacer region (ITS1-5.8rRNA-ITS2) was amplified using primers ITS1 and ITS4 (White et al. 1990) and then visualized via electrophoresis through 1% agarose. ITS amplicons of roughly the correct size were sequenced (with an ABI 373) and compared to the ITS sequences from the S. paradoxus (CBS 432) and S. cerevisiae type strains. Three types of sequence were recovered. Two of these were largely unalignable to the Saccharomyces sequences and were identified as Hanseniaspora osmophila (CBS 313) and Torulaspora delbrueckii (CBS 404), using BLAST (Altschul et al. 1990). All sequences in the third category were very similar to the S. paradoxus sequence and were included in our sample. Our procedure therefore allowed the isolation of both S. paradoxus and S. cerevisiae strains with substantial variability within each species. The initial collection of 344 bark scrapings yielded 28 isolates.
Other strains: The Centraalbureau voor Schimmelcultures (CBS) supplied CBS 432, the type strain of S. paradoxus, and the Danish lab strain CBS 5829, here referred to as “Type” and “Danish,” respectively.
Two S. paradoxus isolates from the Russian Far East (FE), CBS 8436 and CBS 8444, were included for comparison. These isolates differ from European S. paradoxus at allozyme loci (Naumov et al. 1997b) and show ∼5% synonymous site divergence from the type strain of S. paradoxus at the six sequenced loci. These strains, referred to herein as FE1 and FE2, respectively, were kindly provided by Edward Louis. All S. cerevisiae sequence data were from the Yeast Genome Project (Goffeau et al. 1996).
Phenotypic assays: To isolate individual spores for phenotypic assays, all wild isolates were grown on sporulation medium for 4 days, and resultant asci were enzymatically digested (10 min in a 50-μl solution of 10 mg/ml sulfanotase, 10 mg/ml lyticase at 25°). Individual spores were removed with a Zeiss micromanipulator and incubated at 25° for 4 days on YPD agar to allow colony growth. Colonies were replica plated to minimal and sporulation media and after 3 days examined for growth or surveyed by microscopy for the presence of tetrads. The presence of tetrads was considered indicative of mating-type switching. All media were made according to Sherman (1991).
Molecular methods: Nine wild isolates were chosen randomly for an initial survey of sequence variation. Total DNA was extracted (Sherman 1991) and diluted 100-fold for use as a PCR template. Six genes involved in mate recognition were amplified from the nine wild isolates and from the Type strain, Danish, FE1, and FE2 isolates. Details of genes and primers are given in Table 1. All 28 wild isolates were then genotyped at polymorphic sites by restriction at the MFA1 and AGA2 loci, using enzymes Tsp45I and AseI, respectively, and by sequencing fragments of MFα1, SAG1, STE2, and STE3.
Microsatellite locus: Twenty S. cerevisiae microsatellite primer pairs (Field and Wills 1998) were tested on S. paradoxus. Of these only 3 gave a PCR product with S. paradoxus, and 1 was found to be polymorphic, a variable-length repeat in the TFA1 gene (chromosome XI in S. cerevisiae). The wild isolates were genotyped at this locus by polyacrylamide gel electrophoresis of radioactively end-labeled PCR products (Sambrook et al. 1989). A representative of each mobility group was sequenced to determine the length of each allele.
Statistical analysis and software used: Nucleotide diversity π at synonymous and nonsynonymous sites, and synonymous site divergence, were calculated using DnaSP (Rozas and Rozas 1999; available at http://www.ub.es/dnasp/). Parsimony analysis of gene trees and comparisons among them by the partition homogeneity test (Farris et al. 1994) were performed using PAUP (Swofford 2002). To test for deviations from neutrality, we compared the variance of branch lengths on the genealogy to that from 1000 random genealogies with the same total branch length, constructed using N. Barton's genealogies package (available at http://helios.bto.ed.ac.uk/evolgen/barton/index.html) for Mathematica (Wolfram Research 1999). Tests for overrepresentation of genotypes and linkage disequilibrium were performed using MultiLocus (Agapow and Burt 2001; available at http://www.bio.ic.ac.uk/evolve/software/multilocus/index.html). The correlation between genetic and geographical distance across all pairs of isolates was tested by randomization, in Mathematica.
RESULTS
Isolations: S. paradoxus was isolated from 28 of 344 bark scrapings, a success rate of 8%. There was no obvious difference in success rate between large and small trees or samples with different aspect. Of the 86 trees, each sampled with four scrapings in total (two on each of two dates), 63 produced no isolates, 18 produced one isolate, and 5 produced two isolates. No S. cerevisiae strains were recovered although they were not excluded by our procedure.
Phenotypic variation: All 28 wild isolates were induced to undergo meiosis, and the four haploid spores were dissected from the asci. The resultant colonies were all capable of growth on minimal medium, demonstrating that none of the 28 strains carried an auxotrophic mutation. The frequency of auxotrophic mutants is thus 0, with a 2-unit upper support limit of 0.069. In S. cerevisiae, ∼60 genes can mutate to auxotrophy, as estimated by counting gene names denoting amino acid auxotrophy in the yeast genome (Goffeau et al. 1996). The spontaneous mutation rate in the lab is ∼$10^{-8}$/locus/mitotic generation (Drake 1991; Zeyl and Devisser 2001). If the same values apply to S. paradoxus in nature, and the population is at mutation selection balance (i.e., the frequency of deleterious mutants is equal to q = u/s, where u is the mutation rate and s is the selection coefficient), the minimum harmonic mean selection coefficient against auxotrophic mutants necessary to keep them at the observed frequency is ∼$60 \times 10^{-8}/0.069 \approx 10^{-5}$. Thus, even very small selection coefficients would be sufficient to keep the mutants at the observed low frequency.
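The arithmetic in this paragraph can be checked with a short sketch. It assumes, as an interpretation not spelled out in the text, that the 2-unit upper support limit is the frequency at which a binomial log-likelihood with zero mutants among 28 strains has dropped by 2 units; under that assumption the quoted value of 0.069 is recovered. The function names are illustrative only.

```python
import math


def upper_support_limit(n_sampled: int, units: float = 2.0) -> float:
    """Upper support limit on a frequency q when 0 of n_sampled carry the variant.

    Assumption: the log-likelihood is n * ln(1 - q), which is 0 at its maximum
    (q = 0); the limit is the q at which it has fallen by `units`.
    """
    return 1.0 - math.exp(-units / n_sampled)


def min_selection_coefficient(n_loci: int, mutation_rate: float, q_max: float) -> float:
    """Mutation-selection balance q = u/s rearranged to s = u/q.

    The mutation rate u is summed over all loci that can mutate to the phenotype.
    """
    return n_loci * mutation_rate / q_max


q_max = upper_support_limit(28)                        # 28 wild isolates, none mutant
s_aux = min_selection_coefficient(60, 1e-8, q_max)     # ~60 auxotrophy loci
s_ho = min_selection_coefficient(1, 1e-8, q_max)       # a single locus (HO)

print(f"upper support limit: {q_max:.3f}")
print(f"minimum s, auxotrophy: {s_aux:.1e}")
print(f"minimum s, heterothallism: {s_ho:.1e}")
```

The same two functions cover the single-locus heterothallism calculation, since only the number of mutable loci changes.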
All colonies grown from haploid spores were also capable of forming tetrads on sporulation medium, indicating that they had autodiploidized following mating-type switching (i.e., were homothallic). In S. cerevisiae it appears that there is only one locus that can mutate to give a heterothallic phenotype (HO); making the same calculations as above indicates that the minimum selection coefficient against such mutants in the wild is ∼$10^{-7}$.
Molecular data set 1: DNA sequences from nine isolates: The initial survey of molecular variation involved sequencing six loci from nine wild isolates plus the Type, Danish, FE1, and FE2 isolates. Sequence variation was discovered at each of the six loci, and there were a total of 24 polymorphic sites and one polymorphic repeat in ∼7000 bp of sequence from nine isolates (see Table 2). None of the isolates was heterozygous at any of these polymorphic sites. Three isolates (T8.1, T21.4, and T32.1) had identical genotypes; subsequent analysis (described below for data set 2) suggests that they are part of a single clone. No other pair of isolates had identical genotypes. Table 3 shows the average pairwise diversity per nucleotide site of these six genes in wild isolates. Only one amino acid polymorphism is seen among the nine wild isolates; the nonsynonymous nucleotide diversity at these loci is low (∼0.01%), comparable to that found in humans (Li and Sadler 1991). By contrast, the synonymous and noncoding nucleotide diversity is relatively high (∼0.3%), comparable to that found in Drosophila melanogaster (Begun and Aquadro 1992)— although this is still far lower than the diversity of ∼5% seen between sympatric isolates of Escherichia coli (Hall and Sharp 1992). These results indicate that the six genes are under purifying selection in S. paradoxus.
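For readers unfamiliar with the two summary statistics used here, the following is a minimal sketch of how nucleotide diversity (average pairwise differences per site) and the number of segregating sites are computed from aligned sequences. The sequences are hypothetical toy data, not the alignments from this study, and the sketch does not separate synonymous from nonsynonymous sites as DnaSP does.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site over all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


def segregating_sites(seqs):
    """Number of alignment columns with more than one state."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


# Hypothetical 10-bp alignment of four homozygous isolates.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTTC",
]

pi = nucleotide_diversity(toy_alignment)
n_seg = segregating_sites(toy_alignment)
print(f"pi = {pi:.4f}, segregating sites = {n_seg}")
```

Because none of the isolates was heterozygous at the surveyed sites, treating each isolate as a single sequence, as done here, loses no information.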
Gene trees for each locus, rooted using the Far Eastern isolates and S. cerevisiae, are shown in Figure 1. The data fit these trees perfectly—i.e., their consistency index is 1 (Farris 1989): There is no homoplasy within the European data. Far Eastern and European isolates, however, share a polymorphism in MFα1 pheromone repeat number. There are fixed differences between Far Eastern and European MFα1 sequences at other sites, so this homoplasy must have been created either by recombination between alleles from the Far East and Europe or by parallel mutations. Parallel mutation is a plausible cause, as repeat number is highly variable in Saccharomyces (Kitada and Hishinuma 1988) and varies from two to four repeats in our set of 28 wild isolates (see below). Overall, then, there is no compelling evidence of recombination within any of these genes.
To test for recombination between genes, the data from all six loci were combined for parsimony analysis. The European isolates give a shortest tree of 30 steps, 7 steps longer than the minimum possible (consistency index = 0.77), showing extensive homoplasy. Eight of the 15 possible pairs of gene trees conflict, and no branch is common to all 6 trees. Moreover, nucleotide sites in the same gene are significantly more likely to agree than sites in different genes (partition homogeneity test, P = 0.002). Recombination does therefore appear to have occurred between the six genes, each of which is on a different chromosome.
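The logic of detecting recombination from conflicting sites can be illustrated with a simpler criterion than the parsimony analysis performed in PAUP: the four-gamete test, under which two biallelic sites conflict if all four allele combinations are present. The sketch below uses this swapped-in test on hypothetical data, and also reproduces the consistency index quoted above as the ratio of the minimum possible to the observed number of steps.

```python
from itertools import combinations


def incompatible(site_a, site_b):
    """Four-gamete test: two biallelic sites conflict if all four combinations occur."""
    return len(set(zip(site_a, site_b))) == 4


def consistency_index(min_steps, observed_steps):
    """Minimum possible tree length divided by the length of the shortest tree found."""
    return min_steps / observed_steps


# Hypothetical alleles at one polymorphic site per locus, one character per isolate.
toy_sites = {
    "locus1": "AAAGGG",
    "locus2": "CCTTTT",
    "locus3": "GTGTGT",
}

conflicts = [
    (a, b)
    for a, b in combinations(toy_sites, 2)
    if incompatible(toy_sites[a], toy_sites[b])
]
print("conflicting pairs:", conflicts)

# Values from the text: shortest tree of 30 steps, 7 longer than the minimum of 23.
ci_european = consistency_index(30 - 7, 30)
print(f"consistency index = {ci_european:.2f}")
```

In the absence of recurrent mutation, a conflicting pair can only arise through recombination between the two sites, which is why incompatibility between loci on different chromosomes is read as evidence of outcrossing.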
Interestingly, for none of the genes do our wild isolates form a monophyletic clade with respect to the Type and Danish strains (with the possible exception of SAG1). This indicates either gene flow on the scale of thousands of kilometers or large populations since divergence such that variation present at the time of divergence has not sorted out.
To compare the gene trees to the expectation under the null hypothesis of a neutral coalescent, we calculated the variance of branch lengths in the genealogies and compared them to those found on randomized genealogies with the same total number of mutations. For this analysis the sample size was taken as seven (i.e., clonemates were excluded). For STE3, seven of the eight differences segregating within our wild isolates are on the same branch and the variance of branch lengths is 4.1, significantly higher than that in random genealogies (P ∼ 0.005). For SAG1, all three segregating differences are on the same branch, and the variance is 0.75, also significant (P ∼ 0.05). This clumping of nucleotide changes on the genealogies could have resulted from nonindependent mutation (perhaps unlikely since the changes occurred >600 bp apart), introgression from other more divergent populations, or balancing selection at a linked locus.
Molecular data set 2: genotypes of 28 isolates at seven loci: The second data set consists of all 28 isolates genotyped for at least one polymorphism per locus sequenced, plus a microsatellite locus (Table 4). Six isolates, including the three found to be identical in data set 1, had identical genotypes. This is unlikely in a randomized data set (P < 0.001), and all 6 isolates were collected within 600 m of one another over a 3-month period (Figure 2). We interpret these 6 isolates as part of a clone. If five of these six clonemates are removed from the data set, there remain 5 pairs of identical isolates and only 18 different genotypes. This is fewer than would be expected in a randomized data set (P = 0.05), suggesting that one or more of these pairs are also clonemates. One such pair (Q15.1 and Q16.1) was collected from the same tree at the same time and is the most likely candidate; each other pair is separated by >500 m and the data do not allow one to distinguish whether these are clonemates or are identical just by chance.
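The test for overrepresented genotypes can be sketched as follows. This is not the MultiLocus implementation; it assumes a randomization in which alleles are shuffled among isolates independently at each locus, and asks how often the randomized data contain as few distinct multilocus genotypes as observed. The genotypes are hypothetical.

```python
import random


def count_distinct(genotypes):
    """Number of distinct multilocus genotypes (each genotype is a tuple of alleles)."""
    return len(set(genotypes))


def shuffle_within_loci(genotypes, rng):
    """Shuffle alleles among isolates independently at each locus."""
    columns = [list(column) for column in zip(*genotypes)]
    for column in columns:
        rng.shuffle(column)
    return list(zip(*columns))


def genotype_count_test(genotypes, n_perm=1000, seed=0):
    """One-sided P value for observing so few distinct genotypes under free recombination."""
    rng = random.Random(seed)
    observed = count_distinct(genotypes)
    hits = sum(
        count_distinct(shuffle_within_loci(genotypes, rng)) <= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: ten isolates, five loci, only two distinct genotypes.
toy_genotypes = [(0, 0, 0, 0, 0)] * 5 + [(1, 1, 1, 1, 1)] * 5
n_distinct, p_value = genotype_count_test(toy_genotypes)
print(n_distinct, p_value)
```

Shuffling within loci keeps allele frequencies fixed while breaking associations among loci, so a small P value points to repeated sampling of the same clone rather than to low allelic diversity.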
Apart from this localized clonal growth, there is no obvious correlation between genotype and geographic location. With all isolates included, there is a significant positive regression across all pairs of isolates of genotypic distance (proportion of loci at which the isolates differ) and geographical distance (slope = 0.01 km⁻¹, P ∼ 0.02). However, if only a single (randomly chosen) isolate of each distinct genotype is included in the analysis, the regression is not significant (slope = 0.005 km⁻¹, P ∼ 0.25). It appears that this population experiences frequent gene flow on a kilometer scale.
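A randomization test of this kind can be sketched as below. The original analysis was done in Mathematica and its details are not given, so this is only one plausible version: the regression slope of genotypic on geographic distance is computed over all pairs, and significance is assessed by permuting sampling locations among isolates (a one-sided test is assumed). Coordinates and genotypes are hypothetical.

```python
import math
import random
from itertools import combinations


def genotypic_distance(g1, g2):
    """Proportion of loci at which two isolates differ."""
    return sum(a != b for a, b in zip(g1, g2)) / len(g1)


def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def distance_slope(genotypes, coords_km):
    """Slope of genotypic distance on geographic distance across all pairs."""
    pairs = list(combinations(range(len(genotypes)), 2))
    geo = [math.dist(coords_km[i], coords_km[j]) for i, j in pairs]
    gen = [genotypic_distance(genotypes[i], genotypes[j]) for i, j in pairs]
    return ols_slope(geo, gen)


def randomization_test(genotypes, coords_km, n_perm=999, seed=1):
    """Permute locations among isolates; P = share of slopes at least as large as observed."""
    rng = random.Random(seed)
    observed = distance_slope(genotypes, coords_km)
    shuffled = list(coords_km)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if distance_slope(genotypes, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical isolates spaced 1 km apart, diverging by one locus per km.
toy_coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
toy_genotypes = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
slope, p_value = randomization_test(toy_genotypes, toy_coords)
print(f"slope = {slope:.3f} per km")
```

Permuting whole isolates rather than individual pairwise distances respects the nonindependence of pairs that share an isolate, which an ordinary regression P value would ignore.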
Homozygosity and inbreeding: In the entire data set, only a single isolate was heterozygous, at a single locus (Table 4). Wright's inbreeding coefficient, F, estimated from the fixation index (Brown 1979) is 0.99. This suggests a high level of inbreeding. In the appendix we model a mixed-mating population in which diploid individuals are derived either from intra-ascus mating or from random outcrossing. Using this model, the maximum-likelihood estimate of the outcrossing rate is 1.1%, with 2-unit support limits of 0.06% and 5%. If autodiploidization occurs in the wild, this method will underestimate the true outcrossing rate, as autodiploidization removes heterozygosity far more quickly than intra-ascus mating does (appendix).
Recombination: In both data sets, there is abundant evidence of recombination between loci. Of the 21 possible pairs of loci, 18 of them are phylogenetically incompatible (i.e., show evidence of past recombination). Parsimony analysis of the entire data set gives a shortest tree of 22 steps, compared to a minimum possible of 12 (consistency index = 0.54). Taken as a whole there is significant multilocus linkage disequilibrium ($I_A$ = 0.21, $\bar{r}_D$ = 0.035, P ∼ 0.02), but not if each distinct genotype is reduced to a single observation ($I_A$ = −0.05, $\bar{r}_D$ = −0.008, P ∼ 0.6).
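The two multilocus disequilibrium statistics reported here can be sketched as follows, assuming the usual definitions: the index of association is the ratio of the observed variance in pairwise distances (number of loci differing) to the sum of the per-locus variances, minus one, and the standardized version divides the same difference by twice the sum, over pairs of loci, of the square roots of the products of their variances. This is an illustrative reimplementation on hypothetical data, not the MultiLocus code, and it omits the randomization step that supplies the P value.

```python
from itertools import combinations
from math import sqrt


def variance(values):
    """Population variance (the choice of denominator cancels in both indices)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def association_indices(genotypes):
    """Return (index of association, standardized index) for haploid-style genotypes.

    Assumes at least two polymorphic loci; otherwise a denominator is zero.
    """
    pairs = list(combinations(genotypes, 2))
    n_loci = len(genotypes[0])
    # per_locus[j][k] is 1 if the k-th pair of isolates differs at locus j.
    per_locus = [[int(a[j] != b[j]) for a, b in pairs] for j in range(n_loci)]
    totals = [sum(column) for column in zip(*per_locus)]
    v_obs = variance(totals)
    locus_vars = [variance(d) for d in per_locus]
    v_exp = sum(locus_vars)
    index_a = v_obs / v_exp - 1
    denom = 2 * sum(
        sqrt(locus_vars[j] * locus_vars[k])
        for j, k in combinations(range(n_loci), 2)
    )
    r_bar_d = (v_obs - v_exp) / denom
    return index_a, r_bar_d


# Hypothetical data: two loci in complete association.
linked = [(0, 0), (0, 0), (1, 1), (1, 1)]
i_a_linked, r_d_linked = association_indices(linked)
print(i_a_linked, r_d_linked)
```

The standardized index is bounded above by 1 regardless of the number of loci, which makes it easier to compare between data sets than the unstandardized index.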
DISCUSSION
Like S. cerevisiae, S. paradoxus is capable of three types of reproduction in the laboratory: clonal replication, inbreeding, and outcrossing. All three appear to be important in molding the pattern of genetic variation in our natural population. Evidence for clonal replication comes from the repeated isolation of the same genotype, more than would be expected by chance: Among our 28 wild isolates, 6 appear to be members of a single clone, and at least one of the other five pairs of identical genotypes is also likely to consist of clonemates. There may have been inbreeding in the ancestry of these clonemates, or even mating between clonemates, but inbreeding alone without clonal replication would not lead to such an overrepresentation of genotypes. Evidence for inbreeding comes from the high homozygosity. An assumption in making this inference is that S. paradoxus in the field behaves as it does in the lab, and in particular that the diplophase predominates, and so the cells we isolated are diploid. In principle, an alternative explanation for the lack of heterozygosity is that cells are haploids in nature, but autodiploidize in the early stages of the isolation procedure. However, we do not consider it likely that S. paradoxus should change its life cycle so drastically in response to laboratory conditions. Finally, evidence for outcrossing comes from the single heterozygote we found plus the genealogical incompatibility between loci and absence of linkage disequilibrium.
This contrast between the great excess of homozygosity and the absence of linkage disequilibrium between genes reflects the fact that even small amounts of outcrossing and recombination will randomize alleles at different loci (Maynard Smith 1994). Nevertheless, inbreeding reduces the effective rate of recombination ($r_e$) in the population below the actual rate ($r_a$), according to the relation $r_e = (1 - F) r_a$ (Dye and Williams 1997; Nordborg 2000). This is because recombination is effective only in heterozygous individuals, and inbreeding reduces the frequency of heterozygotes. In our population, F = 0.99, and so the effective recombination rate is 1% of what it would be in a random-mating population. This means that linkage disequilibrium should extend for greater distances along the genome than would otherwise be the case and may have contributed to the absence of evidence for recombination within any of the genes studied. This extension of linkage disequilibrium along the genome means that DNA sequences will be more informative for at least some types of analyses than would otherwise be the case (Nordborg 2000), which makes S. paradoxus yet more attractive as a model system for population genetics and genomics. Also relevant, of course, is the actual rate of recombination, and it is interesting that S. cerevisiae has one of the highest known recombination rates per megabase of DNA. One explanation is that this has evolved to compensate for a low rate of outcrossing, as is suggested to explain the high chiasmata frequency seen in selfing plants (e.g., Zarchi et al. 1972). Alternatively, it is possible that the high rate of recombination has evolved as a consequence of intense selection pressures imposed by domestication (Burt and Bell 1987). It will be interesting to see whether S. paradoxus also has a high rate of recombination in lab crosses and to determine just how far linkage disequilibrium extends along the genome.
The low effective rate of recombination over distances of ∼1 kb allowed us to reconstruct genealogies for each gene. We compared the variance of branch lengths to those found on random genealogies and detected significant deviations from neutrality in two genes, both in the direction of changes being clumped on the genealogy. Nonindependent mutation, introgression, or balancing selection could give rise to such a pattern, although formal theoretical work would be useful in clarifying this. If balancing selection operates, it is probably not heterozygote advantage (given the low levels of heterozygosity), but frequency-dependent selection.
Inbreeding in S. paradoxus can occur both by intra-ascus mating and by autodiploidization (as well as by mating between other types of relatives) and it is not possible with our data to determine the relative frequency of these alternatives. One possible approach would be to compare heterozygosity at loci tightly linked to the mating-type locus to that at unlinked loci; if there has not been switching, heterozygosity near the mating-type locus will be maintained, even with selfing. Presumably switching does occur at least occasionally, as otherwise selection would not maintain the underlying mechanism.
Inbreeding species present some difficulties for interpreting sequence variability, due to genotypes being nonindependent. Although inbreeding predominates over outcrossing in S. paradoxus, it is not as extreme in this regard as some other yeasts, at least in the laboratory—in many species, mating typically occurs between a haploid mother cell and a daughter bud (Johannsen and van der Walt 1980; Kurtzman and Fell 1998). Other species probably outcross more than S. paradoxus—in particular, species that are vegetatively haploid and heterothallic (Kurtzman and Fell 1998). It would be interesting to compare patterns of genetic variation for such species with those found here.
Finally, the results reported here differ markedly from those reported for S. cerevisiae from wineries, in which there was a high frequency of heterozygous strains, recessive lethals, and heterothallism (Mortimer 2000). These differences are presumably the effect of domestication, although the precise details remain obscure.
With the development of wild strain collections, such as are available for Drosophila, and the identification of more molecular markers in this species, S. paradoxus may prove to be a valuable addition to the current suite of model organisms available to the population geneticist.
ACKNOWLEDGMENTS
Thanks go to Alexandra Eggington and Celine Vass for technical help. This work was funded by the Natural Environment Research Council in studentships to Louise Johnson, Matthew Goddard, and Richard Hetherington; and a grant to Austin Burt.
APPENDIX
To estimate the frequency of outcrossing compatible with the observed level of heterozygosity, we first modeled a mixed-mating population in which haploid cells either mate within the ascus with probability s or mate randomly in the population with probability t (= 1 − s). Note first that in such a population, the probability that an individual chosen at random is derived from x generations of selfing (i.e., there are exactly x generations of selfing in its ancestry before one gets back to an outcrossing event) is $s^x t$. Second, the probability that an individual derived from x generations of selfing is homozygous at locus i is $1 - HW_i (2/3)^x$, where $HW_i$ is the Hardy-Weinberg proportion of heterozygotes in the population at that locus. Note that in this system selfing reduces heterozygosity by one-third every generation, not by one-half, as in more familiar systems where selfing gametes come from independent meioses (e.g., plants). This is because, with intra-ascus mating, each haploid spore produced from a heterozygous diploid shares an allele with only one of its three potential mating partners. Finally, the overall probability that a random individual is homozygous at the ith locus is the product of these two probabilities, summed over all possible numbers of generations of selfing in its ancestry:

$$\sum_{x=0}^{\infty} s^x t \left[ 1 - HW_i \left( \tfrac{2}{3} \right)^x \right].$$

In our data set there are 7 loci, and the probability that an individual will be homozygous at all of them is then

$$\sum_{x=0}^{\infty} s^x t \prod_{i=1}^{7} \left[ 1 - HW_i \left( \tfrac{2}{3} \right)^x \right].$$

Note that this assumes the loci are independent. For the six isolates with missing data, the inside product is done over only the loci for which there are data. Finally, isolate T18.2 is homozygous at 5 loci and heterozygous at SAG1, and the probability of an individual being this is

$$\sum_{x=0}^{\infty} s^x t \, HW_{SAG1} \left( \tfrac{2}{3} \right)^x \prod_{i} \left[ 1 - HW_i \left( \tfrac{2}{3} \right)^x \right],$$

where the product is over the 5 homozygous loci. When we count only one isolate of each distinct genotype, the data consist of 14 completely homozygous genotypes, two homozygous isolates with unknown STE2 genotype, one homozygous isolate with unknown MFα1 genotype, and the heterozygote T18.2. The probability of observing the entire data set is therefore the product of the corresponding probabilities over these 18 genotypes,

$$L(t) = \prod_{k=1}^{18} P_k(t),$$

where $P_k(t)$ is the expression above appropriate to genotype k. The maximum possible value of this occurs at an outcrossing rate of t = 1.1%, with 2-unit support limits of 0.06% and 5%.
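The likelihood calculation for the intra-ascus model can be sketched numerically as below. Because Table 4 is not reproduced here, the Hardy-Weinberg heterozygosities are hypothetical placeholders, as is the choice of which locus is missing for T18.2, so the estimate printed will not match the 1.1% reported. The infinite sum is truncated once $(2/3)^x$ is negligible, with the remaining probability mass (all of it fully homozygous) added analytically.

```python
import math


def prob_pattern(t, hw_hom, hw_het=()):
    """P(homozygous at loci in hw_hom and heterozygous at loci in hw_het).

    Sums s^x * t * prod(...) over x generations of intra-ascus selfing.
    """
    s = 1.0 - t
    total = 0.0
    x = 0
    while (2 / 3) ** x > 1e-16:
        decay = (2 / 3) ** x
        term = (s ** x) * t
        for hw in hw_hom:
            term *= 1.0 - hw * decay
        for hw in hw_het:
            term *= hw * decay
        total += term
        x += 1
    # Beyond this point heterozygosity is effectively zero, so the remaining
    # mass, sum_{k >= x} s^k t = s^x, belongs to fully homozygous individuals.
    if not hw_het:
        total += s ** x
    return total


def log_likelihood(t, observations):
    """observations: iterable of (hw_hom, hw_het, count)."""
    return sum(n * math.log(prob_pattern(t, hom, het)) for hom, het, n in observations)


# HYPOTHETICAL Hardy-Weinberg heterozygosities; the real values come from Table 4.
hw = {"MFA1": 0.30, "AGA2": 0.20, "MFalpha1": 0.40, "SAG1": 0.25,
      "STE2": 0.35, "STE3": 0.30, "TFA1": 0.45}


def hw_values(exclude=()):
    return [v for name, v in hw.items() if name not in exclude]


observations = [
    (hw_values(), [], 14),                               # homozygous at all 7 loci
    (hw_values(exclude={"STE2"}), [], 2),                # STE2 unknown
    (hw_values(exclude={"MFalpha1"}), [], 1),            # MFalpha1 unknown
    # T18.2: heterozygous at SAG1, homozygous at 5 loci; the missing locus
    # (TFA1 here) is an arbitrary assumption for illustration.
    (hw_values(exclude={"SAG1", "TFA1"}), [hw["SAG1"]], 1),
]

grid = [i / 1000 for i in range(1, 1000)]
t_hat = max(grid, key=lambda t: log_likelihood(t, observations))
print(f"maximum-likelihood outcrossing rate (toy data): {t_hat:.3f}")
```

For a single locus the sum has the closed form $1 - t\,HW_i/(1 - 2s/3)$, which gives a quick check on the truncation.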
We also modeled a mixed-mating population in which individuals were derived either from mating between clonemates (autodiploidization) with probability s or from random outcrossing with probability t. In this case individuals are either completely homozygous at all loci or heterozygous at Hardy-Weinberg proportions, and the probability an individual is homozygous at the ith locus is

$$s + t \left( 1 - HW_i \right) = 1 - t \, HW_i.$$

With this model the maximum-likelihood outcrossing rate is 6%, with 2-unit support limits of 0.3% and 23%, higher than that in the previous model, as a greater frequency of outcrossing is needed to counterbalance the more intense inbreeding caused by autodiploidization.
Communicating editor: D. Charlesworth
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AJ515177–AJ515216, AJ515322–AJ515352, and AJ515430–AJ515449.
Received November 4, 2002.
Accepted September 22, 2003.
Copyright © 2004 by the Genetics Society of America