Creating Saccharomyces yeasts capable of efficient fermentation of pentoses such as xylose remains a key challenge in the production of ethanol from lignocellulosic biomass. Metabolic engineering of industrial Saccharomyces cerevisiae strains has yielded xylose-fermenting strains, but these strains have not yet achieved industrial viability due largely to xylose fermentation being prohibitively slower than that of glucose. Recently, it has been shown that naturally occurring xylose-utilizing Saccharomyces species exist. Uncovering the genetic architecture of such strains will shed further light on xylose metabolism, suggesting additional engineering approaches or possibly even enabling the development of xylose-fermenting yeasts that are not genetically modified. We previously identified a hybrid yeast strain, the genome of which is largely Saccharomyces uvarum, which has the ability to grow on xylose as the sole carbon source. To circumvent the sterility of this hybrid strain, we developed a novel method to genetically characterize its xylose-utilization phenotype, using a tetraploid intermediate, followed by bulk segregant analysis in conjunction with high-throughput sequencing. We found that this strain’s growth in xylose is governed by at least two genetic loci, within which we identified the responsible genes: one locus contains a known xylose-pathway gene, a novel homolog of the aldo-keto reductase gene GRE3, while a second locus contains a homolog of APJ1, which encodes a putative chaperone not previously connected to xylose metabolism. Our work demonstrates that high-throughput sequencing combined with bulk segregant analysis can also be applied to a hybrid strain that is not genetically tractable and that carries a complex, polygenic trait, and it identifies new avenues for metabolic engineering as well as for construction of nongenetically modified xylose-fermenting strains.
LIGNOCELLULOSIC biomass, an untapped feedstock for biofuel production, is rich in five-carbon sugars such as xylose and arabinose; the metabolism of these sugars to ethanol or other economically important molecules is thus crucial for the cost-effective use of such biomass (Buckeridge et al. 2011; Chandel et al. 2011). However, a fundamental problem in moving toward industrial-level production of cellulosic ethanol is that currently used strains of the budding yeast Saccharomyces cerevisiae, the predominant microorganism in industrial fermentations, do not use xylose as a fermentable substrate (Chiang and Knight 1960). Significant progress has been made over the past 30 years to address this issue: through metabolic engineering and directed evolution (Ho et al. 1998; Sonderegger and Sauer 2003; Kuyper et al. 2004; Matsushika et al. 2009; Kim et al. 2010; Ha et al. 2011), strains of S. cerevisiae that can ferment xylose to ethanol now exist. Despite this progress, problems remain to be solved before these strains come into widespread industrial use, including the fact that most current xylose-fermenting strains are genetically modified, which remains unpopular in many countries (Byrne 2006).
Traditionally it has been thought that S. cerevisiae does not metabolize xylose, despite the fact that its genome contains genes putatively encoding the requisite enzymes for the two-step redox conversion of xylose to the fermentable intermediate xylulose (Chiang and Knight 1960; Toivari et al. 2004). To date, only two studies have demonstrated the existence of natural Saccharomyces isolates capable of xylose metabolism (Attfield and Bell 2006; Wenger et al. 2010), and only the latter included genetic characterization of the trait. In Wenger et al. (2010) we described a gene, XDH1, that is present in many wine yeast strains but absent from the genome of the reference strain S288c; XDH1 encodes a putative xylitol dehydrogenase that is sufficient to allow otherwise wild-type S. cerevisiae laboratory strains to grow slowly in xylose. However, there is no evidence that these strains grow anaerobically in xylose or that they produce any ethanol, and the observed growth is modest at best. Because of this poor xylose utilization, work on creating industrially viable, xylose-metabolizing Saccharomyces yeasts has largely focused on metabolic engineering, often combined with directed evolution.
Metabolic engineering of xylose fermentation in Saccharomyces yeasts takes advantage of the fact that other fungi and bacteria, while often not suitable for large-scale industrial ethanol fermentations, are nevertheless capable of xylose metabolism via one of two pathways. Fungi such as Scheffersomyces stipitis (formerly Pichia stipitis), Pachysolen tannophilus, and Candida shehatae metabolize xylose to its keto-isomer xylulose via a two-step reduction-oxidation pathway involving xylose reductase (XR) and xylitol dehydrogenase (XDH) (Jeffries 2006). In most bacteria and some fungi, however, xylose is directly isomerized to xylulose by xylose isomerase (XI) (Jeffries 1983). In both cases, xylulose is subsequently phosphorylated to xylulose-5-phosphate and metabolized via the nonoxidative pentose phosphate pathway (PPP) (Wang et al. 1980). Introducing into S. cerevisiae genes from other organisms that encode the two oxidoreductases or the isomerase has produced strains that can utilize xylose, but these approaches have been plagued by various problems (Chandel et al. 2011). These include poor expression of the genes encoding XR, XDH, and xylulokinase (XK) activities, redox imbalances due to the different cofactor specificities of the XR and XDH enzymes, glucose catabolite repression, low affinity of the hexose transporters for xylose, and low flux through the PPP. Others have attempted to address these problems with metabolic engineering and directed evolution of engineered strains; see Buckeridge et al. (2011) for a recent review. On the basis of these strategies, a strain that shows rapid cofermentation of cellobiose and xylose has recently been developed (Ha et al. 2011), and other xylose-fermenting strains continue to improve. Despite these advances, to our knowledge no Saccharomyces strains are currently used for xylose fermentation in large-scale industrial settings.
In light of the remaining challenges in the xylose metabolic-engineering field, we believe much can still be learned from studying natural Saccharomyces yeasts that are capable of xylose utilization. Characterizing the genetics and physiology of these natural xylose-utilizing yeasts will provide testable hypotheses for further modification and improvement of existing engineered strains. Toward this goal, we have characterized the genetic basis of a polygenic xylose-metabolism phenotype in a Saccharomyces sensu stricto hybrid yeast that we previously identified as capable of xylose utilization (Wenger et al. 2010). To identify the loci that contribute to this strain’s growth in xylose, we developed a novel method for generating progeny from this otherwise genetically intractable hybrid strain and used high-throughput sequencing in conjunction with bulk segregant analysis to identify quantitative trait loci. Among these loci we identified a new homolog of a known xylose-pathway gene, GRE3, as well as a homolog of S. cerevisiae APJ1, which encodes a putative chaperone previously unconnected to xylose metabolism.
Materials and Methods
Yeast strains and techniques
All S. uvarum and hybrid yeast strains used in this study are shown in Table 1. GSY1063 was derived from CBS7001 by introducing ho::kanMX (see primers in Supporting information, Table S1). GSY2712 is a Leu+ derivative of JRY8145, while GSY2719 was derived from a cross between JRY8153 and GSY1063. apj1Δ::URA3 (GSY4341) and gre3Δ::URA3 (GSY4324) strains were generated in GSY2719 by transformation with a fusion PCR product (see Table S1 for details). Yeast transformation was performed by the lithium acetate method (Schiestl and Gietz 1989). Preparation of yeast genomic DNA was performed as described previously (Treco 1987). All strains were grown at 25°.
For long-term growth curves, strains were first grown to saturation in yeast extract/peptone/2% glucose (YPD) medium and then diluted 100-fold into 5 ml of yeast extract/peptone medium containing either 2% xylose or no carbon source. All strains were grown at 25° with aeration in a roller drum. Optical density was measured at 600 nm (OD600) in a Biomate 3 spectrophotometer, and cell concentration was measured with a Beckman Coulter Z2 counter. For bulk segregant analysis, progeny were pooled as described in the Results section.
Molecular cloning techniques
Standard molecular biology techniques were used for all plasmid construction and cloning. High-fidelity Phusion DNA polymerase (Finnzymes) was used for DNA amplifications according to the manufacturer’s recommendations. Plasmids are listed in Table 2, and details of their construction are available upon request. Briefly, pGS35 and pGS36 were constructed from YCplac22 (Gietz and Sugino 1988) by replacing the TRP1 gene with the kanMX or hphMX cassette, respectively. pGS37 and pGS38 contain the GAL1/GAL10 promoter and ACT1 transcriptional terminator from pTS210 (Marschall et al. 1996). The HO endonuclease open reading frame was amplified from the S. cerevisiae Simi White wine yeast strain (GSY788) using primers GSP1 and GSP545 (Table S1) and cloned into the XbaI site of pGS37 and pGS38 to make pGS39 and pGS40, respectively. To make pGS131 and pGS132, we amplified the APJ1 gene from CBS7001 and from GSY4318 using primers GSP546 and GSP547 and then cloned each product into the BamHI site of pGS35. Similarly, to make pGS149 and pGS159, the GRE3 gene was amplified from the same strains using primers GSP535, GSP536, GSP538, and GSP539 and cloned into the BamHI site of pGS35. To create the promoter-swap construct pGS156, the GRE3 promoter region was amplified from CBS7001 and fused by PCR to the GRE3 coding sequence and 3′ region from GSY4318, using a primer (GSP533) containing sequence that overlaps the 3′ end of the promoter and the 5′ end of the open reading frame; the fusion product was cloned into pGS35. pGS170 and pGS171 were constructed by swapping, between plasmids pGS131 and pGS132, the NaeI fragment encoding the last 435 amino acids of APJ1.
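The promoter swap depends on a fusion primer whose 5′ tail matches the 3′ end of one fragment (the promoter) and whose 3′ part anneals to the start of the other (the ORF), so that the two PCR products share an overlap and can be joined by fusion PCR. A minimal Python sketch of how such an overlap primer could be assembled is shown here; the sequences, the 20-nt overlap lengths, and the helper names are hypothetical and are not the actual GSP533 design (see Table S1 for the real primer).

```python
# Hypothetical sequences and overlap lengths, for illustration only.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fusion_primer(promoter: str, orf: str, promoter_overlap: int = 20, orf_overlap: int = 20) -> str:
    """Forward fusion primer spanning the promoter/ORF junction, written 5'->3'.

    The 5' tail copies the last bases of the promoter; the 3' part anneals to the
    start of the ORF, so the ORF amplicon gains an overlap with the promoter amplicon.
    """
    if len(promoter) < promoter_overlap or len(orf) < orf_overlap:
        raise ValueError("sequence shorter than requested overlap")
    return promoter[-promoter_overlap:] + orf[:orf_overlap]


promoter = "TTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGC"   # hypothetical promoter 3' end
orf = "ATGTCTTCACTGGTTACTCTTAATAACGGTCTGAAAATG"  # hypothetical ORF 5' end
forward = fusion_primer(promoter, orf)
# The reverse complement can serve as the reverse primer for the promoter amplicon,
# giving that product an ORF-derived tail.
reverse = reverse_complement(forward)
print(forward)
print(reverse)
```

Using the same junction sequence in both orientations is what makes the two amplicons overlap by exactly the primer length, so they can be annealed and extended into one chimeric product.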
Array comparative genomic hybridization
Genomic DNA from CBS1502 was prepared using Zymo Research YeaStar columns according to the manufacturer’s recommendations, and then digested with HaeIII. We then labeled 350 ng of this DNA with Cy5 (red); we similarly labeled reference DNA (an equimolar mix of S. uvarum (CBS7001 strain) and S. cerevisiae (S288c strain) sheared genomic DNA) with Cy3 (green). The two labeled DNAs were then mixed together and hybridized to microarrays containing probes densely covering both the S. uvarum (CBS7001 strain) and S. cerevisiae (S288c strain) genomes; the microarrays and the hybridization methods used were exactly as described in Dunn and Sherlock (2008).
Preparation of genomic DNA for High Throughput Sequencing
Segregants for bulk segregant analysis were frozen in a sorbitol solution (0.9 M sorbitol, 0.1 M EDTA, and 0.1 M Tris-HCl, pH 8.0) and then combined for DNA isolation as described (Treco 1987). DNA was prepared for sequencing on the Illumina platform as follows. Paired-end Illumina adapters were preannealed in a 50-µl reaction containing 1× T4 DNA ligase buffer (NEB no. B0202S) and each adapter at a concentration of 40 µM by incubating at 94° for 5 min and then at 70°, 60°, 50°, 40°, 30°, and 25°, each for 1 min. Five micrograms of genomic DNA was sheared by sonication to approximately 500 bp in a Covaris sonicator. Thirty microliters of sheared DNA was subjected to end repair in a 50-µl reaction (1× T4 DNA ligase buffer, 0.8 µM dNTPs (NEB no. N0447S), 2.5 µl of T4 DNA polymerase (NEB no. M0203L), 0.5 µl of Klenow (large fragment) (NEB no. M0210L), and 2.5 µl of T4 PNK (NEB no. M0201L)) by incubation at 20° for 30 min. End-repaired DNA was purified using a QIAquick PCR purification column, eluting in 33 µl of buffer EB. A single dA was added to the end-repaired DNA by incubation at 37° for 30 min (32 µl of end-repaired DNA, 5 µl of buffer 2 (NEB no. B7002S), 1 µl of 10 mM dATP (Invitrogen no. 18252-015), and 3 µl of Klenow Fragment (3′→5′ exo−) (NEB no. M0212L)). After dA addition, reactions were purified using a QIAGEN MinElute column, eluting in 11 µl of buffer EB. Illumina adapter ligation was performed in a 20-µl reaction (10 µl of DNA from the previous step, 1× T4 DNA ligase buffer, 1 µl of T4 DNA ligase (NEB no. M0202S), and 1 µl of 40 µM preannealed adapter mix) by incubation at 20° for 15 min followed by 65° for 10 min. Following adapter ligation, size selection was performed on the Invitrogen E-Gel system, targeting 600-bp fragments. The size-selected library was then amplified by PCR in a 20-µl reaction (1.25 µM primers PE1 and PE2, 2 µl of size-selected DNA, 0.25 µM dNTPs, 1× HF buffer, and 0.5 µl of Phusion DNA polymerase (NEB no. F-530L)).
DNA was amplified using the following program: 98° for 30 sec; 12 cycles of 98° for 10 sec, 65° for 30 sec, and 72° for 30 sec and a final 72° extension time of 5 min. The amplified library was purified using a QIAquick PCR purification column, eluting in a final volume of 30 µl buffer EB. The final library concentration and size estimates were determined using Qubit (Invitrogen) and Bioanalyzer (Agilent). Flow cells for the Illumina GAII platform were prepared according to manufacturer’s instructions and sequencing was performed for 36 cycles.
Analysis of high-throughput sequencing data
Sequence reads with their qualities (FASTQ) were mapped to the S. uvarum genome (available at http://saccharomycessensustricto.org) (Scannell et al. 2011) using Stampy v. 1.0.13 (Lunter and Goodson 2011) in conjunction with BWA v. 0.5.9-r16 (Li and Durbin 2009), with default parameters. Whole-genome pileup files were created using the Samtools v. 0.1.16 “pileup” command (Li et al. 2009) with option -c. Custom Perl scripts were written to calculate allele frequency differences between the positive and negative pools and to identify positions with significantly different frequencies. For SNP calling, we required a position to be covered by at least 20 sequencing reads. The majority SNP call from the Samtools “pileup” was used to calculate an allele frequency in the positive and negative pools, $\hat{p}_{+}$ and $\hat{p}_{-}$, and we calculated a T statistic, on the basis of Craig et al. (2009), as $T = (\hat{p}_{+} - \hat{p}_{-})^2 / [\mathrm{Var}(\hat{p}_{+}) + \mathrm{Var}(\hat{p}_{-})]$, where the binomial variance is $\mathrm{Var}(\hat{p}) = \hat{p}(1 - \hat{p})/n$, with $n$ the number of reads covering the position in that pool. P-values were then calculated assuming that the T statistic follows a χ2 distribution (Craig et al. 2009). Significance was assessed against a Bonferroni-corrected threshold: α = 0.01 divided by the number of SNPs tested. False discovery rates (FDR) were estimated empirically by permuting the pool labels of the SNP calls at each position, recalculating allele frequencies, and generating P-values from the permuted data as described above. Each pool was permuted 500 times, and the FDR was calculated as the median number of false positives across the 500 permutations divided by the number of “true” positives from the unpermuted data. Data were plotted using R. Sequence data are available in the Short Read Archive with accession no. SRA045682. Perl scripts are available upon request.
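The per-SNP test and the permutation-based FDR estimate described above can be sketched in Python (the original analysis used custom Perl scripts). This is a minimal sketch under stated assumptions: T is taken as the squared difference in pool allele frequencies divided by the sum of the binomial variances p(1 − p)/n; T is treated as χ2 with 1 degree of freedom, whose upper tail is erfc(√(T/2)); and "permuting the pool labels of the SNP calls" is interpreted as shuffling read-level allele calls between the two pools at each position while keeping each pool's depth fixed. All function names and read counts are hypothetical.

```python
import math
import random
import statistics


def binomial_variance(p: float, n: int) -> float:
    """Binomial variance of an allele-frequency estimate p based on n reads."""
    return p * (1.0 - p) / n


def t_statistic(p_pos: float, n_pos: int, p_neg: float, n_neg: int) -> float:
    """Squared pool frequency difference over the summed binomial variances."""
    var = binomial_variance(p_pos, n_pos) + binomial_variance(p_neg, n_neg)
    if var == 0.0:
        # Both pools fixed: unbounded evidence if they differ, none if they agree.
        return math.inf if p_pos != p_neg else 0.0
    return (p_pos - p_neg) ** 2 / var


def chi2_1df_pvalue(t: float) -> float:
    """Upper-tail P-value for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(t / 2.0))


def significant_sites(sites, alpha=0.01, min_depth=20):
    """Sites passing the depth filter and a Bonferroni cutoff (alpha / sites tested).

    Each site is (alt_pos, depth_pos, alt_neg, depth_neg): reads supporting one
    parental allele and total reads in the positive and negative pools.
    """
    tested = [s for s in sites if s[1] >= min_depth and s[3] >= min_depth]
    if not tested:
        return []
    cutoff = alpha / len(tested)
    hits = []
    for alt_pos, n_pos, alt_neg, n_neg in tested:
        t = t_statistic(alt_pos / n_pos, n_pos, alt_neg / n_neg, n_neg)
        if chi2_1df_pvalue(t) < cutoff:
            hits.append((alt_pos, n_pos, alt_neg, n_neg))
    return hits


def permutation_fdr(sites, n_perm=500, alpha=0.01, min_depth=20, seed=0):
    """Median hit count over permutations divided by the observed hit count."""
    rng = random.Random(seed)
    observed = len(significant_sites(sites, alpha, min_depth))
    if observed == 0:
        return 0.0
    permuted_hits = []
    for _ in range(n_perm):
        permuted = []
        for alt_pos, n_pos, alt_neg, n_neg in sites:
            # Shuffle read-level calls between pools while keeping each pool's depth.
            reads = [1] * (alt_pos + alt_neg) + [0] * (n_pos + n_neg - alt_pos - alt_neg)
            rng.shuffle(reads)
            permuted.append((sum(reads[:n_pos]), n_pos, sum(reads[n_pos:]), n_neg))
        permuted_hits.append(len(significant_sites(permuted, alpha, min_depth)))
    return statistics.median(permuted_hits) / observed


# Hypothetical read counts, not data from this study.
sites = [(40, 40, 10, 40), (20, 40, 21, 40), (5, 10, 0, 10)]
print(significant_sites(sites))  # the third site fails the 20-read depth filter
print(permutation_fdr(sites, n_perm=100))
```

Note that simply swapping the positive and negative labels at a site would leave T unchanged, because T is symmetric in the two pools; this is why the sketch reshuffles the underlying reads rather than the labels alone.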
Strains were pregrown in YPD overnight and diluted 100-fold into 20 ml of YP medium containing 2% xylose. After 3 days of growth, cells were harvested by filtration and frozen in liquid nitrogen until RNA purification. RNA was prepared by the hot phenol method as described previously (Lee et al. 2008) and then treated with TURBO DNA-free (Ambion, Life Technologies) according to the manufacturer’s recommendations. Two micrograms of total RNA were reverse transcribed using oligo(dT) primer and SuperScript II according to the manufacturer’s instructions (Invitrogen). Real-time PCR was performed on a Bio-Rad CFX96 cycler using SsoFast EvaGreen Supermix (Bio-Rad). S. uvarum YDR458C and YJL088W were used as reference genes, with primer pairs as described in Bullard et al. (2010). The primer pair for GRE3 was designed to recognize both GRE3CBS7001 and GRE3CBS1502. Primers used for qPCR (GSP556–561) are listed in Table S1. To calculate the relative quantification value, we used average ΔΔCt values, normalizing relative expression to the ΔCt of the APJ1CBS7001 GRE3CBS7001 strain (relative quantification value $= 2^{-\Delta\Delta C_t}$, with upper limit $2^{-(\Delta\Delta C_t - s)}$ and lower limit $2^{-(\Delta\Delta C_t + s)}$, where $s$ is the standard deviation of $\Delta\Delta C_t$).
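The relative-quantification arithmetic can be sketched as follows, assuming the standard 2^(−ΔΔCt) method: ΔCt is the target-gene Ct minus the reference-gene Ct for each replicate, ΔΔCt is the mean sample ΔCt minus the mean calibrator ΔCt (here the calibrator would be the APJ1CBS7001 GRE3CBS7001 strain), and the error bounds shift ΔΔCt by one standard deviation s, taken here as the standard deviation of the sample's ΔCt replicates (one common convention, and an assumption on our part). The Ct values and function names are hypothetical.

```python
import statistics


def delta_cts(target_cts, reference_cts):
    """Per-replicate dCt = Ct(target gene) - Ct(reference gene)."""
    if len(target_cts) != len(reference_cts):
        raise ValueError("target and reference Ct lists must pair up")
    return [t - r for t, r in zip(target_cts, reference_cts)]


def relative_quantification(sample_dct, calibrator_dct):
    """Return (RQ, upper, lower) by the 2^-ddCt method.

    s is the standard deviation of the sample's dCt replicates (needs >= 2 values).
    """
    ddct = statistics.mean(sample_dct) - statistics.mean(calibrator_dct)
    s = statistics.stdev(sample_dct)
    return 2 ** -ddct, 2 ** -(ddct - s), 2 ** -(ddct + s)


# Hypothetical Ct values for a GRE3 target normalized to one reference gene.
sample = delta_cts([19.0, 21.0], [18.0, 18.0])      # dCt = [1.0, 3.0]
calibrator = delta_cts([22.0, 22.0], [19.0, 19.0])  # dCt = [3.0, 3.0]
rq, upper, lower = relative_quantification(sample, calibrator)
print(rq, upper, lower)
```

With two reference genes, as in this study, the same calculation would simply be repeated once per reference gene.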
A Saccharomyces sensu stricto hybrid that grows in xylose
To identify naturally occurring yeasts that can grow in xylose as a carbon source, we previously reported a screen of 647 Saccharomyces yeasts in which strains that could reproducibly grow in minimal and/or rich media supplemented with 2% xylose were designated “xylose positive” (Wenger et al. 2010). We identified 38 xylose-positive yeasts in this screen, 29 of which were S. cerevisiae wine strains whose modest growth in xylose was controlled by a single locus, XDH1. Of the 9 other xylose-positive yeasts that we identified, the strain with the most robust xylose phenotype was CBS1502, which showed a reproducible increase in both optical density and cell number relative to a xylose-negative S. uvarum control strain (Figure 1). CBS1502 is also known as Yorkshire Haze 1, and its CBS-KNAW Fungal Biodiversity Centre record describes it as either an S. bayanus or an S. pastorianus yeast isolated from cloudy beer. Because of the uncertainty in the classification of this brewing contaminant, we first determined the genomic makeup of this strain by array comparative genomic hybridization (aCGH) using custom DNA microarrays containing probes that distinguish the S. cerevisiae and S. uvarum genomes (Dunn and Sherlock 2008). These data show that the vast majority of this strain’s genome is derived from S. uvarum, but that it also contains regions derived from S. cerevisiae and from the recently discovered S. eubayanus (Libkind et al. 2011) (Figure S1, blue circled regions).
To determine whether this strain was genetically tractable, we measured its sporulation efficiency and found it too low for standard genetic analysis (<1%, as would be expected for a hybrid); we therefore developed a novel approach for characterizing its xylose phenotype.
Strategy for analyzing a genetically intractable strain
Because CBS1502 has both poor sporulation and poor spore viability, we developed a novel strategy using a tetraploid intermediate to segregate and identify genetic factors that contribute to its growth in xylose (see Figure 2 and Figure S2). First, CBS1502 and CBS7001 (the sequenced reference S. uvarum strain) were transformed with plasmids that express the gene encoding the site-specific endonuclease HO under control of the galactose-inducible GAL1 promoter and that carry one of two different selectable markers (KanMX or HphMX). These strains were then individually grown to saturation in rich medium supplemented with 2% raffinose and shifted to galactose-containing medium to induce expression of HO. HO-induced strains were then combined and plated onto solid YPD-based medium containing both Geneticin (Invitrogen, 200 mg/liter) and Hygromycin B (Cellgro, 150 mg/liter). Transient expression of HO, which is normally repressed in a/α diploids, allows recombination at the mating-type locus and at a low frequency yields diploids with mating type a/a or α/α. These diploids are mating competent and can mate with each other to form CBS1502/CBS7001 a/a/α/α tetraploids. One tetraploid (GSY2607) was then put through one round of meiosis and tetrad dissection to generate heterozygous diploids that could be of a/a, α/α, or a/α mating type. Because both CBS7001 and CBS1502 contain a wild-type HO, any a/a or α/α spores will switch mating type following cell division and then self-fertilize to produce tetraploids, whereas the a/α spores remain stable, nonmating diploids. We selected several of these a/α diploid spores and sporulated them again to produce haploid spores; these spores were then able to switch mating type and self-fertilize to become homozygous a/α diploids, our desired end products for further genetic characterization.
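The mating-type outcome of the tetraploid meiosis can be estimated by enumeration. The Python sketch below assumes random chromosome segregation (each diploid spore receives 2 of the 4 chromosome III homologs), under which 2/3 of diploid spores are a/α; it also offers the random chromatid segregation model (2 of 8 chromatids), which allows double reduction and lowers the a/α fraction to 16/28 = 4/7. Both are textbook segregation models used here as assumptions; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def diploid_spore_mat_ratios(tetraploid=("a", "a", "alpha", "alpha"), random_chromatid=False):
    """Expected MAT genotypes of diploid spores from an a/a/alpha/alpha tetraploid.

    random_chromatid=False: random chromosome segregation (2 of 4 homologs per spore).
    random_chromatid=True: random chromatid segregation (2 of 8 chromatids per spore),
    which permits double reduction (sister chromatids in the same spore).
    """
    units = [m for m in tetraploid for _ in range(2)] if random_chromatid else list(tetraploid)
    counts = Counter("/".join(sorted(pair)) for pair in combinations(units, 2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


print(diploid_spore_mat_ratios())                       # chromosome segregation
print(diploid_spore_mat_ratios(random_chromatid=True))  # chromatid segregation
```

Either way, most diploid spores are expected to be stable a/α nonmaters, which is the class the strategy selects; the a/a and α/α spores are removed in practice because they switch mating type and self-diploidize into tetraploids.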
These “double-reduced” strains were assessed for growth on xylose, and we selected one of the resulting strains as our xylose-positive lineage of interest. This strain (GSY2612) was then backcrossed two more times to CBS7001, using the same tetraploid-intermediate method, to further increase spore viability (Figure S2). At each backcross, the best xylose-positive spore product was chosen to proceed into the next round of crossing. After these backcrosses, some of the xylose-positive progeny were determined to be stable haploid strains—presumably containing a mutation in the HO gene or another gene involved in mating-type switching—and were crossed one additional time to a haploid (hoΔ::KanMX) CBS7001 derivative (GSY1063). The haploid segregants from this diploid strain (GSY2694) were screened for growth in xylose and used for bulk segregant analysis. See Table 1 and Figure S2 for strain names and crossing details.
From this final backcross, the segregation pattern of growth in xylose was roughly three xylose-negative spores to one xylose-positive spore. This pattern is consistent with a hypothesis that two unlinked genes contribute to growth in xylose, both of which are required for the most robust xylose-positive phenotype. This observation is also consistent with the diversity of xylose phenotypes seen in CBS1502 spores.
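The link between the observed 3:1 ratio and the two-locus hypothesis can be checked by enumerating haploid spore genotypes. This minimal Python sketch assumes that the required loci are unlinked and that each segregates 1:1 in the cross; a spore is counted as xylose positive only if it carries the CBS1502 allele at every required locus. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def fraction_positive(n_required_loci: int) -> Fraction:
    """Fraction of haploid spores carrying the CBS1502 allele at every required locus.

    Assumes the required loci are unlinked and each segregates 1:1 in the cross.
    """
    genotypes = list(product(("CBS1502", "CBS7001"), repeat=n_required_loci))
    positive = sum(all(allele == "CBS1502" for allele in g) for g in genotypes)
    return Fraction(positive, len(genotypes))


for k in (1, 2, 3):
    f = fraction_positive(k)
    print(k, "loci:", f, "positive,", 1 - f, "negative")
```

One required locus would give a 1:1 ratio and three required loci 1:7; only two required, unlinked loci give the 1 positive : 3 negative ratio seen in this backcross.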
Bulk segregant analysis by sequencing reveals a polygenic xylose phenotype
Bulk segregant analysis (BSA), originally developed using microarrays but more recently adapted for high-throughput sequencing, has proven effective for quickly and specifically identifying candidate loci by pooling the progeny of a cross between two polymorphic strains according to a phenotype of interest (Quarrie et al. 1999; Brauer et al. 2006). To identify the loci contributing to growth in xylose in the CBS1502 derivative described above, we created one pool containing 21 xylose-positive segregants (from tetrad dissection of GSY2694) and one pool containing 21 xylose-negative segregants from the same cross, where xylose positivity was defined as an increase in both cell number (as measured by Coulter counter) and optical density relative to a negative control, S. uvarum CBS7001. DNA was isolated from each pool, and genomic DNA libraries were prepared for sequencing on the Illumina GAIIx platform (see Materials and Methods).
We mapped the resulting sequence reads to the S. uvarum (CBS7001) genome (Scannell et al. 2011); we then called SNPs and quantified their allele frequencies at polymorphic sites across the genome and determined if each site had a significantly different frequency between the positive and negative pools (see Materials and Methods for further details). The results of this analysis are shown in Figure 3, with false discovery rates estimated to be <0.2%. After performing BSA on GSY2694 progeny, we observed three genomic intervals in which the CBS1502 alleles significantly segregate with the xylose-positive phenotype: one on chromosome VII that is approximately 13 kb (region A), one on chromosome XIV that is approximately 10 kb (region C), and one on chromosome XV that is approximately 76 kb (region D). We also observed a 65-kb region on chromosome XI (region B), in which the CBS7001 alleles segregate with the xylose-positive phenotype, suggesting that there is a genomic region in CBS1502 that is detrimental to growth on xylose.
On the basis of the segregation pattern of growth in xylose in GSY2694, we had expected to find two genes that were both required for the phenotype. The pattern of allele frequencies that we observed in the four genomic intervals supports this hypothesis. In both regions C and D, the positive pool contained nearly 100% CBS1502 alleles, while the negative pool contained only 50% or fewer CBS1502 alleles. This is consistent with the causative genes in these regions both being required for the phenotype. In region A, however, the positive pool contained approximately 60–70% CBS1502 alleles, while the negative pool contained approximately 20–30% CBS1502 alleles. Region B represented a third category, in which the positive pool contained <10% CBS1502 alleles, while the negative pool contained ∼50% CBS1502 alleles. These data suggest that regions C and D contain the two main causative alleles for the phenotype, while regions A and B enhance or modify the phenotype but are not necessary for it to be observed. The CBS1502 alleles in region B presumably affect growth in xylose negatively and were thus selected against in our xylose-positive pool.
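The expected pool frequencies under the two-required-loci model can be derived by enumeration. This Python sketch assumes unlinked loci that segregate 1:1 and a spore that is xylose positive only when it carries the CBS1502 allele (coded 1) at both C and D; "X" is a hypothetical unlinked locus with no effect, included for comparison. Real pools of 21 segregants will deviate from these values through sampling noise and through modifiers such as regions A and B.

```python
from fractions import Fraction
from itertools import product


def expected_pool_frequencies(loci=("C", "D", "X"), required=("C", "D")):
    """Expected CBS1502 allele frequency per locus as (positive pool, negative pool).

    Assumes unlinked loci segregating 1:1; a spore is positive only if it carries
    the CBS1502 allele (coded 1) at every required locus.
    """
    positive, negative = [], []
    for genotype in product((0, 1), repeat=len(loci)):
        alleles = dict(zip(loci, genotype))
        if all(alleles[locus] == 1 for locus in required):
            positive.append(alleles)
        else:
            negative.append(alleles)
    return {
        locus: (
            Fraction(sum(g[locus] for g in positive), len(positive)),
            Fraction(sum(g[locus] for g in negative), len(negative)),
        )
        for locus in loci
    }


for locus, (pos, neg) in expected_pool_frequencies().items():
    print(locus, "positive pool:", pos, "negative pool:", neg)
```

Under this model both required loci should be fixed for CBS1502 in the positive pool and sit at 1/3 in the negative pool (negative spores are equally likely to be C-only, D-only, or neither), while an unlinked neutral locus stays at 1/2 in both; this matches the "nearly 100%" versus "50% or fewer" pattern reported for regions C and D.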
To genetically confirm that each of these four genomic intervals segregates as predicted by the BSA data, we chose SNPs that created a restriction fragment length polymorphism within each region and determined which parental allele each of the 21 xylose-positive and 21 xylose-negative GSY2694 segregants carried. The data (not shown) confirmed that these four regions segregate nonrandomly between the positive and negative pools, as predicted by the sequence data. A χ2 goodness-of-fit test significantly rejected the null hypothesis of random segregation between the pools for all four regions (P < 0.01 for region A, P < 0.001 for regions B through D).
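The exact layout of the goodness-of-fit test is not specified. One way it could be set up is sketched below in Python, under the assumption that each pool's RFLP genotype counts (CBS1502 vs. CBS7001 allele among 21 segregants) are compared with the 1:1 ratio expected under random segregation, giving a χ2 test with 1 degree of freedom. The counts are hypothetical, not the study's data.

```python
import math


def chi_square_gof(observed, expected_ratio):
    """Pearson chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_1df_pvalue(stat):
    """Upper-tail P-value for 1 degree of freedom (two categories)."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts: 20 of 21 positive segregants carry the CBS1502 allele.
stat = chi_square_gof([20, 1], [1, 1])
print(round(stat, 3), chi2_1df_pvalue(stat))
```

With two categories the statistic has 1 degree of freedom, so the same closed-form χ2 tail used for the per-SNP T statistic applies.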
Regions CCBS1502 and DCBS1502 are both required for growth in xylose
To confirm that regions C and D (the two hits from the bulk segregant analysis that we predicted were both required for growth in xylose) were the responsible intervals, we selected a single segregant of GSY2694 that contained regions C and D from CBS1502 and region B from CBS7001 and crossed it to a haploid derivative of CBS7001. Note that in this cross both copies of region B are derived from CBS7001, while regions C and D, which are unlinked, are heterozygous and thus segregating. We then tested haploid strains carrying each of the four combinations of region C and D alleles for their ability to grow in xylose. The presence of region CCBS1502 results in increased optical density in xylose relative to CBS7001 (Figure 4A). Region DCBS1502 does not produce a significant phenotype on its own; however, regions CCBS1502 and DCBS1502 together increase optical density in xylose by more than the sum of the individual CCBS1502 and DCBS1502 phenotypes (Figure 4A). Interestingly, this synergistic interaction, indicative of positive epistasis between the two genes, is more noticeable when growth in xylose is measured as an increase in cell number: only strains that contain both regions C and D from CBS1502 show significant increases in cells per milliliter at the end of the time course (Figure 4B). These data show that the genes within these two intervals interact via positive epistasis to increase cell number and cell size in xylose in CBS1502, confirming the hypothesis that both are required for the most robust xylose-positive phenotype.
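"Greater than the sum of the individual phenotypes" corresponds to a positive deviation from additivity. This Python sketch assumes an additive scale, epsilon = (w_CD − w_0) − [(w_C − w_0) + (w_D − w_0)], where w is an end-point measurement such as OD600 for the strain carrying no CBS1502 region, region C only, region D only, or both; the values are hypothetical, not the Figure 4 data.

```python
def additive_epistasis(w_none: float, w_c: float, w_d: float, w_cd: float) -> float:
    """epsilon = (w_cd - w_none) - [(w_c - w_none) + (w_d - w_none)].

    Positive values mean the double combination exceeds the sum of the single effects.
    """
    return (w_cd - w_none) - ((w_c - w_none) + (w_d - w_none))


# Hypothetical end-point OD600 values for the four C/D genotypes.
eps = additive_epistasis(w_none=0.10, w_c=0.50, w_d=0.15, w_cd=1.20)
print(round(eps, 3))
```

Whether an interaction looks epistatic can depend on the scale chosen (additive vs. multiplicative), which is one reason the pattern may appear stronger for cell number than for optical density.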
We also tested whether loci within regions A and B affected the xylose-positive phenotype in segregants containing both the CCBS1502 and DCBS1502 regions. We redissected GSY2694, identified by PCR 19 spores containing CBS1502 alleles for both regions C and D, and then genotyped them for regions A and B. We then measured OD600 and cell density at the end of a xylose growth experiment. Comparing growth among these spores according to their region A and B genotypes revealed only a subtle difference, which was not statistically significant (P > 0.5; data not shown). Because regions A and B had no significant phenotype, we did not pursue them further.
GRE3 and APJ1 are loci responsible for growth in xylose
Having confirmed that regions C and D positively and synergistically contribute to growth in xylose, we next sought to identify the specific genes that are causal for this phenotype. Region C contains two genes with nonsynonymous changes: APJ1 (YNL077W) and NIS1 (YNL078W). To determine which of these two genes might be responsible, we assumed that the responsible allele might be recessive (region C is homozygous in CBS1502, which is at least consistent with this notion; data not shown). We transformed GSY4340, which contains regions CCBS1502 and DCBS1502, with plasmids containing either APJ1CBS7001 or NIS1CBS7001 and screened the resulting transformants for growth in xylose. Transformation with the APJ1CBS7001 plasmid reduced growth in xylose, whereas the NIS1CBS7001 plasmid had no effect on the xylose phenotype, suggesting that the CBS1502 allele in this region is a recessive allele of APJ1 (data not shown). The protein sequence of Apj1CBS1502 is shown in Figure S3.
To determine if this APJ1 homolog is a loss-of-function allele in addition to being recessive, we constructed a haploid derivative of GSY2719 with APJ1 deleted (GSY4341) and then genetically introduced region DCBS1502 into this deletion background via mating and dissection, producing GSY4322 (apj1Δ::URA3 DCBS1502). GSY4322 was transformed with empty vector, plasmids expressing either APJ1CBS1502 (pGS132), or APJ1CBS7001 (pGS131). Interestingly, GSY4322 transformed with the empty vector had a phenotype similar—albeit not identical—to that of the strains containing both APJ1CBS1502 and DCBS1502, indicating that APJ1CBS1502 might be a loss-of-function allele (Figure 5A). More specifically, because the phenotypes are not identical, APJ1CBS1502 may be a hypomorphic allele (partial loss-of-function). Adding credence to this supposition, when GSY4322 was transformed with a plasmid containing APJ1CBS1502, the xylose phenotype was comparable to both the same strain transformed with the empty vector and to the parental strain. Conversely, transformation with APJ1CBS7001 inhibited growth in xylose (Figure 5A). Since the coding region of APJ1CBS1502 contained only two changes from APJ1CBS7001 (a shorter polyglutamine repeat and a G234D substitution; Figure S3), we separated these two changes and generated plasmids containing either APJ1shorter(18) polyQ (pGS171) or APJ1G234D (pGS170) and tested their effect in GSY4322. While the phenotype conferred by APJ1shorter(18) polyQ was indistinguishable from that conferred by APJ1CBS7001, the APJ1G234D allele still resulted in xylose-positive growth (Figure 5B). Taken together, these data show that APJ1CBS1502 acts as a recessive, perhaps partial loss-of-function allele to allow growth on xylose and that the G234D substitution is responsible for this phenotype.
Region D is 100 kb long and contains 37 genes (32 with nonsynonymous changes), including an obvious candidate in the S. uvarum homolog of GRE3 (YHR104W), which in S. cerevisiae is a known aldo-keto (xylose) reductase (Traff et al. 2001). To determine whether this was the specific gene within this interval responsible for increased growth in xylose, we cloned both of the alleles from CBS1502, which is heterozygous at the GRE3 locus. We Sanger sequenced both GRE3 alleles from the resulting plasmids. One GRE3 allele is identical to the GRE3 gene found in CBS7001, while the other (hereafter referred to as GRE3CBS1502) is identical to that found in the recently discovered S. eubayanus (Libkind et al. 2011). The protein sequence of Gre3CBS1502, as well as a phylogenetic tree of closely related aldo-keto reductases, is shown in Figure S4. We constructed a haploid derivative of GSY2719 with GRE3CBS7001 deleted (GSY4324) and then genetically introduced region CCBS1502 into this deletion background via mating and dissection, producing GSY4326 and GSY4327. Transformation of GSY4326 and GSY4327 (gre3Δ::URA3 APJ1CBS1502) with a plasmid containing the GRE3CBS7001 allele partially rescued growth in xylose, whereas transformation with a plasmid containing GRE3CBS1502 increased growth in xylose by almost twofold relative to the CBS7001 allele, indicating that the GRE3CBS1502 allele contributes to the CBS1502 xylose growth phenotype (Figure 6).
Because the promoter region of GRE3CBS1502 also differs from that of GRE3CBS7001, in addition to the ∼20 amino acid differences between the two putative protein sequences, we replaced the promoter sequence (∼1.5 kb) of the hybrid-strain allele with the promoter sequence from S. uvarum to determine whether changes in this region affect growth in xylose. GSY4326 or GSY4327 transformed with a plasmid containing this construct had the same xylose phenotype as when transformed with GRE3CBS1502 under its own promoter (Figure 6), indicating that changes in the promoter sequence are not responsible for the increased growth in xylose. More work is needed to determine which specific change(s) in the GRE3CBS1502 coding sequence or 3′ region result in enhanced growth in xylose in the hybrid CBS1502.
To further investigate the requirement for GRE3 and to determine whether it encodes the sole or major xylose reductase, we analyzed the phenotype of a GRE3 deletion in the presence of either APJ1 allele. GRE3 deletion eliminated growth in xylose in backgrounds containing either APJ1CBS1502 or APJ1CBS7001 (data not shown), whereas strains carrying either GRE3CBS7001 or GRE3CBS1502 together with APJ1CBS7001 grew moderately in xylose (Figure 4A). These data show that GRE3 is the main xylose reductase in CBS1502 and that the improvement in the xylose phenotype provided by APJ1CBS1502 requires the presence of a GRE3 allele, be it the allele from CBS1502 or from CBS7001. Together with the allele comparison in Figure 6, this indicates that GRE3CBS1502 encodes the major functional xylose reductase in CBS1502 and that the APJ1CBS1502 xylose-positive phenotype requires its presence. In summary, our data show that APJ1CBS1502 (loss-of-function) and GRE3CBS1502 (gain-of-function) interact epistatically to contribute to the robust xylose-positive phenotype of CBS1502 and are the causative genes in the region C and D genomic intervals from BSA of GSY2694.
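One common way to quantify an epistatic interaction like the one described above is an interaction contrast on a 2 × 2 genotype table: under a purely additive model, the effect of swapping one locus should not depend on the allele at the other, so a nonzero deviation indicates interaction. The sketch below assumes growth is summarized as a single number per genotype (for example, a hypothetical final optical density in xylose); the function name and the values are invented for illustration, are not data from this study, and the study's own treatment of epistasis may differ.

```python
def interaction_contrast(growth: dict[tuple[str, str], float]) -> float:
    """Deviation from additivity for two biallelic loci.

    `growth` maps (apj1_allele, gre3_allele) to a growth summary; the labels
    "7001" and "1502" denote CBS7001- and CBS1502-derived alleles.
    Returns (both swapped) - (APJ1 swapped only) - (GRE3 swapped only) + (neither),
    which is 0 when the two allele swaps act additively.
    """
    both = growth[("1502", "1502")]
    apj1_only = growth[("1502", "7001")]
    gre3_only = growth[("7001", "1502")]
    neither = growth[("7001", "7001")]
    return both - apj1_only - gre3_only + neither


# Hypothetical growth values (not measured data from this study).
example = {
    ("7001", "7001"): 1.0,
    ("1502", "7001"): 1.2,
    ("7001", "1502"): 1.3,
    ("1502", "1502"): 2.5,
}
print(round(interaction_contrast(example), 3))  # 1.0
```

This contrast is defined on an additive scale; if allele effects are expected to combine multiplicatively, the same contrast would be computed on log-transformed growth values instead.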
GRE3 expression is higher in APJ1CBS1502 strains
Because transcription of S. cerevisiae GRE3 is known to be responsive to stress (Garay-Arroyo and Covarrubias 1999), and because APJ1 (at least in S. cerevisiae) encodes a heat-shock protein, we tested whether transcript abundances of GRE3CBS7001 or GRE3CBS1502 are altered in strains carrying the APJ1CBS1502 allele. Four tetratype tetrads from GSY4319, representing four biological replicates for each genotypic combination, were grown in xylose-containing medium for 72 hr. We prepared total RNA and performed quantitative RT–PCR as described in Materials and Methods. Ct values for S. uvarum YDR458C and YJL088W were used as controls because their expression did not vary significantly between genotypes (Figure 7A). As shown in Figure 7B, relative mRNA levels of both GRE3 alleles were increased in the presence of APJ1CBS1502. The increase in GRE3 transcript abundance in the APJ1CBS1502 GRE3CBS1502 strain compared to wild type is significant (P < 0.01, t-test with Bonferroni correction for 6 tests, i.e., each of the three nonwild-type genotypes compared with wild type using each of the two controls). Neither of the other two genotypes showed a significant difference in GRE3 transcript abundance compared to wild type.
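The relative-quantification logic above can be sketched as follows. This is a minimal sketch that assumes the widely used 2^−ΔΔCt method with roughly 100% amplification efficiency and normalization to one reference gene per comparison (YDR458C or YJL088W); the study's exact calculation is described in its Materials and Methods and may differ, and the function names and Ct values below are invented. Because three genotypes are each compared with wild type using two reference genes, there are 3 × 2 = 6 tests, and a Bonferroni adjustment multiplies each raw P-value by 6 (capped at 1).

```python
from statistics import mean


def delta_ct(ct_target: list[float], ct_reference: list[float]) -> list[float]:
    """Per-replicate delta-Ct: target-gene Ct minus reference-gene Ct."""
    return [t - r for t, r in zip(ct_target, ct_reference)]


def fold_change(dct_sample: list[float], dct_calibrator: list[float]) -> float:
    """2^-(delta-delta-Ct), assuming ~100% PCR efficiency for both genes."""
    ddct = mean(dct_sample) - mean(dct_calibrator)
    return 2 ** (-ddct)


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted P-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Invented Ct values for four biological replicates (one per tetrad).
gre3_sample = [22.0, 22.2, 21.8, 22.0]  # GRE3, hypothetical test genotype
ref_sample = [20.0, 20.1, 19.9, 20.0]   # reference gene, same RNA samples
gre3_wt = [24.0, 24.1, 23.9, 24.0]      # GRE3, hypothetical wild-type samples
ref_wt = [20.0, 20.0, 20.0, 20.0]       # reference gene, wild-type samples

fc = fold_change(delta_ct(gre3_sample, ref_sample), delta_ct(gre3_wt, ref_wt))
print(round(fc, 2))  # 4.0
```

A lower ΔCt means the target is detected in fewer cycles relative to the reference, so a ΔΔCt of −2 corresponds to roughly fourfold higher relative transcript abundance under the efficiency assumption.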
In this study we characterized the genomic architecture of a polygenic xylose phenotype in a Saccharomyces hybrid yeast strain. Applying high-throughput sequencing to BSA of this phenotype revealed at least four loci that contribute to the phenotype; two of these contain homologs of S. cerevisiae GRE3 and APJ1, whereas the causative genes at the remaining two loci have yet to be identified.
Array CGH and sequencing revealed that this strain is a complex interspecific hybrid of S. uvarum, S. cerevisiae, and the recently described and sequenced species S. eubayanus (Libkind et al. 2011); its hybrid nature was further supported by its low levels of sporulation and spore viability, as is typical of hybrids (Greig et al. 2002). It is also possible that recessive lethal alleles contribute to the observed poor spore viability. Because of the complex nature of this strain, straightforward genetic techniques were not feasible, so we developed a novel approach to genetic analysis in this hybrid that uses a tetraploid intermediate. Our method of generating tetraploids by transient expression of HO can be applied to any strain that cannot normally be sporulated; it simply requires that the strain be amenable to DNA transformation and capable of mating with a closely related but polymorphic strain. Notably, this method may have applications for commercial yeasts and for yeasts isolated from industrial environments, which are often hybrids or sporulate poorly or not at all (Tsuboi and Takahashi 1988).
We identified four loci in the CBS1502 hybrid that contribute to xylose utilization (including one that negatively affects growth in CBS1502) and identified two of the genes that contribute to the xylose-positive phenotype: homologs of the S. cerevisiae genes GRE3 and APJ1. In S. cerevisiae, GRE3 encodes a nonspecific aldo-keto reductase that has NADPH-dependent activity on xylose as a substrate (Toivari et al. 2004). Our previous work has shown that endogenous S. cerevisiae GRE3 contributes to xylose utilization in S. cerevisiae carrying the XDH1 gene (Wenger et al. 2010). However, Gre3p in S. uvarum appears to be the major functional xylose reductase, unlike in S. cerevisiae.
S. cerevisiae Apj1p is a putative member of the Hsp40/DnaJ family of chaperone proteins. These proteins regulate the heat-shock protein Hsp70 (Cyr et al. 1994) by interacting directly with Hsp70 through their conserved J domains. While we do not know the specific role of Apj1p during growth in xylose, we speculate that it might act as a negative regulator of GRE3 expression. We have demonstrated that the hypomorphic APJ1CBS1502 allele results in higher GRE3 transcript abundance than the presumably fully functional APJ1CBS7001 allele. Interestingly, the effect on GRE3 transcript abundance is more pronounced for the CBS1502 allele than for the CBS7001 allele, which may account for the epistatic interaction we observed. Because we have ruled out the promoter as being responsible for the allelic difference between GRE3CBS1502 and GRE3CBS7001 with respect to the xylose-positive phenotype, the allelic specificity may be due to APJ1CBS1502-dependent stabilization of GRE3CBS1502 mRNA rather than to direct transcriptional regulation. Indeed, it has been demonstrated not only that GRE3 is induced under stress, but also that GRE3 transcript stability increases under stress (Castells-Roca et al. 2011). Perhaps the hypomorphic APJ1 allele mimics a stress condition, affecting GRE3 directly or indirectly. Further work is required to determine the exact mechanism of increased GRE3 transcript abundance in the presence of the APJ1CBS1502 allele. We have also determined that the G234D substitution in Apj1CBS1502 is responsible for the xylose-positive phenotype; this glycine is conserved throughout the Saccharomyces sensu stricto group and lies within Apj1’s zinc finger domain (Walsh et al. 2004).
We previously identified the XDH1 gene, which is present in some S. cerevisiae wine strains but not in laboratory strains, and found that it encodes a putative xylitol dehydrogenase and is sufficient to confer xylose utilization on a laboratory strain (Wenger et al. 2010). We tested for its presence in the other 38 xylose-positive strains identified in our original screen and showed by PCR that it is present in CBS1502 (Wenger et al. 2010). We have mapped XDH1 in CBS1502 to the right end of chromosome IX (data not shown). This is striking in light of the array comparative genomic hybridization data, which show a loss of S. uvarum sequence at this same location (Figure S1, black circled region), suggesting that the XDH1-containing region of the CBS1502 genome may have introgressed from another species and replaced that portion of the S. uvarum genome. Sanger sequencing of the XDH1 locus from CBS1502 revealed that its DNA sequence is identical to that of the XDH1 gene identified in various wine strains of S. cerevisiae (Wenger et al. 2010), suggesting that this sequence is identical by descent in CBS1502 and the S. cerevisiae strains that contain this region (Novo et al. 2009). Surprisingly, the presence or absence of XDH1 had no effect on growth in xylose in CBS1502 progeny that contain GRE3CBS1502 and APJ1CBS1502 (data not shown), suggesting that the S. uvarum genome encodes other functional xylitol dehydrogenases.
One drawback of our method for genetically analyzing an otherwise intractable strain is that BSA yielded a wide range of interval sizes for the identified loci, from as narrow as 10 kb to as large as 76 kb. This disparity reinforces the notion that fine mapping by BSA depends on many meiotic recombination events among the pooled segregants. The small pools derived from GSY2694 (21 xylose-positive and 21 xylose-negative segregants), combined with potential impediments to recombination such as low sequence similarity or inversions or translocations between the strains used, likely account for the large interval sizes. These results suggest that a strategy similar to X-QTL (Ehrenreich et al. 2010), in which very large numbers of segregants are selected for opposite, extreme phenotypes, might be useful in cases such as this. Alternatively, or in combination, multiple rounds of segregation could help decrease interval size (Parts et al. 2011).
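Why interval size tracks recombination can be illustrated with a minimal sketch of how a BSA interval might be called from pooled sequencing data. It assumes, hypothetically, that each SNP is summarized by the fraction of reads carrying the CBS1502-derived allele in the xylose-positive and xylose-negative pools. A locus then appears as a run of SNPs where the positive pool is enriched for that allele, and the run ends only where crossovers in the pooled segregants have broken linkage to the causative gene. The function names, window size, and threshold are illustrative assumptions, not the pipeline or parameters used in this study.

```python
from statistics import mean

# Each SNP: (position_bp, freq_in_positive_pool, freq_in_negative_pool), where a
# frequency is the fraction of reads carrying the CBS1502-derived allele.
Snp = tuple[int, float, float]


def smoothed_difference(snps: list[Snp], window_bp: int) -> list[tuple[int, float]]:
    """Mean (positive - negative) allele-frequency difference over SNPs within
    +/- window_bp / 2 of each SNP. Input need not be sorted; output is sorted."""
    diffs = sorted((pos, pos_f - neg_f) for pos, pos_f, neg_f in snps)
    half = window_bp / 2
    return [
        (pos, mean(d for p, d in diffs if abs(p - pos) <= half))
        for pos, _ in diffs
    ]


def call_intervals(smoothed: list[tuple[int, float]], threshold: float) -> list[tuple[int, int]]:
    """Merge consecutive SNPs whose smoothed difference is >= threshold into
    (start_bp, end_bp) intervals."""
    intervals = []
    start = end = None
    for pos, diff in smoothed:
        if diff >= threshold:
            if start is None:
                start = pos
            end = pos
        elif start is not None:
            intervals.append((start, end))
            start = None
    if start is not None:
        intervals.append((start, end))
    return intervals


# Synthetic data: SNPs every 5 kb; the positive pool is enriched for the
# CBS1502 allele between 40 and 60 kb, and both pools are ~50:50 elsewhere.
snps = [
    (p, 0.95, 0.05) if 40_000 <= p <= 60_000 else (p, 0.5, 0.5)
    for p in range(0, 100_001, 5_000)
]
print(call_intervals(smoothed_difference(snps, window_bp=10_000), threshold=0.5))
# [(40000, 60000)]
```

With few segregants per pool, few crossovers fall near the causative gene, so the enriched run (and hence the called interval) stays wide; pooling many more segregants, as in X-QTL, shortens the run.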
This drawback notwithstanding, our BSA screen for loci associated with xylose growth identified the APJ1 gene, which had no previously known connection to xylose metabolism. This demonstrates that natural xylose-utilizing Saccharomyces yeasts can still yield new discoveries for improving existing, genetically modified S. cerevisiae xylose-fermenting strains. Identifying and understanding the genetic basis of novel xylose-metabolism phenotypes can uncover new enzymes or enzyme variants in the canonical xylose pathway, or in other aspects of metabolism or cell biology that are important for xylose utilization, and modifications in these genes or pathways may help move these strains into industrial use.
The authors thank Yuya Kobayashi, Dan Kvitek, and Sasha Levy for critical reading of this manuscript and Travis Maures for assistance with RT-qPCR. We also thank Phil Lacroute and Ghia Euskirchen at the Stanford Center for Genomics and Personalized Medicine for high-throughput sequencing. This project was supported by the Stanford Global Climate and Energy Project (GCEP) (grant 33450) as well as the National Institutes of Health–National Institute of General Medical Sciences (NIH-NIGMS) Genetics & Developmental Biology Training Program (NIH GM007790).
Illumina data from this article have been deposited with the Short Read Archive (NCBI) under accession no. SRA045682.1.
Communicating Editor: J. Boeke
- Received October 1, 2011.
- Accepted February 26, 2012.
- Copyright © 2012 by the Genetics Society of America
Available freely online through the author-supported open access option.