Abstract
The yeast deletion collections comprise >21,000 mutant strains that carry precise start-to-stop deletions of ∼6000 open reading frames. The collections include heterozygous and homozygous diploids, as well as haploids of both MATa and MATα mating types. The yeast deletion collection, or yeast knockout (YKO) set, represents the first and only complete, systematically constructed deletion collection available for any organism. The project was conceived during the Saccharomyces cerevisiae sequencing project; work began in 1998 and was completed in 2002. The YKO strains have been used in numerous laboratories in >1000 genome-wide screens. This landmark genome project has inspired the development of numerous genome-wide technologies in organisms from yeast to man. Notable spinoff technologies include synthetic genetic array and HIPHOP chemogenomics. In this retrospective, we briefly describe the yeast deletion project, some of its most noteworthy biological contributions, and the impact that these collections have had on the yeast research community and on genomics in general.
Yeast as a Model for Molecular Genetics
The yeast Saccharomyces cerevisiae has a long, illustrious history as the first domesticated organism. In the 1970s, many voices argued that yeast, specifically S. cerevisiae, was well suited as a model eukaryote for extending the discoveries derived from phage and prokaryotic studies (for review see Hall and Linder 1993). The success of S. cerevisiae as a model eukaryotic organism speaks for itself and has been well documented in several inspiring chapters published in GENETICS as YeastBook (Hinnebusch and Johnston 2011). In addition to providing the first complete eukaryotic genome sequence, S. cerevisiae is the only organism for which a complete deletion mutant strain collection exists. This collection has been used in a wide array of screens, and the individual strains have proved to be invaluable tools. One of the most powerful arguments for yeast as a model in these and other systems biology studies has come directly from the application of the yeast deletion collection to understanding gene function, genetic interactions, and gene–environment transactions.
The concept of a yeast deletion project was inspired by the sequencing of the S. cerevisiae genome. The yeast sequencing project, one of the earliest genome consortia, served as a model for many sequencing projects that followed. Andre Goffeau had the vision (and audacity) to propose a sequencing project 60 times larger than any prior effort. In 1986, Goffeau, together with Steve Oliver, set up the infrastructure required to accomplish this milestone (Goffeau 2000). By the time the project was completed, the network included 35 laboratories worldwide.
Two results from the yeast sequencing project had an immediate impact on the scientific community. First, despite decades of effort, most of the protein-coding genes predicted from the DNA sequence were new discoveries (i.e., not previously identified by homology or experiment) (Dujon 1996). This surprising result reinforced the ambitions of the Human Genome Project (HGP) by putting to rest many concerns over the project’s value. Second, the fact that so many yeast genes were found to be conserved across evolution validated the idea that comparative analysis of model organism genomes would help to annotate the human genome. Indeed, the evolutionary conservation of yeast genes extends to ∼1000 human disease genes, many of which exhibit “functional conservation” by their ability to complement the S. cerevisiae ortholog (Heinicke et al. 2007).
A Brief History of the Saccharomyces Genome Deletion Project
As the yeast sequencing project neared completion, assigning function to newly discovered gene sequences became a priority. As geneticists have long appreciated, an effective way to probe gene function is via mutation. Even before the yeast sequencing project was complete, creation of a genome-wide yeast mutant collection was underway in several laboratories. One effort to create a large-scale mutant collection used transposon tagging (Burns et al. 1994; Ross-Macdonald et al. 1999). These studies included the construction of >11,000 mutants affecting nearly 2000 annotated genes, which enabled systematic studies of gene expression, protein localization, and disruption phenotypes on an unprecedented scale. Importantly, the data from screens of ∼8000 strains performed in 20 different growth conditions were made widely available and established, early on, the importance of distributing annotated screening data (Kumar et al. 2002). This pioneering study laid a foundation for all future large-scale yeast genome-wide analysis methods. Along with DeRisi et al. (1997), it was one of the first studies to introduce the concept of identifying functionally related genes by cluster analysis (Ross-Macdonald et al. 1999).
A similar large-scale mutant strategy, “genetic footprinting,” was used to generate a collection of Ty1 transposon mutants covering most of yeast chromosome V (Smith et al. 1995, 1996). This approach measured the competitive fitness of many strains in parallel, scoring mutant phenotypes with a PCR readout. These initial genome-scale mutant libraries provided an accurate early genome-wide snapshot of S. cerevisiae, including the observations that ∼20% of the genes are essential and that essentiality depends on the experimental conditions. The disruption of 39% of the genes on chromosome V, for instance, resulted in a general growth defect whose magnitude fell along a continuum, suggesting that gene essentiality is better described as a spectrum than as a binary distinction. Further, these studies identified entirely new genes. The results argued strongly against the notion that duplicated genes are redundant, as deletion of many individual members of duplicated pairs produced distinct fitness phenotypes. Transposon tagging approaches have since been employed in diverse microbes (for review see Oh and Nislow 2011). These early studies underscored the need for a complete, systematic deletion collection to (1) identify (and confirm) the essential genes, (2) achieve saturation of the genome, and (3) simplify mutant interpretation by generating complete ORF deletions.
Despite the enthusiasm for the deletion project from the yeast community, funding proved an obstacle. The funds required for the project would exceed the nonnegotiable National Institutes of Health cap of $500,000 per year. Davis and Johnston together provided a creative solution: Johnston landed a 3-year, $1.26 million (USD) grant to construct the deletion strains, and Davis secured a 3-year, $1.05 million grant to provide the >50,000 oligonucleotides that would be required for the PCR-mediated construction of the deletion cassettes. Thus the Saccharomyces Genome Deletion Project (http://www-sequence.stanford.edu/group/yeast_deletion_project/deletions3.html) was launched (see Figure 1 for project organization). Because the yeast strain S288c was used for the sequencing project, it was clear that the same genetic background should be used for the yeast deletion collection, despite its liabilities, chief among them its reputation for poor sporulation. The technology for rapid, cost-effective construction of designed deletions became available with the introduction of PCR-based, microhomology-mediated recombination (Baudin et al. 1993).
(A) Saccharomyces cerevisiae Genome Deletion Project overview. The Stanford Genome Technology Center (SGTC) (yellow boxes) served as the resource for: (1) The 20-bp unique molecular barcode or tag sequence (UPTAG and DNTAG) assigned to each ORF (in collaboration with Affymetrix, peach box). (2) Automated primer picking for deletion cassette construction and deletion strain confirmation oligonucleotides (oligos) using sequence data from the SGD (Cherry et al. 2012). (3) Primer-picking scripts were formatted for use with the automated multiplex oligonucleotide synthesizer (AMOS). (4) Resulting PCR-amplified deletion cassette modules (purple) and five premixed oligonucleotide pairs for the PCR confirmations of each strain (yellow) (A–B, A–kanB, C–D, kanC–D, and A–D) were arrayed into 96-well “6-pks” and sent to consortium members. (5) Successful deletion phenotypes and results of PCR confirmations were logged into the deletion database at the SGTC and made directly available to the yeast community by Research Genetics, SGTC, and ATCC. (6) Strains that failed to be deleted in the first round of strain construction were sent back to the SGTC for primer redesign. (B) Deletion strain strategy. Each deletion “cassette” was constructed using two sequential PCR reactions. In the first amplification, 74-bp UPTAG and 74-bp DNTAG primers amplify the KanMX gene from pFA6-kanMX4 DNA; KanMX expression confers dominant resistance to geneticin (G418) in yeast (Wach et al. 1994).
The primers consist of (5′–3′): 18 bp of genomic sequence flanking either the 5′ or 3′ end of the ORF (directly proximal to the start codon or distal to the stop codon, respectively); 18 or 17 bp of sequence common to all gene disruptions, which serves as one priming site for amplifying the “molecular barcodes” (U1: 5′-GATGTCCACGAGGTCTCT-3′ or D1: 5′-CGGTGTCGGTCTCGTAG-3′); a 20-bp unique sequence (the molecular barcode, or TAG); and 18 or 19 bp of sequence, respectively, homologous to the KanMX4 cassette, which serves as the other priming site for amplifying the molecular barcodes (U2: 5′-CGTACGCTGCAGGTCGAC-3′ or D2: 5′-ATCGATGAATTCGAGCTCG-3′). In the second PCR reaction, two ORF-specific 45-mer oligonucleotides (UP_45 and DOWN_45) are used to extend the ORF-specific homology to 45 bp, increasing the targeting specificity during mitotic recombination of the gene disruption cassette. The presence of two tags (UPTAG and DNTAG) improves the quality of the hybridization data from the oligonucleotide arrays by adding redundancy (∼3.2% of the strains harbor only one unique UPTAG sequence). Note that in version 2.0 and subsequent collections, the two-step PCR was replaced with a single PCR using longer primers. The original length constraint was due to high error rates in longer primers, a problem that had been significantly reduced by the time the version 2.0 strains were constructed. (C) Deletion strain confirmation. Correct genomic replacement of the gene with the KanMX cassette was verified in the mutants by the presence of PCR products of the expected size, using primers that span the left and right junctions of the deletion module within the genome. Four ORF-specific confirmation primers (A, B, C, and D) were selected for each ORF disruption. The A and D primers were positioned 200–400 bp from the start and stop codons of the gene, respectively. The B and C primers were located within the coding region of the ORF and, when used with the A or D primers, gave product sizes between 250 and 1000 bp.
The KanB and KanC primers are internal to the KanMX4 module. For haploid or homozygous isolates, the junctions of the disruption were verified by amplification of genomic DNA using primers A and KanB and primers KanC and D. Deletion of the ORF was verified by the absence of a PCR product using primers A with B and C with D. For heterozygous strains, a successful deletion was indicated by the appearance of an additional, wild-type-sized PCR product in reactions A with B, C with D, and A with D. Each deletion mutant was checked for a PCR product of the proper size using the primers flanking the gene. In addition, each strain background was checked for the appropriate auxotrophic markers and mating type. The rigorous strain verification used in the deletion project is unfortunately not the norm. Formally, all five PCR tests were required for confirmation; this requirement was reduced to three when the long A–D PCR product proved problematic. Many groups verify only the upstream and downstream KanMX–genomic junctions, omitting the A and D reactions that verify both the presence of the deletion and, equally importantly, the absence of the wild-type allele.
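The confirmation logic in panel C can be written as a small table of expected results per genotype. In this sketch the band labels ("del" for a deletion-allele product, "wt" for a wild-type product), the data layout, and the function names are hypothetical; only the three-of-five threshold comes from the text. The example also illustrates why junction-only checks are weak: a haploid that retains a wild-type copy still passes both junction reactions.

```python
"""Hypothetical scoring of the five confirmation PCRs (Figure 1C)."""

# Expected bands per reaction; an empty set means "no product expected".
HAPLOID_OR_HOMOZYGOUS = {
    "A-kanB": {"del"},   # left junction of the deletion module
    "kanC-D": {"del"},   # right junction of the deletion module
    "A-B": set(),        # no product: 5' part of the ORF is gone
    "C-D": set(),        # no product: 3' part of the ORF is gone
    "A-D": {"del"},      # single product of deletion-allele size
}
HETEROZYGOUS = {
    "A-kanB": {"del"},
    "kanC-D": {"del"},
    "A-B": {"wt"},          # wild-type allele still present
    "C-D": {"wt"},
    "A-D": {"del", "wt"},   # both allele sizes amplify
}
EXPECTED = {
    "haploid": HAPLOID_OR_HOMOZYGOUS,
    "homozygous": HAPLOID_OR_HOMOZYGOUS,
    "heterozygous": HETEROZYGOUS,
}


def passed_reactions(genotype, observed):
    """Reactions that were run and whose observed bands match expectation."""
    expected = EXPECTED[genotype]
    return sorted(
        rxn for rxn, bands in expected.items()
        if rxn in observed and set(observed[rxn]) == bands
    )


def is_verified(genotype, observed, required=3):
    """Verified if at least `required` of the five tests pass (three in the text)."""
    return len(passed_reactions(genotype, observed)) >= required


good_haploid = {"A-kanB": {"del"}, "kanC-D": {"del"},
                "A-B": set(), "C-D": set(), "A-D": {"del"}}
# A haploid that also carries a wild-type copy (e.g., a chromosome duplication).
dup_haploid = {"A-kanB": {"del"}, "kanC-D": {"del"},
               "A-B": {"wt"}, "C-D": {"wt"}, "A-D": {"del", "wt"}}

print(is_verified("haploid", good_haploid))        # True
print(passed_reactions("haploid", dup_haploid))    # ['A-kanB', 'kanC-D']
print(is_verified("haploid", dup_haploid))         # False
```

The same `dup_haploid` band pattern is exactly what a correct heterozygote should give, which is why the genotype must be known before the PCR results can be interpreted.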
Production of each deletion cassette required two PCR amplifications. To avoid a mix-up between the first and second amplification steps that could assign the wrong barcodes to the designated ORF, the oligonucleotide primers used in the first PCR included partial homology to the intended ORF, unambiguously linking it to its barcode (Figure 1). The strategy of “round 1” was to proceed as quickly as possible, flagging problematic strains for closer investigation in “round 2.” Rounds 1 and 2 resulted in 92% and 74% success, respectively. “Round 3” used primers with longer stretches of homology to the sequences flanking the ORF to be deleted, resulting in a success rate of >97%. Each deletion mutant was considered verified if it passed three of five PCR tests confirming replacement of the gene with the KanMX cassette at the correct location in the genome (Giaever et al. 2002; Chu and Davis 2008) (Figure 1).
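The first-round primer layout from Figure 1B can be checked with a short sketch that assembles the two 74-nt primers from their parts (18 + 18 + 20 + 18 = 74 for UPTAG; 18 + 17 + 20 + 19 = 74 for DNTAG). The U1, U2, D1, and D2 sequences are copied from the figure legend; the helper names and the example flank and barcode sequences are hypothetical. We also assume genomic flanks are supplied as top-strand sequence, so the downstream flank is reverse-complemented for the DNTAG (reverse) primer.

```python
"""Sketch: assembling 74-nt UPTAG/DNTAG primers from the parts in Figure 1B."""

U1 = "GATGTCCACGAGGTCTCT"   # 18 nt, common to all UPTAG primers
U2 = "CGTACGCTGCAGGTCGAC"   # 18 nt, homologous to the KanMX4 module
D1 = "CGGTGTCGGTCTCGTAG"    # 17 nt, common to all DNTAG primers
D2 = "ATCGATGAATTCGAGCTCG"  # 19 nt, homologous to the KanMX4 module

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tag_primer(flank18, common, barcode20, kan_homology):
    """Concatenate 5'->3': ORF flank, common priming site, barcode, KanMX homology."""
    if len(flank18) != 18:
        raise ValueError("genomic flank must be 18 nt")
    if len(barcode20) != 20:
        raise ValueError("barcode must be 20 nt")
    return flank18 + common + barcode20 + kan_homology


def uptag_primer(upstream18, barcode):
    """upstream18: the 18 nt immediately 5' of the start codon (top strand)."""
    return tag_primer(upstream18, U1, barcode, U2)


def dntag_primer(downstream18, barcode):
    """downstream18: the 18 nt immediately 3' of the stop codon (top strand).

    Assumption: the DNTAG primer primes from the opposite strand, so its
    genomic portion is the reverse complement of the top-strand flank.
    """
    return tag_primer(revcomp(downstream18), D1, barcode, D2)


def split_uptag(primer):
    """Recover (flank, barcode) from an UPTAG primer; the embedded flank is
    what ties the barcode to its ORF."""
    if len(primer) != 74 or primer[18:36] != U1 or primer[56:74] != U2:
        raise ValueError("not an UPTAG primer with the expected layout")
    return primer[:18], primer[36:56]


# Made-up example sequences (not real yeast flanks or YKO barcodes).
flank_up = "ACGT" * 4 + "AC"
flank_dn = "TTGA" * 4 + "TT"
bc_up = "ATGC" * 5
bc_dn = "GCTA" * 5

up = uptag_primer(flank_up, bc_up)
dn = dntag_primer(flank_dn, bc_dn)
print(len(up), len(dn))                       # 74 74
print(split_uptag(up) == (flank_up, bc_up))   # True
```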
Once the first few hundred strains had been constructed, several obstacles became apparent. Some were recognized early enough in the project to allow a course correction, while others were recognized only much later. With the acceptance of microarrays as a lasting technology, the barcodes proved to be a powerful feature for functional genomics. Pooled (parallel) fitness assays largely eliminated the variation observed when individual colonies are assessed, and the steady decline in the cost of barcode quantification made this approach increasingly accessible. Early improvements included the addition of a second barcode, designating the two as “up” tags (5′ to the KanMX cassette) and “down” tags (3′ to the KanMX cassette), which provided a hedge against barcode failure (usually the result of errors introduced during chemical synthesis of the oligonucleotides) (Eason et al. 2004; Smith et al. 2009).
About 6.5% of diploid transformants were found to carry recessive mutations unlinked to the gene deletion. A few of the haploid deletion mutants carried an additional wild-type copy of the deleted gene, likely due to duplication of all or part of the chromosome. These cases comprised ∼1% of the heterozygous primary transformants (estimated from a sample of ∼1300 mutants; B. Dujon, personal communication; Giaever et al. 2002). In both cases, once identified, the strains were discarded. After exclusion of those ORFs for which unique primers could not be designed, 96.5% of the remaining annotated ORFs of 100 codons or larger were successfully disrupted. (Interestingly, of the ∼5% of yeast genes that were not successfully deleted, 62% have no known biological function.)
Other issues were recognized later. For example, leaving the initiation ATG of the deleted ORF intact could result in spurious translation of short ORFs, although no adverse effects of these “start codon scars” have been reported to date. An increasing number of studies have shown that the auxotrophic markers common to all strains affect experimental outcomes (e.g., Bauer et al. 2003; Canelas et al. 2010; Corbacho et al. 2011; Hanscho et al. 2012; Heavner et al. 2012; Hueso et al. 2012; Mulleder et al. 2012), including amino acid supplementation requirements for optimal growth of the BY series (Hanscho et al. 2012) and decreased biomass even during growth in rich YPD medium (Corbacho et al. 2011). These results suggest that the effect of the auxotrophic mutations is nontrivial; modified deletion collections addressing this potential problem are described below.
Managing the Collection: Cautions and Caveats
Despite the desire to make the collection available to the yeast community at low cost, distribution of the YKO collections continues to be a challenge. Distribution by private companies proved problematic, as several disbanded or were acquired. Currently, reliable sources of the collection are Euroscarf (http://web.uni-frankfurt.de/fb15/mikro/euroscarf/) and the American Type Culture Collection (ATCC) (http://www.atcc.org/). At the Stanford Genome Technology Center (SGTC), Angela Chu has curated the YKO collections since their completion, fielding many questions and complaints. With help from Mike Snyder and Guri Giaever, she updated the deletion collections to their current version 2.0. Smaller ORFs, fused ORFs, and “bad strains” were corrected in this collection of ∼300 additional strains, which also includes deletion mutants of several difficult-to-delete ORFs that were passed over in the initial collection. The version 2.0 supplement also includes deletions of several small ORFs that fell below the 100-amino-acid cutoff of the original project (smORFs; see Basrai et al. 1997). This deletion collection update is complemented by the Affymetrix TAG4 array (Pierce et al. 2006, 2007), which incorporates many of the changes to the barcode sequences in repaired strains and provides standardized protocols for screens of pooled mutants. As of this writing, the original website is still active at http://www-sequence.stanford.edu/group/yeast_deletion_project/deletions3.html.
There are a number of caveats specific to the deletion collections to bear in mind. For example, the collection presents the challenge of working with slow-growing (“sick”) mutants that, without special attention, will become depleted before a screen begins. Other examples of aberrant YKO strain behavior have been identified. The Petes lab found that 96 haploid strains, all derived from a single 96-well deletion plate (and therefore carrying deletions of genes that are contiguous on the chromosome), carried an additional mutation in the mismatch repair gene MSH3 (Lehner et al. 2007). In 2000, Hughes et al. observed that 8% of the haploid mutants showed some degree of aneuploidy (Hughes et al. 2000b). We have observed (using flow cytometry) that 5% of haploid mutants reproducibly diploidize, regardless of their mating type. These strains typically carry deletions of genes related to cell cycle and/or mitotic progression (e.g., SPC42; G. Brown and C. Nislow, unpublished results). Users of the yeast deletion collections can add their comments and observations on odd or unexpected behavior of strains in the collection (http://www-sequence.stanford.edu/group/yeast_deletion_project/deletions3.html). Many of the comments collected up to 2006 were incorporated into the version 2.0 supplemental collection described above. The barcodes in each strain have been resequenced, most recently at great depth (Smith et al. 2009, 2011; Gresham et al. 2011). Good yeast husbandry, e.g., minimizing the number of generations of passaging to limit second-site mutations, can reduce some of these problems. A recent study demonstrated that deletion of several ORFs can result in compensatory mutations, as would be expected if the initial deletion caused a fitness defect (Teng et al. 2013). This observation underscores that, for genome-wide screens in general and pooled screens in particular, it is preferable to use a population (rather than a single clone) to represent a particular deletion allele.
Many inconsistencies can be avoided by using the homozygous diploid collection. For example, <0.3% of the homozygous diploids underwent a second duplication to form tetraploids. Despite the consortium’s advocacy of using homozygous diploids to avoid secondary site mutations, the diploids have been used in <5% of the genome-wide screens. This likely reflects a combination of genetic tradition, facility of downstream genetic analysis, and the popularity of synthetic genetic array (SGA) (Tong et al. 2001).
An additional caveat is the potential effect of neighboring gene deletions on phenotype. The Kupiec lab warned that neighboring gene interference could obscure proper functional annotation (Ben-Shitrit et al. 2012). Their study outlines a worst-case scenario (Baryshnikova and Andrews 2012) but does highlight that deletion of any part of the highly compact yeast genome (with its antisense transcription units, CUTs, XUTs, and SUTs) cannot be considered benign until proven otherwise (e.g., Richard and Manley 2009; Xu et al. 2009).
Early Applications of the Deletion Collection
Seminal publications
The first description of the S. cerevisiae Deletion Project appeared in 1999 (Winzeler et al. 1999), when just over one-third of the deletion strains had been constructed. The major findings were that (1) 17% of genes are essential, (2) only about half were previously known, and (3) nonessential ORFs are more likely than essential ORFs to be homologous to another gene in the yeast genome. The completion of the S. cerevisiae Deletion Project was announced in 2002 (Giaever et al. 2002), reporting 18.7% of 5916 ORFs as essential for growth. This study included genome-wide functional profiling of the complete homozygous deletion collection in five environmental stress conditions and in the presence of the antifungal drug nystatin (Figure 2). Notable findings included a slow-growth phenotype in rich media for 15% of viable gene deletion strains. Although the genes required for growth in well-characterized stress conditions included those expected, the majority of the genes identified had not previously been recognized as being required under these conditions. For example, fewer than half of the genes required for growth in minimal media could be assigned to well-characterized biosynthetic pathways. Metabolomics studies have filled in some of this missing pathway information, but a surprising number of gaps remain in our understanding of “basic” metabolic pathways.
Pooled chemogenomic screens of the yeast deletion collection. Fitness profiling of pooled deletion strains involves six main steps: (1) Strains are first pooled at approximately equal abundance. (2) The pool is grown competitively in the condition of choice. If a gene is required for growth under this condition, the strain carrying this deletion will grow more slowly and become underrepresented in the culture (red strain). Resistant strains will grow faster and become overrepresented (blue strain). (3) Genomic DNA is isolated from cells harvested at the end of pooled growth. (4) Barcodes are amplified from the genomic DNA with universal primers in two PCRs, one for the uptags and one for the downtags. (5) PCR products are then hybridized to an array that detects the tag sequences. (6) Tag intensities for the treatment sample are compared to tag intensities for a control sample to determine the relative fitness of each strain. Here, the starting pool shown in step 1 is used as a control; steps 3–5 are not shown for this control sample.
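Step (6) of the caption, comparing treatment and control tag intensities, can be sketched as follows. This is a minimal illustration, not the published analysis pipeline: the median normalization, the background threshold, and averaging the up- and down-tag log2 ratios (falling back to one tag when the other fails in the control) are all assumptions, and the tag IDs and intensities are made up.

```python
import math
from statistics import median


def normalize(intensities, floor):
    """Clamp each intensity at `floor`, then scale so the median tag is 1.0."""
    clamped = {tag: max(value, floor) for tag, value in intensities.items()}
    m = median(clamped.values())
    return {tag: value / m for tag, value in clamped.items()}


def relative_fitness(control, treatment, strain_tags, background):
    """Mean log2(treatment/control) over each strain's usable tags.

    control, treatment: tag id -> raw array intensity
    strain_tags: strain -> (uptag id, dntag id)
    A tag is used only if it is above background in the control, so a
    failed barcode falls back to the strain's other tag.
    Negative values: depleted (sensitive); positive: enriched (resistant).
    """
    c = normalize(control, background)
    t = normalize(treatment, background)
    fitness = {}
    for strain, tags in strain_tags.items():
        ratios = [math.log2(t[tag] / c[tag]) for tag in tags
                  if control[tag] > background]
        fitness[strain] = sum(ratios) / len(ratios) if ratios else None
    return fitness


strain_tags = {"A": ("a_up", "a_dn"), "B": ("b_up", "b_dn"), "C": ("c_up", "c_dn")}
control = {"a_up": 100, "a_dn": 100, "b_up": 100, "b_dn": 100,
           "c_up": 100, "c_dn": 5}          # c_dn failed: below background
treatment = {"a_up": 25, "a_dn": 25, "b_up": 100, "b_dn": 100,
             "c_up": 100, "c_dn": 100}
print(relative_fitness(control, treatment, strain_tags, background=10))
# {'A': -2.0, 'B': 0.0, 'C': 0.0}
```

Strain A is fourfold depleted (log2 ratio of -2), the behavior expected for the “red strain” in the caption; strain C is scored from its uptag alone.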
The 2002 study (Giaever et al. 2002) also established that there is little correlation between the genes required for fitness in a condition and those whose transcription is up-regulated in that condition. This finding was quickly replicated in several reports using other environmental conditions, most notably in the presence of DNA-damaging agents (Birrell et al. 2002). The finding was controversial because, at the time, up-regulation of gene expression was thought to indicate that a gene is required in the given condition. We now have a greater understanding of the complexity of the regulation of protein expression, particularly during stress, which is consistent with the lack of correlation between expression and fitness. For example, the stress response is now known to include many post-transcriptional events that provide a more rapid response through translational reprogramming by a variety of mechanisms, including upstream ORFs (uORFs) (Ingolia et al. 2009), stress granules (Kedersha and Anderson 2009; Lui et al. 2010), and active blocking of the export of ribosomal subunits through the nuclear pore (Altmann and Linder 2010). The relationship between fitness and gene expression is an active area of exploration (Berry et al. 2011) and has also brought into question the very definition of an open reading frame. The deletion project successfully deleted 96.5% of the ORFs attempted, which were defined as stretches of DNA that could potentially encode a protein of ≥100 amino acids. Today, an ORF is classified as verified when there is experimental evidence that a gene product is produced in S. cerevisiae (Cherry et al. 2012). By that definition, 4924 (97%) of the 5069 currently verified ORFs have been deleted in the existing YKO collection. A total of 120 verified ORFs are <100 amino acids and therefore are not part of the collection.
Combined, the 1999 (Winzeler et al. 1999) and 2002 (Giaever et al. 2002) articles from the deletion consortium have been cited ∼2500 times (Figure 3), demonstrating that the YKO collection has become a universal resource (see Discussion below). Excluding the many reviews prompted by the excitement of the postgenomic era, the earliest data-centric citations of the YKO articles consisted primarily of new genome-wide methods in yeast and other organisms; only a few years later were they dominated by publications that employed the collection in large-scale phenotypic screens (Figure 3).
(A) Network of citations of the Winzeler et al. (1999) and Giaever et al. (2002) deletion project publications. A total of 428 publications cite both publications (nodes with two edges). Of the ∼2200 (2231) total unique citations, ∼900 (864) cite Giaever et al., ∼1600 (1584) cite Winzeler et al., and ∼400 (428) cite both. Node size reflects the number of citations per publication. Triangular node shape, self-citations (126 publications). (B) Citations per year, classified by article or review. (C) Distribution of 205 large-scale phenotypic assays using the deletion collection. The top pie chart depicts the six primary categories of yeast deletion collection screens by type, regardless of the particular method used (e.g., colony size, pooled screen). The three most common screen types are those that interrogate biological processes (39%, blue), drug/small molecule screens (31%, red), and environmental screens (19%, yellow). These three categories are further subdivided in the three bottom pie charts.
The completion of the YKO collection inspired the construction of many other yeast genome-wide libraries (Figure 3) as well as novel genome-wide techniques. Proteomics studies were one natural extension of the original YKO papers. For example, publications that reference the YKO collections and have themselves been cited >1000 times (Figure 3) include the yeast tandem affinity purification (TAP-tagged) collection (Krogan et al. 2006), the GFP collection (Ghaemmaghami et al. 2003), and genome-scale two-hybrid studies (Ito et al. 2001). Other highly cited articles inspired by the YKO resource describe mutant collections in other organisms, e.g., Arabidopsis thaliana (Alonso et al. 2003) and Escherichia coli (Baba et al. 2006), and new technologies, including genome-scale protein-complex mass spectrometry, protein microarrays (Zhu et al. 2001), digenic interactions by SGA (Tong et al. 2001), and large-scale expression studies (Hughes et al. 2000a).
Genome-wide phenotypic screens
The YKO collection has been used in a wide array of genome-wide phenotypic assays aimed at understanding biological function, response to stress, and mechanism of drug action. Although many screens have been repeated, DNA metabolism and repair screens being a prominent example (Figure 3), it is beyond the scope of this chapter to review these screening results; the reader is referred to a comprehensive review article (Mira et al. 2010). SGA (Tong et al. 2001), a key YKO-derived technology, and similar assays are covered in another chapter of YeastBook (B. Andrews and C. Boone, unpublished data). Here we provide an overview, using a gene-ontology (GO)-based summary of the genes identified through use of the YKO collection, and highlight several examples.
Guided by the Saccharomyces Genome Database (SGD)-curated references (defined as large-scale phenotypic screens as of May 2013), 205 articles were annotated into five distinct categories. The distribution of GO terms in these categories is shown in Figure 3. GO-based enrichments in the three major subtypes of YKO assays (biological process/function, environmental stress, and small molecule/drug) were obtained by extracting the genes associated with each of the articles. Combined, these screens include phenotypic annotations of 3489 unique genes. The genes ascribed to (1) biological process/function, (2) environmental stress, and (3) small molecule/drug by at least two publications numbered 721, 379, and 233, respectively. The frequencies with which these genes are cited make it difficult to calculate GO enrichments accurately, but some trends are clear. The gene list points to pH and intracellular activity as dominating factors in biological processes and response to stress, whereas genes in the drug category are most obviously associated with DNA repair, consistent with the literature’s focus on DNA-damaging agents (data not shown).
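The “at least two publications” filter used for these per-category gene counts can be sketched as a simple tally. The publication IDs, category labels, and gene names below are placeholders, and the input layout is an assumption; each gene is counted at most once per publication.

```python
from collections import Counter


def genes_by_min_publications(annotations, category, min_pubs=2):
    """Genes reported by at least `min_pubs` publications in one category.

    annotations: iterable of (publication_id, category, iterable_of_genes)
    """
    counts = Counter()
    for _pub, cat, genes in annotations:
        if cat == category:
            counts.update(set(genes))  # count each gene once per publication
    return {gene for gene, n in counts.items() if n >= min_pubs}


def all_annotated_genes(annotations):
    """Union of genes across every publication and category."""
    return set().union(*(set(genes) for _pub, _cat, genes in annotations))


annotations = [
    ("pub1", "drug", ["GENE1", "GENE2"]),
    ("pub2", "drug", ["GENE2", "GENE3", "GENE2"]),  # duplicate counted once
    ("pub3", "stress", ["GENE1", "GENE4"]),
    ("pub4", "stress", ["GENE4"]),
]
print(sorted(genes_by_min_publications(annotations, "drug")))    # ['GENE2']
print(sorted(genes_by_min_publications(annotations, "stress")))  # ['GENE4']
print(len(all_annotated_genes(annotations)))                     # 4
```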
Early genome-wide screens
Several early applications of the yeast deletion collection are distinguished by their experimental rigor and set a high standard for subsequent studies. The first application of the homozygous mutant collection by the Snyder lab (a famously productive YKO consortium laboratory) was to identify genes that contribute to bud site selection (Ni and Snyder 2001). All homozygous strains in the genome were individually examined for deviations in bud site selection during cell division, identifying 127 (3%) homozygous deletion strains that reproducibly displayed altered budding patterns. Perhaps surprisingly, 105 (82.7%) of these budding pattern mutants had been previously characterized (i.e., they had three-letter gene names), though most had not been associated with an abnormal budding pattern. Twenty-two (17.3%) were completely uncharacterized and were named BUD13–BUD32. This first application of the homozygous deletion collection was comprehensive in that it included mutants previously thought to serve roles independent of budding, which on closer inspection proved to affect budding pattern. For example, the clathrin coat mutant clc1Δ showed an abnormal Bud8 localization pattern, suggesting that coated vesicles are required for normal budding. Furthermore, the study estimated a false-negative rate of ∼10%.
Another early application of the YKO collection identified genes required for resistance to K1 killer toxin, which is encoded by the M1 double-stranded RNA (dsRNA) satellite of the L-A virus of S. cerevisiae (Page et al. 2003). Deletion strains with altered responses to K1 killer toxin were expected to identify genes important for cell wall synthesis and regulation. Again, a large proportion of the genes required for toxin resistance (226, 84.3%) had gene names, but only 15 (5.6%) had been previously associated with toxin sensitivity; 42 (15.7%) were of unknown function. A total of 42 deletions caused phenotypes when heterozygous but not in haploids or homozygous diploids (all but one were in essential genes). The phenotypes in haploids were nearly identical to those in homozygous mutants, a comforting confirmation of the high quality of the deletion collections.
These two careful studies revealed several important points that were to eventually become widely appreciated and accepted aspects of any genome-wide study. First was the discovery of many genes not previously associated with these extensively studied phenotypes, underscoring the bias, insensitivity, and low coverage of standard genetic assays. Second, many genes annotated to a single function were revealed as multifunctional, and involved in several, often diverse cellular processes.
Comparing genome-wide studies between laboratories:
Mitochondrial respiration as a case study:
Several genome-wide screens aimed at identifying genes required for respiration (Dimmer et al. 2002; Luban et al. 2005; Merz and Westermann 2009) highlighted the many sources of variability that complicate comparison of results from different genome-wide screens. The most recent of these studies (Merz and Westermann 2009) revealed an overlap of 176 genes among three colony-based studies, representing approximately half of the genes identified in each individual screen. Interestingly, all 176 genes uncovered in the three plate-based screens had also been identified in a competitive fitness assay (Steinmetz et al. 2002). A retrospective GO analysis of the ∼300 genes identified only by Steinmetz et al. (2002) revealed strong enrichment for genes involved in mitochondrial respiration (P < 1E-07) and translation (P < 1E-11), suggesting that barcode analysis is more sensitive than plate-based assays. Regardless of the cause of the discrepancies between studies, the past decade has made clear the sensitivity of genome-wide technologies to subtle changes in laboratory conditions, and results generated from genome-wide screens will require individual validation by techniques that are transferable between laboratories. The lack of comparability between individual screens is not necessarily an error that can be remedied. Rather, it is a reminder that the key features to consider when comparing genome-wide data are the genes, biological processes, and pathways that can be individually confirmed across platforms, rather than the particular rank order of different gene lists.
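Two calculations underlie comparisons like this one: the overlap among screens and the enrichment of a GO term within a gene list. The text does not state which test produced the P-values above; the sketch below uses a one-sided hypergeometric test, a common choice for GO enrichment, and the screen names and gene sets are placeholders. Python’s exact integer binomials keep the calculation valid even at genome scale (∼6000 genes).

```python
from math import comb


def overlap_summary(screens):
    """screens: name -> set of genes. Returns (genes shared by all, genes unique to each)."""
    shared = set.intersection(*screens.values())
    unique = {
        name: genes - set().union(*(g for other, g in screens.items() if other != name))
        for name, genes in screens.items()
    }
    return shared, unique


def hypergeom_enrichment_p(k, population, annotated, drawn):
    """One-sided P(X >= k) for X ~ Hypergeometric.

    population: genes tested; annotated: genes carrying the GO term;
    drawn: genes in the hit list; k: hits carrying the term.
    """
    denom = comb(population, drawn)
    upper = min(annotated, drawn)
    numer = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
                for i in range(k, upper + 1))
    return numer / denom


screens = {
    "screen1": {"G1", "G2", "G3", "G4"},
    "screen2": {"G1", "G2", "G5"},
    "screen3": {"G1", "G2", "G3", "G6"},
}
shared, unique = overlap_summary(screens)
print(sorted(shared))                       # ['G1', 'G2']
print(sorted(unique["screen1"]))            # ['G4']
print(hypergeom_enrichment_p(3, 10, 3, 4))  # 7/210, about 0.0333
```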
Metrics to assess deletion strain fitness:
Because of improvements in data analysis made over the last decade, comparisons between experimental platforms can now be assessed more accurately. In general, phenotypic screens fall into three broad categories: (1) pinning mutants onto solid media and measuring colony size, (2) determining growth curves in liquid media in 96-well plates, and (3) competitive growth of pooled mutants in liquid culture, measuring barcode abundance. Each of these approaches requires bioinformatics solutions tailored to the specific errors generated by each platform. Because most screens have pinned cells onto solid media, much attention has been paid to generating robust and reproducible measurements of fitness based on colony size. For example, time-lapse imaging in a study focused on DNA-damaging agents revealed nonadditive interactions between gene deletions and perturbations (Hartman and Tippery 2004). Other metrics to measure gene–gene interactions were developed, including the “S score” (Collins et al. 2006) and the “SGA epsilon score” (Baryshnikova et al. 2010), which includes normalization for “batch” and “position” effects. The 96-well liquid growth assays based on optical density (OD) underwent a similar series of improvements (Giaever et al. 2004; St Onge et al. 2007; Li et al. 2011). Several algorithms and pipelines for processing colony size data have been described and, in some cases, systematically compared to liquid growth assays (Baryshnikova et al. 2010; Wagih et al. 2013). One of the most sensitive growth metrics relies on flow cytometry to monitor growth of fluorescently tagged strains, allowing fitness differences of 1% to be reproducibly discerned (Breslow et al. 2008). However, this technique is low throughput.
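Gene–gene interaction metrics of this kind compare an observed double-mutant fitness with the fitness expected if the two deletions acted independently. The sketch below assumes a multiplicative expectation, epsilon = f_ab - f_a * f_b, of the kind underlying the SGA epsilon score. As a deliberately crude, hypothetical stand-in for the published batch and position corrections, it expresses fitness relative to the plate median. The colony sizes, strain names, and the interaction cutoff are made-up values.

```python
from statistics import median


def normalize_plate(colony_sizes: dict[str, float]) -> dict[str, float]:
    """Scale each colony size by the plate median (crude plate-level correction).

    Assumes most strains on the plate grow roughly like wild type, so the median
    serves as a wild-type-like reference."""
    plate_median = median(colony_sizes.values())
    return {strain: size / plate_median for strain, size in colony_sizes.items()}


def epsilon(f_a: float, f_b: float, f_ab: float) -> float:
    """Observed double-mutant fitness minus the multiplicative expectation f_a * f_b."""
    return f_ab - f_a * f_b


def classify(eps: float, cutoff: float = 0.08) -> str:
    """Label an interaction; the cutoff is an arbitrary illustrative value."""
    if eps <= -cutoff:
        return "aggravating (negative)"
    if eps >= cutoff:
        return "alleviating (positive)"
    return "no clear interaction"


# Hypothetical colony sizes (pixels) on one plate; x1..x4 are other strains.
plate = {"a": 160.0, "b": 180.0, "ab": 100.0,
         "x1": 200.0, "x2": 200.0, "x3": 210.0, "x4": 205.0}
fitness = normalize_plate(plate)  # fitness relative to the plate median
eps = epsilon(fitness["a"], fitness["b"], fitness["ab"])
print(round(eps, 3), classify(eps))
```

Using the plate median as the reference is usually reasonable when most deletions on a plate have little effect, but it would mislead on plates enriched for sick strains, which is one reason published scores add explicit plate, batch, and position corrections.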
Analysis of pooled screens using microarrays underwent a similar evolution in response to changes in platform and to the development of microarray algorithms in general (Schena et al. 1995; Shoemaker et al. 1996; Lum et al. 2004). The most recent Affymetrix Tag4 array (∼100,000 features; feature size 8 μm²) (Pierce et al. 2006) is available at much lower cost than the original platform and includes optimized methodology and web-based analytical tools (Pierce et al. 2007). Despite its availability and support, the Tag4 array has not been widely adopted (examples that have used it include Ericson et al. 2010; North and Vulpe 2010; Zakrzewska et al. 2011), and it will likely be superseded by next generation DNA sequencing (NGS) methods (Smith et al. 2012). Applying NGS to competitive fitness assays provides a direct count of the number of barcodes present in each sample and thus avoids the saturation and nonlinear effects of microarrays. The dramatic increase in throughput made possible by NGS demands additional care in experimental and informatic design. For example, in an experiment that multiplexes 100 samples, the preparation of each sample affects the data quality not only of that sample but also of all its multiplexed companions.
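Because sequencing yields a count per barcode, comparing a treated pool with a control reduces to normalizing counts for sequencing depth and taking a ratio. The sketch below shows one simple way to do this. It assumes counts-per-million normalization and a pseudocount of 0.5 to avoid division by zero. Both choices, and the strain names, are illustrative assumptions rather than a published pipeline.

```python
from math import log2


def counts_per_million(counts: dict[str, int], pseudocount: float = 0.5) -> dict[str, float]:
    """Depth-normalize raw barcode counts; the pseudocount keeps zero counts finite."""
    total = sum(counts.values()) + pseudocount * len(counts)
    return {bc: (c + pseudocount) / total * 1e6 for bc, c in counts.items()}


def log2_fold_change(control: dict[str, int], treated: dict[str, int]) -> dict[str, float]:
    """log2(treated / control) per barcode; negative values indicate depletion under treatment."""
    shared = control.keys() & treated.keys()
    ctrl = counts_per_million({bc: control[bc] for bc in shared})
    trt = counts_per_million({bc: treated[bc] for bc in shared})
    return {bc: log2(trt[bc] / ctrl[bc]) for bc in shared}


# Hypothetical counts for three barcoded strains in a control and a drug-treated pool.
control = {"strain_A": 1000, "strain_B": 950, "strain_C": 1020}
treated = {"strain_A": 980, "strain_B": 60, "strain_C": 1100}
for bc, lfc in sorted(log2_fold_change(control, treated).items()):
    print(f"{bc}\t{lfc:+.2f}")
```

Only barcodes seen in both pools are compared here. In practice, barcodes with very low control counts are often filtered out because their ratios are noisy.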
Large-Scale Phenotypic Screens
The following section highlights several applications of the YKO collection and is not intended to be comprehensive.
Cell growth
A screen of the heterozygous and homozygous deletion collections revealed that ∼3% of genes are haploinsufficient in rich media (Deutschbauer et al. 2005). This result has important ramifications for pooled screens of the heterozygous deletion collection: 97% of heterozygotes show no detectable phenotype in the absence of perturbation, and extending the number of generations of growth increases the sensitivity and dynamic range of such screens. The fitness defect of most haploinsufficient heterozygotes (3–5%) is approximately an order of magnitude smaller than that of their haploid or homozygous counterparts (10–50%). Over half of the haploinsufficient genes are functionally related, with strong enrichment for ribosomal function (Deutschbauer et al. 2005). The authors proposed that this is because ribosomal function becomes rate limiting under conditions of rapid growth in rich media. This hypothesis was supported by the observation that many of the haploinsufficient mutants no longer showed a growth phenotype in minimal medium, in which all strains grow more slowly. Thus, the primary basis of haploinsufficiency under ideal growth conditions appears to be insufficient protein production.
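The link between generations of growth and sensitivity can be made concrete with a back-of-the-envelope model. The sketch assumes that a strain with fitness defect s has a constant per-generation relative fitness of 1 - s and starts at equal representation in the pool. Real pooled cultures deviate from this, for example through lag phase or density effects. Under this assumption, the log2 depletion of a strain grows linearly with the number of generations, so a 3% defect that is hard to detect after a few generations becomes easier to distinguish from noise after many more.

```python
from math import log2


def relative_abundance(s: float, generations: int) -> float:
    """Abundance of a mutant relative to a wild-type-like pool member after `generations`,
    assuming a constant per-generation relative fitness of (1 - s) and equal starting abundance."""
    return (1.0 - s) ** generations


# Fitness defects spanning typical haploinsufficient heterozygotes (3-5%)
# and a more severe, homozygous-like defect (30%).
for s in (0.03, 0.05, 0.30):
    row = [f"{log2(relative_abundance(s, g)):+6.2f}" for g in (5, 10, 20)]
    print(f"s={s:.2f}  log2 ratio at 5/10/20 generations: {'  '.join(row)}")
```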
Other growth/fitness assays of the deletion collection scored phenotypes such as cell size. Jorgensen et al. (2002) identified ∼500 small (whi) or large (lge) mutants, revealing a network of gene products that control cell size at “start,” the point in the cell cycle at which cells commit to the next cell cycle. This study showed the close relationship between ribosome biogenesis and cell size, mediated by the transcription factor Sfp1. An assay of cell size of all nonessential homozygous and all 1166 essential heterozygous deletion mutants identified a much smaller set of 49 genes that dramatically alter cell size, 88% of which have human homologs (Zhang et al. 2002), underscoring the remarkably high level of conservation in core cell cycle control genes.
Mating, sporulation, and germination
The first genome-wide screen for defects in sporulation and germination doubled the number of genes implicated in sporulation (Deutschbauer et al. 2002). Among these 400 genes are both positive and negative regulators, including genes involved in autophagy, carbon utilization, and transcription, as well as recombination and chromosome segregation. Comparing this phenotypic assay to previously published expression assays revealed that 16% of differentially expressed sporulation genes affect spore production, again demonstrating the frequent lack of correlation between regulation of gene expression and phenotype.
A screen for germination mutants (Kloimwieder and Winston 2011) provided an updated list of genes involved in germination and revealed two genes not previously implicated in the process, demonstrating the utility of revisiting and repeating screens performed on the deletion collection to confirm and extend previously collected datasets.
Membrane trafficking
Yeast cells rely heavily on a complex interplay of vesicle formation, transport, and recycling to maintain cellular organization and homeostasis and to buffer their response to environmental changes. The importance of proper membrane traffic is demonstrated by the presence of genes involved in this process among the hits of nearly all genome-wide deletion screens. Enrichment of these mutants is particularly evident in drug and environmental perturbation screens.
Several screens focused on specific aspects of membrane and vesicle transport. A screen of the haploid nonessential deletion collection for mutants defective in endosomal transport identified the VPS55/68 sorting complex as a key player in that process (Schluter et al. 2008). A screen of the nonessential haploid mutant collection identified 87 genes required for intracellular retention of the ER chaperone Kar2, including a number of genes known to be involved in secretory protein modification and sorting (Copic et al. 2009).
Membrane traffic and dynamics are intimately connected with the vacuole, the functional equivalent of the mammalian lysosome. Recent studies demonstrate that this multifunctional organelle is essential for protein sorting, organelle acidification, ion homeostasis, autophagy, and response to environmental stresses. Furthermore, the vacuole provides the cell with several options for dealing with and detoxifying xenobiotics and drugs (for review, see Li and Kane 2009). For example, the vacuolar H+ ATPase has been implicated in drug response in two large-scale studies (Parsons et al. 2004; Hillenmeyer et al. 2008).
Selected environmental stresses
Approximately 20% of the screens of the yeast deletion collection have focused on the response to environmental stress, including heat shock, oxidative stress, weak acid and ionic stress, and osmotic shock (Gasch et al. 2000). Gene expression studies observed similar genomic expression patterns in response to a variety of environmental stress conditions (Gasch et al. 2000), but analogous studies with the yeast deletion collections observed the opposite, identifying genes uniquely required to resist each type of stress. The basis for this discrepancy between the genes whose expression responds to stress and the genes required to resist it remains unclear.
Selected Drug Studies
The deletion collection was applied to drug screening, primarily with the goal of uncovering mechanisms of action. The drugs initially screened included those whose mechanism of action had already been determined, most notably rapamycin and several DNA-damaging agents. As more drugs were examined, it became clear that such screens provided little specific information on mechanisms of action. This likely reflects the fact that many drug targets are essential, so deletions of these targets are absent from the homozygous collection of nonessential genes. It is possible to circumvent this limitation by screening the heterozygous collection, in which halving the dosage of a drug target can sensitize cells to that drug, and identifying drug target candidates by a drug-induced haploinsufficiency phenotype, an approach known as haploinsufficiency profiling (HIP) (Giaever et al. 1999, 2004; Lum et al. 2004). This approach has been applied to identify the protein targets of numerous drugs (for review see Smith et al. 2010; Dos Santos et al. 2012). HIP has been successfully employed, particularly in industry, to reveal the targets of known and novel compounds. For example, HIP identified the targets of cladosporin (lysyl-tRNA synthetase) (Hoepfner et al. 2012), argyrin B (mitochondrial elongation factor G) (Nyfeler et al. 2012), and triazolopyrimidine-sulfonamide compounds (acetolactate synthase) (Richie et al. 2013), which hold promise as antimalarial, antibacterial, and antifungal agents, respectively.
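One simple way to turn pooled HIP data into target candidates is to score each heterozygous strain's drug sensitivity and flag strains that stand out from the rest of the pool. The sketch below assumes sensitivity is already expressed as log2(control/treated) abundance, so positive values mean depletion under drug. It uses a median/MAD robust z-score with an arbitrary cutoff of 3. These scoring choices, the function names, and the strain names are illustrative assumptions; the published HIP analyses use their own statistics.

```python
from statistics import median


def robust_z(scores: dict[str, float]) -> dict[str, float]:
    """Standardize scores with the median and MAD, which resist distortion by the few true hits.
    The factor 1.4826 makes the MAD comparable to a standard deviation for normal data."""
    values = list(scores.values())
    center = median(values)
    mad = median(abs(v - center) for v in values) * 1.4826
    if mad == 0:
        raise ValueError("MAD is zero; scores are too uniform to standardize")
    return {strain: (v - center) / mad for strain, v in scores.items()}


def hip_candidates(sensitivity: dict[str, float], cutoff: float = 3.0) -> list[str]:
    """Heterozygotes whose drug sensitivity exceeds the cutoff, most sensitive first."""
    z = robust_z(sensitivity)
    hits = [strain for strain, score in z.items() if score >= cutoff]
    return sorted(hits, key=lambda strain: z[strain], reverse=True)


# Hypothetical log2(control/treated) values for a handful of heterozygous strains.
sensitivity = {"het_1": 0.10, "het_2": -0.10, "het_3": 0.00,
               "het_4": 0.05, "het_5": -0.05, "het_target": 2.00}
print(hip_candidates(sensitivity))
```

The median and MAD are used instead of the mean and standard deviation because a strongly depleted target strain would otherwise inflate the spread and partly mask itself.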
The nonessential deletion collections, supplemented with conditional essential mutants, can also be used to infer a drug's mechanism through a “guilt by association” approach (Parsons et al. 2004, 2006), in which mechanisms are inferred from the similarity of a compound's deletion profile to those of a set of well-established drugs. For example, tamoxifen, a breast cancer therapeutic, was found to disrupt calcium homeostasis and phosphatidylserine, which, by profile similarity, were also identified as targets of papuamide B, a natural product with anti-HIV activity (Parsons et al. 2006).
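The guilt-by-association idea can be sketched as ranking reference compounds by how closely their deletion sensitivity profiles resemble that of a query compound. The sketch assumes each profile maps gene names to a sensitivity score. It uses Pearson correlation over the genes scored in both profiles; the cited studies may use different similarity measures and clustering. The compound names and profiles are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two shared genes")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a profile has zero variance")
    return cov / (sx * sy)


def rank_references(query: dict[str, float],
                    references: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank reference compounds by correlation with the query profile, most similar first."""
    ranked = []
    for name, profile in references.items():
        genes = sorted(query.keys() & profile.keys())  # compare only genes scored in both
        r = pearson([query[g] for g in genes], [profile[g] for g in genes])
        ranked.append((name, r))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical sensitivity profiles over four deletion strains.
query = {"del_1": 1.0, "del_2": 2.0, "del_3": 3.0, "del_4": 0.0}
references = {
    "reference_drug_A": {"del_1": 2.1, "del_2": 3.9, "del_3": 6.2, "del_4": 0.1},
    "reference_drug_B": {"del_1": 3.0, "del_2": 2.0, "del_3": 1.0, "del_4": 4.0},
}
for name, r in rank_references(query, references):
    print(f"{name}\tr = {r:+.2f}")
```

A high correlation suggests, but does not establish, a shared mechanism; candidates still require independent validation.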
Nongrowth-Based and Transgenic Screens
Several dozen screens targeting key signaling molecules and modifying enzymes have generated rich datasets. For example, a screen of the nonessential haploid collection interrogated the phosphate-responsive signal transduction pathway by quantitative assessment of acid phosphatase activity and identified five new genes involved in the process (Huang and O’Shea 2005). High content imaging of GFP-tagged proteins in each deletion mutant associated changes in protein abundance and localization with particular mutants (Vizeacoumar et al. 2010), uncovering known and novel components involved in spindle morphogenesis and the spindle checkpoint. This study has been emulated by several others (e.g., Tkach et al. 2012; Breker et al. 2013).
One of the most exciting applications of the yeast deletion collection has been in the study of neurodegenerative disorders. A protein or metabolite known to be toxic is expressed in yeast, and deletion mutations that alleviate or aggravate its growth-inhibitory phenotype identify genes that may illuminate the corresponding human disorders. Such an approach has been applied to models of Parkinson’s disease using alpha-synuclein (Yeger-Lotem et al. 2009; Chesi et al. 2012), Huntington’s disease (Willingham et al. 2003), Creutzfeldt-Jakob disease, and other protein aggregation disorders (Manogaran et al. 2010; Sun et al. 2011), several of which have implicated processes such as stress granule assembly and RNA metabolism. Similar approaches have used the deletion collection to survey genes required for the life cycle of the retrotransposon Ty1 (Griffith et al. 2003), Brome mosaic virus (Kushner et al. 2003), and tomato bushy stunt virus (TBSV) (Panavas et al. 2005). Other informative transgenic screens have included expression of human receptors, potassium channels (Haass et al. 2007), lipid droplets (Fei et al. 2008), and other proteins (Mattiazzi et al. 2010) that mediate toxicity in a variety of conditions.
Methodological Improvements and Variations of the Yeast Knockout Collection
The yeast deletion project had a strong technology component. For example, the 96-well oligonucleotide synthesizers were critical for producing the barcodes, and sample-tracking experience gained from the sequencing project made plate tracking and mutant inventory possible. The first 96-well transformations and growth assays grew out of the deletion project as well. For the earliest agar pinning screens, many labs relied on hand-held pin tools [from V&P Scientific (http://www.vp-scientific.com/)] that, according to Chief Executive Officer Patrick Cleveland (personal communication), were originally designed for bacterial clone management. From pin tools to SGA robots to high-resolution microarrays, deletion collection screens have continually pushed the envelope of high-throughput and genomic technologies.
Smith et al. (2009) adapted the microarray-based barcode readout for pooled deletion screens to high-throughput sequencing. Barcode analysis by sequencing (“Bar-seq”) counts each barcode in a complex sample (Robinson et al. 2014) and outperforms barcode detection by microarray hybridization, offering improved sensitivity, greater dynamic range, and a lower limit of detection. The power of Bar-seq was illustrated in a screen for genes required during phosphate and leucine starvation.
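At its core, a Bar-seq readout assigns each sequencing read to a strain by its barcode and tallies the assignments. The sketch below assumes a hypothetical read layout in which the barcode occupies a fixed position in every read. It tolerates a single mismatch when exactly one known barcode is that close, and it uses 8-nt toy barcodes for brevity (the YKO barcodes are 20-mers). Read structure, mismatch tolerance, function names, and strain names are all assumptions, not the published Bar-seq protocol.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def count_barcodes(reads: list[str], barcodes: dict[str, str],
                   start: int, length: int, max_mismatches: int = 1) -> Counter:
    """Tally reads per strain from the barcode at read[start:start + length].

    A read is assigned if its tag matches a known barcode exactly, or if exactly one
    barcode lies within max_mismatches; otherwise it is counted as 'unassigned'.
    The linear scan over all barcodes is fine for a sketch but slow for real libraries.
    """
    counts: Counter = Counter()
    for read in reads:
        tag = read[start:start + length]
        if tag in barcodes:
            counts[barcodes[tag]] += 1
            continue
        near = [strain for bc, strain in barcodes.items()
                if len(bc) == len(tag) and hamming(tag, bc) <= max_mismatches]
        counts[near[0] if len(near) == 1 else "unassigned"] += 1
    return counts


# Hypothetical 8-nt barcodes and reads with two flanking bases on each side.
barcodes = {"ACGTACGT": "strain_A", "TTTTGGGG": "strain_B"}
reads = ["NNACGTACGTNN", "NNACGTACGANN", "NNTTTTGGGGNN", "NNCCCCCCCCNN"]
print(count_barcodes(reads, barcodes, start=2, length=8))
```

Requiring a unique nearest barcode avoids assigning ambiguous reads to the wrong strain, at the cost of discarding them.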
A tour de force study combined deletion mutant screening and metabolomics (Cooper et al. 2010) in a high-throughput quantification of amine-containing metabolites in all nonessential yeast deletion mutants using capillary electrophoresis. Several commonalities among strains were observed; for example, deletions of ribosomal protein genes caused accumulation of lysine and lysine-related metabolites. This is an excellent example of how the yeast deletion collection can be used beyond simple fitness assays to reveal novel biology.
Several variations of the Saccharomyces deletion collections have been generated. These include the Saccharomyces Sigma 1278b deletion collection (Ryan et al. 2012) as well as the Australian wine deletion collection (http://www.awri.com.au/research_and_development/grape_and_wine_production/yeast_bacteria_and_fermentation/constructing-a-wine-yeast-gene-deletion-library/). Both of these collections were derived from the original YKO, avoiding the laborious step of generating the deletion cassettes and providing longer flanking homology, which increased the frequency of legitimate recombination. In the case of the Sigma strain collection, additional phenotypes are available for analysis (e.g., invasive growth, biofilm formation, pseudohyphal growth), and novel, strain-specific essential genes were identified. Two groups have also retrofitted the original deletion collection to allow the strains to be grown in minimal media. In one version (Gibney et al. 2013), the MATa haploids were mated to a wild-type strain, followed by selection for prototrophy via SGA. Another approach transformed the original deletion collection (along with haploid DAmP alleles of essential genes) with a plasmid containing all the wild-type biosynthetic genes (Mulleder et al. 2012). Both of these collections expand the phenotypic space that can be explored and promise a richer picture of the metabolomes of these deletion strains.
Perspectives
We queried the principal investigators of the 16 deletion consortium laboratories. Several respondents saw the effort to make the deletion set readily available to the yeast community as being critical to its widespread use and to the significant impact of the project. Along with the open access nature of the project, several members commented that the limited time frame helped to accelerate the project’s completion.
In a manner analogous to how the yeast genome sequence revealed that most protein-coding genes had not been characterized, the full deletion collection made it possible to reveal phenotypes for nearly every gene, many of which would likely not have been identified without a systematic, genome-wide effort (Ross-Macdonald et al. 1999; Hillenmeyer et al. 2008). These observations have underscored the necessity of full-genome systematic approaches in other organisms, including humans. Moreover, the deletion collection has been a starting point for numerous large-scale “genetic-network” studies that offer global insight into complex genetic phenotypes.
When asked to characterize the three most significant impacts of the project, some people cited the thousands of mutant screens that have been published and the new technologies that evolved from the collections, such as Synthetic Genetic Array (SGA), diploid-based Synthetic Lethality Analysis on Microarrays (dSLAM), and chemical profiling. One commenter worried about the lack of a systematic warehouse of deposited screening data. In writing this chapter we explored screen-to-screen reproducibility and concluded that screen-to-screen variation is unavoidable and that, consequently, such a data depot would be difficult to mine. Some investigators have taken on that task themselves, e.g., by establishing public databases such as FitDB, a yeast fitness database, and DRYGIN, an SGA database. As full-genome methodologies make their way up the evolutionary ladder, efforts to systematize techniques so that data from vast and expensive genome-wide efforts can be integrated are undoubtedly important, but it is equally important to exercise caution when combining data from diverse sources. Once again, insight from studies of yeast provides guidance: it is unlikely that systematization alone can solve the issue of variability. As with any screen, it is clear that key findings require careful, independent follow-up.
Finally, many of the original 16 labs emphasized the impact of the deletion project in establishing collaborations and in furthering the spirit of community and collaboration among yeast researchers.
Looking Ahead
Yeast has served as a benchmark for many large-scale biotechnology applications and platforms, and the development and application of the barcoded deletion collections are no exception. Practitioners of large-scale RNA interference screens have followed the yeast playbook closely by, for example, barcoding interfering RNAs (or using the hairpins of short hairpin RNAs as barcodes) and using PCR-amplified barcodes from large population screens to deduce strain or cell line abundance (Ketela et al. 2011). Beyond its pivotal role as a technological proving ground, yeast genomics continues to play an important role in demonstrating the power of combining, collating, and curating large-scale datasets from a variety of “omic” approaches.
Acknowledgments
Much of the history of the yeast deletion collection was provided by the personal recollections of key deletion consortium members. In particular, we thank Mark Johnston, Angela Chu, Mike Snyder, Ronald W. Davis, and Jasper Rine. We thank Elena Lissina for comments. The funding for the yeast deletion project was provided by the National Human Genome Research Institute of the National Institutes of Health.
Footnotes
Communicating editor: B. J. Andrews
- Copyright © 2014 by the Genetics Society of America
Available freely online through the author-supported open access option.