Understanding how complex networks of genes integrate to produce dividing cells is an important goal that is limited by the difficulty of defining the function of individual genes. Current resources for the systematic identification of gene function, such as siRNA libraries and collections of deletion strains, are costly and organism specific. We describe here integration profiling, a novel approach to identify the function of eukaryotic genes based on dense maps of transposon integration. As a proof of concept, we used the transposon Hermes to generate a library of 360,513 insertions in the genome of Schizosaccharomyces pombe. On average, we obtained one insertion for every 29 bp of nonrepetitive genome. Hermes integrated more often into nucleosome-free sites, and 33% of the insertions occurred in ORFs. We found that ORFs with low integration densities successfully identified the genes that are essential for cell division. Importantly, nonessential ORFs with intermediate levels of insertion corresponded to the nonessential genes whose functions are required for colonies to reach full size. This finding indicates that integration profiles can measure the contribution of nonessential genes to cell division. Although integration profiling succeeded in identifying genes necessary for propagation, it also has the potential to identify genes important for many other functions, such as DNA repair, stress response, and meiosis.
The accelerated rate of gene discovery in an increasing number of species has challenged the existing methods for determining the functions of genes. Traditional approaches for characterizing the function of genes rely on obtaining mutant alleles and testing them for phenotypes in individual experiments. Direct and systematic methods for evaluating gene function have been developed. Genome-wide RNAi screens of cultured cells require the synthesis, validation, and refinement of large libraries of double-stranded RNAs or of vectors that express double-stranded RNAs. Although RNAi screens have successfully identified many genes that may contribute to key functions such as the replication of human immunodeficiency virus (Brass et al. 2008; Konig et al. 2008; Zhou et al. 2008), the production of RNAi libraries is resource intensive, and substantial complications exist, such as off-target effects and incomplete mRNA knockdown.
An alternative approach for characterizing gene function in haploid cells relies on targeted gene deletions. For Saccharomyces cerevisiae and Schizosaccharomyces pombe, collections of strains have been created that contain systematic deletions of the predicted coding sequences (Winzeler et al. 1999; Kim et al. 2010). These comprehensive collections of strains can be readily screened under a variety of conditions to probe the contributions of individual genes to specific processes. However, considerable effort and resources are required to generate deletion sets, and once the sets exist it is difficult to study the deletions in combination with other mutations. Another limitation of deletion collections is the rate of erroneous deletions associated with generating strains en masse: a significant proportion of strains can retain copies of genes that were targeted for deletion (Hughes et al. 2000). Finally, deletion collections cannot address the function of ORFs and noncoding RNAs that have yet to be discovered.
Here we describe integration profiling, a simple transposon-based method capable of directly probing the function of single-copy sequences throughout the genome of haploid eukaryotes. Integration profiling has recently become feasible with the availability of deep sequencing and the discovery of highly efficient DNA transposons that are active in a broad spectrum of eukaryotic organisms. With transposons that readily disrupt ORFs, and sequencing technology that can position large numbers of insertions, the analysis of a single culture can measure the contribution of each gene to cell division. As a proof of principle, we describe a study with 360,513 independent insertions of the Hermes transposon in the genome of S. pombe. As a control for the properties of the Hermes transposase, we analyzed 1,362,743 insertions generated in vitro with purified Hermes transposase and naked S. pombe DNA. The in vivo integration achieved an average of one insertion for every 29 nucleotides of nonrepetitive genome, and these insertions favored nucleosome-free regions. Nevertheless, 33% of the insertions disrupted ORFs; ORFs with lower levels of integration correlated well with the genes reported to be essential, while ORFs with higher integration densities corresponded to the genes reported to be nonessential (Kim et al. 2010). Importantly, ORFs with intermediate densities of insertion correlated with genes that, while not essential, do contribute to growth. In addition, discrepancies between the genes reported to be nonessential by the deletion consortium and our integration data revealed that ∼10% of the deletion strains retain a copy of the ORFs targeted for deletion.
Materials and Methods
The S. pombe strains used for this study are listed in Table S5. The S. pombe strain YHL912 (h-, leu1-32, ura4-294) was transformed with the donor plasmid (pHL2577) and the expression plasmid (pHL2578) in separate, sequential transformations by the lithium acetate method. The donor plasmid was introduced into YHL912 first to create strain YHL9451, which was then transformed with the expression plasmid to create YHL9609, the strain used for transposition in liquid cultures. An empty expression vector (Rep81x) was introduced in place of the expression plasmid to create YHL9176 as a negative control for transposition.
Heterozygous diploid deletion S. pombe strains were purchased from the Bioneer Corporation and Korea Research Institute of Biotechnology and Bioscience (http://pombe.bioneer.co.kr/). Strains carrying a deletion in mmf1, mrpl16, mrpl19, or SPBC2D10.08c were transformed with a plasmid copy of the deleted gene (see Table S6) by the lithium acetate method, followed by a second transformation with a plasmid that allows sporulation (pHL2806). The haploid deletion strains used in this study were from the Bioneer collection version 2.0.
The donor plasmid (pHL2577) and expression plasmid (pHL2578) have been previously described and are listed in Table S6. The Ura3-marked donor plasmid consists of the KanMX6 gene cloned between the Hermes terminal inverted repeats (TIRs). The Leu-marked expression plasmid contains the Hermes transposase gene driven by the Rep81x nmt1 promoter. A detailed description of plasmids used in this work is included in Supporting Information. The oligonucleotides used in this study are listed in Table S7.
Strains were grown in liquid Edinburgh minimal medium (EMM) supplemented with 2 g/liter dropout powder (equal weights of all amino acids, omitting leucine and uracil, plus adenine at 2.5 times that weight) (Forsburg and Rhind 2006). To eliminate the Hermes donor plasmid, the final cultures were diluted in EMM supplemented with uracil, leucine, 5-fluoroorotic acid (FOA), and thiamine (vitamin B1) at 50 μg/ml, 225 μg/ml, 1 mg/ml, and 10 μM, respectively. This culture was then diluted in YES medium (YE plus a complete dropout mix containing all the amino acids) supplemented with FOA and G418 (at 1 mg/ml and 500 μg/ml, respectively) to isolate cells with insertion events.
Transposition frequency was measured by plating cells on 2% agar plates containing EMM and dropout mix supplemented with leucine (250 μg/ml), uracil (50 μg/ml), 5-FOA (1 mg/ml), and 10 μM thiamine, and also plating on YES plates supplemented with FOA and G418 at the concentrations noted above.
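The text does not spell out how the two platings are combined. A common reading, which is our assumption here, is that the FOA plates count all cells that lost the donor plasmid, the FOA + G418 plates count the subset that also kept kanMX6 in a chromosome, and the frequency is their dilution-corrected ratio. The helper name and its arguments are hypothetical:

```python
def transposition_frequency(g418_colonies, g418_dilution, foa_colonies, foa_dilution):
    """Fraction of FOA-resistant cells that also carry a chromosomal kanMX6 (Hermes) insertion.

    Hypothetical helper: colony counts are per plate; dilution factors are the
    fold-dilutions plated (e.g. 1e3 means the culture was diluted 1,000-fold).
    """
    g418_cells = g418_colonies * g418_dilution  # cells with a Hermes insertion
    foa_cells = foa_colonies * foa_dilution     # all cells that lost the donor plasmid
    if foa_cells == 0:
        raise ValueError("no FOA-resistant colonies; frequency undefined")
    return g418_cells / foa_cells


# Illustrative counts only (not data from this study).
print(f"{transposition_frequency(134, 1e3, 100, 1e4):.1%}")  # 13.4%
```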
Pombe Minimal (PM) medium used for drop assays is identical to EMM except that 3.75 g/liter glutamic acid is substituted for 5 g/liter NH4Cl.
Generating a library of Hermes insertions
A culture of YHL9609 was grown to OD600 = 5 and used to inoculate a series of cultures at OD600 = 0.05. The serially passaged cultures were continued until 13% of the cells carried an integration, or ∼80 generations of cell division. In all, 12 sequential cultures were required for the strain to reach this point.
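The ∼80 generations can be checked with a short calculation, assuming (our assumption) that each of the 12 cultures grew from OD600 = 0.05 to OD600 = 5 and that OD600 is proportional to cell number over that range. Each doubling is one generation, so a passage yields log2(5 / 0.05) generations:

```python
import math


def generations_per_passage(od_start, od_end):
    # One generation per doubling, so generations = log2(fold increase in density).
    return math.log2(od_end / od_start)


per_passage = generations_per_passage(0.05, 5.0)  # log2(100)
total = 12 * per_passage                          # 12 sequential cultures
print(round(per_passage, 2), round(total, 1))     # 6.64 79.7
```

The result is consistent with the ∼80 generations stated above; the 74 generations reported in the Results is within the slack of this idealized estimate.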
The final 50-ml culture was grown in FOA to select against the donor plasmid and then in G418 to select cells with Hermes insertions.
The protocols for preparing Hermes insertion libraries for high-throughput Illumina sequencing are extensive and are included in Supporting Information. The process includes the extraction of genomic DNA, the digestion of the DNA with MseI, the ligation of linkers to the MseI fragments, the PCR amplification of insertion sites, the gel purification of the PCR products, and the Illumina sequencing of the amplified DNA. The raw sequences for the in vitro and in vivo inserts were deposited in the Sequence Read Archive (SRA) database of GenBank with accession no. SRA043841.1.
Mapping Hermes integration sites on the genome of S. pombe
Sequence reads from Illumina were screened for those containing the Hermes left end, and the Hermes sequence was then trimmed from each read. The trimmed sequences were aligned to the S. pombe genome using the National Center for Biotechnology Information (NCBI) BLAST software (blastall) on a local computer. The BLAST results were filtered to collect matches with genomic sequence that started from the first nucleotide after the Hermes end, with identities ≥95% and expect (e) values ≤0.05. Of the matches that met these criteria, the one with the highest bit score was used to obtain the coordinates of the unique insertion sites (Table S4). Reads that mapped to the same insertion coordinate in the same orientation were considered duplicates and were counted as a single independent integration event.
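The filtering and de-duplication rules can be sketched in Python (the original scripts were written in Perl or Ruby). This sketch assumes BLAST tabular output (blastall -m 8, equivalent to BLAST+ -outfmt 6) and interprets "started from the first nucleotide after the Hermes end" as a query start of 1 on the trimmed read; ties in bit score keep the first hit seen, which is also an assumption:

```python
from collections import namedtuple

# The 12 columns of BLAST tabular output (blastall -m 8 / blastn -outfmt 6).
Hit = namedtuple("Hit", "qseqid sseqid pident length mismatch gapopen "
                        "qstart qend sstart send evalue bitscore")


def parse_hits(lines):
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
                  int(f[6]), int(f[7]), int(f[8]), int(f[9]), float(f[10]), float(f[11]))


def unique_insertions(lines, min_identity=95.0, max_evalue=0.05):
    best = {}  # read id -> highest-scoring hit that passes the filters
    for h in parse_hits(lines):
        # The alignment must begin at the first base after the trimmed Hermes end.
        if h.qstart != 1 or h.pident < min_identity or h.evalue > max_evalue:
            continue
        if h.qseqid not in best or h.bitscore > best[h.qseqid].bitscore:
            best[h.qseqid] = h
    sites = set()
    for h in best.values():
        # sstart is the genomic base adjacent to Hermes; send < sstart means a minus-strand match.
        strand = "+" if h.send >= h.sstart else "-"
        # Same chromosome, coordinate and orientation = one independent event.
        sites.add((h.sseqid, h.sstart, strand))
    return sites
```

Collapsing reads into a set keyed on (chromosome, coordinate, orientation) implements the rule that duplicate reads count as one integration event.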
Matched random control
For each Hermes insertion site, the distance (d) between the integration site and the MseI site responsible for its amplified fragment was calculated. Another MseI site coordinate (m) was then chosen at random from the S. pombe genome, and either m + d or m − d was taken as a matched random control (MRC) site; whether to add or subtract d was also determined at random. Thus, the MRC dataset has the same size as the experimental integration dataset and matches its distances to MseI sites.
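A minimal Python sketch of the MRC construction on a single sequence, assuming the caller already knows which MseI site (recognition sequence TTAA) produced each insertion's fragment. The function names are hypothetical:

```python
import random


def mseI_sites(seq):
    """0-based start positions of MseI recognition sites (TTAA) in one sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "TTAA"]


def matched_random_control(insertions, sites, rng=random):
    """insertions: (insertion_pos, responsible_mseI_pos) pairs on one chromosome.

    Returns one control site per insertion, at the same distance d from a
    randomly chosen MseI site.
    """
    controls = []
    for pos, responsible in insertions:
        d = abs(pos - responsible)  # distance to the MseI site that produced the fragment
        m = rng.choice(sites)       # a randomly chosen MseI site
        controls.append(m + d if rng.random() < 0.5 else m - d)  # add or subtract at random
    return controls
```

A full implementation would draw m from all chromosomes and discard or redraw controls that fall off a chromosome end; both details are omitted here.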
General bioinformatic analysis and programming
The scripts for screening the raw sequences, filtering the BLAST outputs, extracting features from .embl files, determining the locations of Hermes insertions relative to chromosomal features, generating the MRC, and performing other analyses were written in Perl or Ruby.
Nucleosome DNA preparation and sequencing
Nucleosomal DNA was prepared as described previously (Yamane et al. 2011), and DNA samples were sequenced using the Illumina sequencing protocol. Bowtie (Langmead et al. 2009) was used to map the Illumina reads, trimmed to 25 bp of high-quality sequence, against the reference genome, allowing for two mismatches. Reads that mapped to more than one location were removed. The read ends were aligned to the center of the nucleosome by shifting plus-strand reads by +73 bp and minus-strand reads by −73 bp. The final nucleosome maps were produced by applying Gaussian smoothing to the raw data to reduce noise. More detailed methodology will be described in a forthcoming manuscript on the genome-wide mapping of nucleosomes in S. pombe.
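The shift-and-smooth step can be sketched in plain Python. The ±73-bp shift comes from the text; the Gaussian width (sigma) is not given, so the default below is an assumption, as is the input format (5′ read ends with a strand label; for a minus-strand read the 5′ end is the aligner's leftmost coordinate plus read length minus 1):

```python
import math


def nucleosome_centers(reads, shift=73):
    """reads: (five_prime_pos, strand). Plus-strand ends move +shift, minus-strand ends -shift."""
    return [p + shift if s == "+" else p - shift for p, s in reads]


def coverage(centers, length):
    """Count of shifted read ends at each position of a sequence of the given length."""
    cov = [0.0] * length
    for c in centers:
        if 0 <= c < length:
            cov[c] += 1
    return cov


def gaussian_smooth(values, sigma=20.0):
    # sigma is an assumption; the text does not give the kernel width.
    radius = int(3 * sigma)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]  # normalize so smoothing preserves total counts
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(values):
                acc += w * values[j]
        out.append(acc)
    return out
```

A plus-strand read starting at 100 and a minus-strand read ending at 246 both map to a nucleosome center at 173, and the smoothed track peaks there.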
To test the viability of four representative Bioneer strains heterozygous for deletions in nuclear-encoded mitochondrial genes designated as essential, we performed a plasmid shuffle assay. The details of this process are included in Supporting Information.
Integration profiling is a method developed to determine which genes are essential for cell division. With this method, populations of cells with transposon insertions are grown for many generations, and the culture becomes depleted of cells that have insertions in genes important for division. Insertion sites in genomic DNA from cells in the culture are amplified by ligation-mediated PCR, and the locations of the insertions are determined by deep sequencing. The integration profiles of the cultures were expected to contain high densities of insertions throughout the genome except in genes that are required for propagation. Figure 1A illustrates a model of the integration pattern expected in an integration profile.
The Hermes transposon of the housefly, Musca domestica, has high levels of integration activity in S. pombe, and as much as 50% of the insertions occur in ORFs (Evertts et al. 2007; Park et al. 2009). We therefore applied the Hermes system as previously described to create an integration profile of the S. pombe genome. Briefly, the Hermes transposase was expressed from a plasmid in cells that also contained a plasmid copy of the Hermes TIRs flanking a kanMX6 gene, which confers resistance to G418 (Figure 1B) (Evertts et al. 2007; Park et al. 2009). The transposase excises kanMX6 with the TIRs and inserts this DNA into chromosomal sequences of S. pombe. Cells with insertions are selected using media containing G418, and the majority of G418-resistant cells have a single insertion (Evertts et al. 2007). To create sufficient numbers of insertions and to allow for selection of the fittest cells, the cultures expressing the transposase were grown with repeated dilutions for a total of 74 generations (Figure 1B). Quantitative measures of transposition revealed that 13.4% of the cells in the final culture contained an integrated copy of Hermes (Supporting Information, Figure S1). In the absence of transposase, no insertions were detected. The positions of the insertions were determined by ligation-mediated PCR followed by deep sequencing (Guo and Levin 2010; see also the Materials and Methods section in File S1).
Sequencing generated 46 million high-quality reads that were 50 nt in length. The first 10 nt of each read contained the end of Hermes, and the remaining 40 nt were derived from the insertion site in chromosomal DNA. Using BLAST, 27.5 million reads were matched to a unique sequence in the S. pombe genome. Because PCR amplification or cell division can create multiple copies of the same integration event, duplicate reads were omitted. The remaining data identified 360,513 independent insertions, and these are the sites that we analyzed (Table 1). On average, this amount of integration constituted one insertion for every 29 nt of nonrepetitive sequence in the genome.
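The screening and trimming of the 50-nt reads into a 10-nt Hermes end and 40 nt of flanking DNA can be sketched as follows. The Hermes end sequence is not given in this section, so the function takes it as a parameter, and the one-mismatch tolerance is our assumption:

```python
def trim_hermes_reads(reads, hermes_end, max_mismatches=1):
    """Keep reads whose first len(hermes_end) bases match the Hermes end.

    Returns the remaining genomic portion of each kept read. The mismatch
    tolerance is an assumption, not a documented parameter.
    """
    n = len(hermes_end)
    kept = []
    for read in reads:
        prefix = read[:n]
        if len(prefix) < n:
            continue  # read too short to contain the transposon end
        mismatches = sum(a != b for a, b in zip(prefix, hermes_end))
        if mismatches <= max_mismatches:
            kept.append(read[n:])  # e.g. a 50-nt read leaves 40 nt of chromosomal DNA
    return kept


# Hypothetical 10-nt placeholder, not the real Hermes sequence.
PLACEHOLDER_END = "ACGTACGTAC"
```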
The insertion sites favor nucleosome-depleted positions
The integration activity of Hermes in S. cerevisiae was recently described (Gangadharan et al. 2010). In S. cerevisiae, 41% of the insertions were in ORFs and of the 59% that occurred in the intergenic regions, sites upstream of the ORFs were favored. This pattern of integration correlated with nucleosome-free regions and was attributed to greater DNA accessibility at these positions (Gangadharan et al. 2010).
To determine which properties of integration are intrinsic to the Hermes transposon, we compared the distribution of Hermes integration in S. pombe to that reported in S. cerevisiae. A map of the insertions in S. pombe revealed that 33% occurred within ORFs (Table 1 and Figure 1C). The insertion levels were higher in the intergenic regions than in the ORFs, and the insertions in intergenic regions of S. pombe exhibited the same preference for sequences upstream of ORFs that was observed in S. cerevisiae (Table 1).
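Table 1 partitions insertions into ORF and intergenic classes, with intergenic sites split into those upstream and downstream of ORFs. The exact assignment rule is not stated here; the sketch below assumes an intergenic insertion is assigned to its nearest ORF and called upstream if it lies on that ORF's 5′ side. ORFs are hypothetical (start, end, strand) tuples with 1-based inclusive coordinates:

```python
def classify_insertion(pos, orfs):
    """Return 'ORF', 'upstream' or 'downstream' for one insertion.

    orfs: non-empty list of (start, end, strand). Intergenic sites are
    classified relative to the nearest ORF (an assumed rule).
    """
    nearest, best = None, None
    for start, end, strand in orfs:
        if start <= pos <= end:
            return "ORF"
        dist = start - pos if pos < start else pos - end
        if best is None or dist < best:
            nearest, best = (start, end, strand), dist
    start, end, strand = nearest
    # The 5' side is before the start for + strand ORFs and after the end for - strand ORFs.
    five_prime_side = pos < start if strand == "+" else pos > end
    return "upstream" if five_prime_side else "downstream"
```

A linear scan keeps the sketch short; a genome-scale version would sort ORFs and use binary search.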
To test whether the higher levels of integration in the intergenic sequences were caused solely by sequence preferences of the transposase, we generated a library of insertions in naked DNA using purified Hermes transposase in an in vitro reaction. Deep sequencing of the in vitro insertions identified 1.36 million independent integration events in S. pombe DNA (Table 1). Sixty-three percent of these inserts occurred in ORFs, and the fraction of insertions in the intergenic sequences upstream of ORFs was similar to the fraction downstream (Figure 1D and Table 1). This distribution correlated well with the coding content of the S. pombe genome, 60.2% (Wood et al. 2002), and suggests that the preference for sequences upstream of ORFs in the in vivo experiment was due to features of chromatin structure.
To characterize the sequence preferences of Hermes integration in S. pombe, we analyzed the nucleotide frequencies at the insertion sites. The Hermes transposase recognizes the consensus sequence nTnnnnAn (Zhou et al. 2004) and as expected, the vast majority of the insertions in S. pombe had this sequence (Figure S2A). The same nTnnnnAn was found at the sites of Hermes integration generated in vitro (Figure S2B).
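Checking target sites against the nTnnnnAn consensus amounts to testing positions 2 and 7 of each 8-bp site. A minimal sketch, assuming the 8-bp target sites have already been extracted from the genome at each insertion coordinate:

```python
import re

# Hermes target-site consensus nTnnnnAn: any base, T, four any bases, A, any base.
CONSENSUS = re.compile(r"^[ACGT]T[ACGT]{4}A[ACGT]$")


def fraction_matching(target_sites):
    """Fraction of 8-bp target sites that fit nTnnnnAn (sites of other lengths are ignored)."""
    sites = [s.upper() for s in target_sites if len(s) == 8]
    if not sites:
        return 0.0
    return sum(bool(CONSENSUS.match(s)) for s in sites) / len(sites)
```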
Many transposons and retroviruses exhibit nucleotide preferences in the form of palindromes, thought to result from contacts between the integrases and nucleotides that extend as far as 50–100 nt from the sites of insertion (Holman and Coffin 2005; Wang et al. 2007; Gangadharan et al. 2010; Guo and Levin 2010; Maertens et al. 2010). The nucleotide frequencies surrounding the insertions we generated possessed an unusual palindromic pattern that extended >1.5 kb from the insertion sites (Figure 2A). The sequences had a surprising oscillation of nucleotide frequencies with a wavelength of 150 bp that extended ∼500 bp from the insertion sites. These oscillating patterns were not observed in a matched random control (MRC) set of insertions generated in silico to match the distances of the authentic insertions to MseI sites (Figure 2B), nor were they observed in the in vitro experiment (Figure S3). Therefore, these oscillating preferences for A’s, T’s, G’s, and C’s reflect the in vivo context of the insertion sites rather than a bias generated by the procedure used to cut the genomic DNA or by contacts the transposase makes with the target DNA. The 150-bp periodicity suggested that the Hermes transposase was influenced by nucleosomes during integration.
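Profiles like those in Figure 2 are per-position base frequencies averaged over windows centered on the insertion sites. A minimal sketch on one sequence; it ignores insertion orientation (whether minus-strand sites were reverse-complemented is not stated here) and skips windows that run off the sequence:

```python
def nucleotide_profile(genome, sites, flank):
    """Per-position A/C/G/T frequencies from -flank..+flank around each 0-based site."""
    counts = [{b: 0 for b in "ACGT"} for _ in range(2 * flank + 1)]
    for s in sites:
        if s - flank < 0 or s + flank >= len(genome):
            continue  # skip windows that run off the sequence
        for k, base in enumerate(genome[s - flank:s + flank + 1].upper()):
            if base in counts[k]:  # ignore N and other ambiguity codes
                counts[k][base] += 1
    profile = []
    for c in counts:
        total = sum(c.values())
        profile.append({b: (n / total if total else 0.0) for b, n in c.items()})
    return profile
```

Running the same function on the MRC and in vitro site lists gives the control profiles against which the 150-bp oscillation was judged.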
Extending our analysis of nucleotide frequencies further from the insertion sites revealed a second bias consisting of a single palindromic depression in A/T content that continued a full 3 kb from the insertion sites (Figure 2C). This pattern was also absent in the MRC (Figure 2B) and in the in vitro integration data (Figure S3). Interestingly, the 3-kb nucleotide bias was stronger when just the insertions upstream of ORFs were analyzed (Figure S4). To test whether the 3-kb depression in A/T frequency resulted from a property of sequence upstream of ORFs, we analyzed the nucleotide frequencies of the subset of random insertions (MRC) that occurred upstream of ORFs (Figure 2D). The random insertions upstream of ORFs did possess a strong A/T depression that was very similar in magnitude and extent to the upstream insertions of Hermes (Figure S4). Consistent with the finding that the A/T depression was an intrinsic property of sequence upstream of ORFs, we found that the average nucleotide frequencies upstream and downstream of the S. pombe ORFs do have a lower frequency of A relative to T (Figure S5, A and B). This is an unusual genome structure that to our knowledge has not been previously reported for S. pombe and is not as extensive in S. cerevisiae (Figure S6, A and B).
The oscillating pattern of A’s, T’s, G’s, and C’s surrounding the Hermes insertions was not observed with the set of random insertions positioned upstream of ORFs, suggesting that this 150-nucleotide periodicity was mediated by regularly spaced proteins such as nucleosomes. If this insertion bias were due to integration into arrays of phased nucleosomes then the closest distances between insertions should be multiples of ∼150 bp. An analysis of all the pairs of insertions closer than 800 bp revealed that the most common distances were indeed multiples of 150 bp (Figure 3A). Such a pattern was not seen with the random controls. To test directly whether integration was influenced by nucleosomes, we generated a map of nucleosome occupancy for chromosome 3 (Methods section in File S1). An analysis of the average nucleosome occupancy extending 2 kb on either side of the insertions created in vivo revealed a highly regular oscillation with a wavelength of 150 bp (Figure 3B). The insertion sites were located at the lowest nucleosome occupancy, indicating the highest frequencies of integration were between nucleosomes. The MRC sites did exhibit oscillations but the amplitudes were substantially smaller than the in vivo insertions, indicating that the positions of MseI sites did not contribute to the strong integration bias that favored sites between nucleosomes.
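The pairwise-distance analysis in Figure 3A can be sketched by sorting insertion coordinates on a chromosome and tallying every pair closer than 800 bp. The 10-bp bin width is our assumption:

```python
from collections import Counter


def pair_distance_histogram(positions, max_dist=800, bin_size=10):
    """Histogram of distances between all pairs of insertions closer than max_dist (one chromosome)."""
    pos = sorted(positions)
    hist = Counter()
    for i, p in enumerate(pos):
        j = i + 1
        # Positions are sorted, so stop scanning once a partner is max_dist or farther away.
        while j < len(pos) and pos[j] - p < max_dist:
            hist[(pos[j] - p) // bin_size * bin_size] += 1
            j += 1
    return hist
```

If integration falls in the linkers of phased nucleosome arrays, the histogram should peak near 150, 300, 450 bp and so on, whereas MRC positions should give a flat distribution.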
The correlation between the positions of integration and oscillating nucleosome occupancy suggested that the integration pattern was due to the positions of nucleosomes. To test this possibility, we examined the insertion levels around the transcription start sites (TSSs) annotated in the Sanger Center February 2011 chromosome contigs, which are based on Dutrow et al. (2008) and Lantermann et al. (2010) (Figure 3C). As expected, nucleosome occupancy was very low in the region ∼150 nt upstream of TSSs, and nucleosomes were positioned in phased arrays from the start of ORFs. Integration generated in vivo was very high in the nucleosome-free region upstream of the TSSs; from the start of ORFs, integration was reduced but oscillated in sync with the spaces between nucleosomes. This analysis supported the correlation seen in Figure 3B: integration generated in vivo favored nucleosome-free sites (Figure 3C). This oscillating pattern was observed neither in the MRC set matched to the in vivo insertions nor in the in vitro inserts.
Integration density of ORFs identifies essential genes
While the nucleosomes within ORFs appeared to inhibit integration, a full 33% of all integrations did occur in coding sequence. To investigate whether the densities of the integration generated in cells could be used to discriminate between essential and nonessential genes, we graphed the insertions surrounding three known essential genes, cdc2, cdc19, and cdc25 (Figure 4, A–C). In each case, little or no integration was detected in these three essential ORFs while high levels of insertions were seen in the adjacent intergenic sequences and nonessential ORFs. This initial evaluation of integration profiling suggested that integration density could be used to identify essential genes.
We next tested whether Hermes integration could identify essential ORFs throughout the S. pombe genome by analyzing the integration densities of all annotated ORFs (Figure 5A), expressed as inserts per kilobase per million integrations; normalizing to millions of insertions per dataset allows densities from the in vitro experiment to be compared. Two dominant groups of ORFs were observed, with peak densities of 8.3 inserts/kb/million and 50 inserts/kb/million. A consortium has systematically deleted individual ORFs in S. pombe to create a set of heterozygous diploids. By sporulating the strains and monitoring the germination of spores carrying the deletions, they designated which ORFs were essential for growth (Kim et al. 2010). Using these designations, we plotted the integration densities for the nonessential and essential ORFs separately (Figure 5B). The integration densities in the essential ORFs were clearly lower than in the nonessential ORFs, indicating that the integration profiles did discriminate between essential and nonessential genes. The modal integration density (the peak of each distribution) was 5.5 inserts/kb/million for the essential ORFs and 50 inserts/kb/million for the nonessential ORFs. These two peaks corresponded to the two main peaks of integration in the total set of ORFs shown in Figure 5A. We also analyzed the integration densities of a subclass of nonessential genes that, when deleted, resulted in small colonies. Importantly, these nonessential ORFs had intermediate densities of integration (Figure 5B). This indicates that the intermediate levels of integration were detected because these nonessential genes made important contributions to growth.
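The density unit is the insertion count divided by ORF length in kilobases and by dataset size in millions of insertions. A short worked example; the ORF length and insertion count are hypothetical, and the total is the in vivo dataset size:

```python
def integration_density(orf_insertions, orf_length_bp, total_insertions):
    """Inserts per kilobase of ORF per million insertions in the dataset."""
    return orf_insertions / (orf_length_bp / 1_000) / (total_insertions / 1_000_000)


# Hypothetical 1.2-kb ORF with 18 insertions in the 360,513-insertion in vivo dataset.
print(round(integration_density(18, 1_200, 360_513), 1))  # 41.6
```

Because the denominator scales with dataset size, the same ORF would need roughly 3.8 times as many insertions in the 1,362,743-insertion in vitro dataset to reach the same density.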
The low levels of integration in the essential ORFs and the intermediate levels of integration in the nonessential ORFs that contribute to colony growth were not due to properties of the sequences or the transposase, since an analysis of the in vitro integration showed the essential and nonessential ORFs contained equivalent densities of integration (Figure 5C).
Although the integration densities generally reflected whether an ORF was essential, there was some overlap in the distributions of the two groups (Figure 5B). For example, ∼50 ORFs designated by the deletion consortium as essential had integration densities ranging from 59.4 to 161 insertions/kb/million, levels that suggested the ORFs were nonessential (Table S1). One possible explanation for this discordance is that these genes were actually not required for cell division but instead played a key role in spore formation or germination. Because the deletion consortium relied on lack of spore germination to identify essential genes, proteins required for spore function would be incorrectly designated as essential for growth. Interestingly, over half of these 50 disparate ORFs play an important role in mitochondrial function. In contrast to S. cerevisiae, S. pombe is thought to require mitochondrial DNA and many nuclear-encoded mitochondrial genes for viability even when fermentable carbon sources are available (Haffter and Fox 1992; Kim et al. 2010). However, the possibility remained that these nuclear-encoded mitochondrial genes of S. pombe are required for spore germination rather than cell division.
We tested directly whether four representative nuclear-encoded mitochondrial genes from Table S1 were required for cell division with a “plasmid shuffle” method that uses a haploid strain containing a plasmid-encoded candidate gene but lacking the chromosomal copy of the same gene. If the strain grows on medium containing 5-FOA, a compound that selects for plasmid loss, then the gene is nonessential (Sikorski and Boeke 1991; Ben-Aroya et al. 2008). The genes that we tested encode mitochondrial ribosomal proteins (mrpl16, mrpl19, and SPBC2D10.08c) and a factor involved in mitochondrial DNA maintenance (mmf1). Each of these four ORFs had high numbers of insertions (79.4 to 128 insertions/kb/million) but was designated as essential by the consortium. In each case, when the strains carrying plasmid copies of the genes were diluted onto medium containing 5-FOA, no growth occurred, indicating that these genes were truly essential (Figure 6A, right panel). In comparison, when a strain lacking SPBC21C3.09c, a nonessential gene, was diluted on the same plate, cells grew readily, indicating that they did not require a plasmid encoding SPBC21C3.09c. Surprisingly, when patches of the same deletion strains were replica printed from rich medium directly onto medium containing 5-FOA, strains lacking the mitochondrial genes grew strongly, while equivalent strains lacking a functional Cdc19p, an essential protein, did not grow (Figure 6B). Taken together with the high numbers of insertions in these ORFs, these data indicate that although these nuclear-encoded mitochondrial genes are truly essential for division, the pools of their proteins and the mitochondria that segregate during mitosis are sufficient to support many cycles of cell division after the genes are disrupted.
This result suggests that the ∼50 ORFs identified in Table S1 represent a class of genes whose protein pools were too large to be depleted during the growth of our cultures.
Ten percent of the strains in the deletion collection retain a copy of the genes that were targeted for deletion
Another aspect of the Hermes integration data that is discordant with the designations of the deletion consortium is that among the genes reported by the consortium to be nonessential, there were 220 ORFs that had integration densities below the average for the essential ORFs (19.7 inserts/kb/million). This low amount of Hermes integration suggests these genes are actually essential for viability (Table S2).
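The cutoff used here, the mean density of essential ORFs, and the resulting list of discordant nonessential ORFs can be sketched as a simple filter. The ORF names and densities in any call are hypothetical inputs:

```python
from statistics import mean


def discordant_nonessential(densities, essential, nonessential):
    """Nonessential-designated ORFs whose density is below the mean density of essential ORFs.

    densities: dict mapping ORF name -> inserts/kb/million.
    Returns (cutoff, sorted list of flagged ORF names).
    """
    cutoff = mean(densities[o] for o in essential)
    flagged = sorted(o for o in nonessential if densities[o] < cutoff)
    return cutoff, flagged
```

Applied to the real designations, this rule gives the 19.7 inserts/kb/million cutoff and the 220 ORFs in Table S2.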
Some of these ORFs encode highly conserved proteins, such as eIF2 gamma, eIF6, and Pub1p, that are known to be essential in other eukaryotes. One explanation for the low integration in these ORFs is that they are essential for cell division and that, in the process of deleting the genes or in the subsequent step of sporulating the heterozygous diploids, chromosomal rearrangements occurred that produced haploid strains carrying both the desired deletion and an ectopic copy of the deleted ORF. Strong selection for such events would exist if the ORFs were essential. Just such a set of processes occurred with the S. cerevisiae deletion set, in which a full 8% of 300 strains tested were found to retain an intact copy of the deleted ORF, either in aneuploid or in rearranged chromosomes (Hughes et al. 2000).
To test whether the strains in the S. pombe deletion set retained copies of ORFs thought to be deleted, we assayed for the presence of these ORF sequences by PCR. For these experiments, we chose a representative set of 77 ORFs that were reported by the consortium to be nonessential and that had integration densities ranging from 0 to 42 inserts/kb/million (Table S3). We generated two pairs of primers to amplify different segments of each of these ORFs (Figure 7A). We found that 31 of the haploid strains tested (40%, counting strains whose deletions produced the same number of PCR bands as wild type, wt) retained ORF sequences reported to be deleted (Table S3). In addition, when the ORFs tested were grouped by their Hermes integration densities, a clear trend was observed: ORFs with lower integration densities were much more likely to be retained in the deletion strains (Figure 7B). Of the 19 representative ORFs tested that had <5.5 inserts/kb/million, 11 (58%) were still present in their respective deletion strains. The representative ORFs with greater levels of integration were more likely to be absent from their deletion strains; for example, the 10 representative ORFs with 41–42 inserts/kb/million were all deleted successfully. To check whether our PCR reactions might have spuriously detected ORF sequences due to contaminating DNA, we used DNA blots to test for the presence of a sampling of ORFs. Six ORFs that had varying densities of Hermes integration and that we found by PCR to be present in the deletion strains were probed on DNA blots. All six ORFs were clearly present in the strains from which they had been reported to be deleted (Table S3). Given that DNA contamination was not a significant problem with our PCR assays, we used the data in Figure 7B and the levels of Hermes integration to estimate the number of strains in the deletion collection that have been incorrectly reported to lack the specified ORFs.
Based on a linear regression of the integration densities of ORFs designated nonessential that had 0–42 inserts/kb/million, we estimate that ∼300 (10%) of the deletion strains designated nonessential are actually essential and contain the ORFs said to be deleted.
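The regression procedure is only summarized here. One way such an estimate could be computed, which is our assumption rather than the documented procedure, is to fit a least-squares line to the retention fraction per density bin (as in Figure 7B), convert each nonessential ORF's density into a predicted retention probability clamped to [0, 1], and sum those probabilities:

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (xs must not all be equal)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def expected_retained(bin_densities, bin_retained_fraction, orf_densities, max_density=42.0):
    """Expected number of deletion strains that still carry their ORF (assumed approach)."""
    slope, intercept = fit_line(bin_densities, bin_retained_fraction)
    total = 0.0
    for d in orf_densities:
        if 0.0 <= d <= max_density:  # only the density range the regression was fit on
            total += min(max(intercept + slope * d, 0.0), 1.0)  # clamp to a valid probability
    return total
```

Clamping avoids negative or above-one probabilities where the fitted line extrapolates beyond the tested bins.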
We described here integration profiling, a transposon-based technique that relies on integration densities to discriminate between the essential and nonessential genes of S. pombe. The 360,513 independent sites of in vivo integration resulted in an average of one insertion for every 29 nucleotides of nonrepetitive genome. Essential genes accumulated many fewer insertions than nonessential genes, and, importantly, nonessential genes that contribute to colony growth accumulated intermediate densities of insertions.
In total, 67% of the in vivo insertions occurred in intergenic sequences. This fraction represents strong enrichment relative to 39.8%, the intergenic proportion of the nonrepetitive genome (Wood et al. 2002). In comparison, 37.0% of the insertions generated in vitro occurred in intergenic sequence, a fraction close to the intergenic portion of the nonrepetitive genome. This indicates that the high level of intergenic integration in vivo was not due to target selection by the transposase itself. It is formally possible that in vivo insertions were detected more often in intergenic regions because integration in ORFs was more likely to be detrimental. However, our data indicate that Hermes integrated more often in intergenic regions because these sequences lack nucleosomes, and sequences with low nucleosome occupancy were favored targets of integration. The nucleotide frequencies of sequences flanking insertion sites in S. pombe exhibited an unusual oscillation of A/T content with a wavelength of 150 bp (Figure 2A). Insertion numbers peaked in the spaces between positioned nucleosomes and in the nucleosome-free regions at the TSSs upstream of ORFs (Figure 3, B and C). We suggest this pattern arises because nucleosomes occlude Hermes integration. A similar proposal was made for Hermes integration in S. cerevisiae, where insertions were strongly associated with nucleosome-free regions (Gangadharan et al. 2010).
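The enrichment argument can be made explicit as an observed-to-expected ratio. The sketch below computes fold enrichment from the percentages given in the text and adds a normal-approximation z-score for the difference between observed and expected proportions. The z-score is an illustrative choice not used in the text, and it assumes insertions are independent draws, which would overstate significance if nearby insertions are correlated.

```python
import math


def fold_enrichment(observed_fraction: float, expected_fraction: float) -> float:
    """Observed fraction of insertions divided by the fraction expected from sequence share."""
    if not 0 < expected_fraction <= 1:
        raise ValueError("expected_fraction must be in (0, 1]")
    return observed_fraction / expected_fraction


def proportion_z(observed_fraction: float, expected_fraction: float, n: int) -> float:
    """One-sample z-score for a proportion (normal approximation to the binomial)."""
    if n <= 0 or not 0 < expected_fraction < 1:
        raise ValueError("need n > 0 and 0 < expected_fraction < 1")
    se = math.sqrt(expected_fraction * (1 - expected_fraction) / n)
    return (observed_fraction - expected_fraction) / se


print(round(fold_enrichment(0.67, 0.398), 2))   # in vivo: 1.68
print(round(fold_enrichment(0.370, 0.398), 2))  # in vitro: 0.93
```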
The nucleotide content downstream of the insertions generated in vivo exhibited an unusual bias favoring T that extended as far as 3000 bp (Figure 2C). The same 3000-bp bias was observed when we analyzed the MRC set that occurred in regions upstream of ORFs. This led us to evaluate the A/T content of sequences upstream of ORFs, and we made the surprising finding that for >500 bp upstream of ORFs the average frequency of T is higher than that of A (Figure S5A). This appears to be an unusual feature of genome structure, as it was absent from S. cerevisiae and from the genomes of other organisms we examined (Figure S6A and data not shown).
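A per-position comparison of T and A frequencies upstream of ORFs can be computed as follows. The sketch assumes each input sequence is already taken from the coding strand and ends at the base immediately 5' of the start codon, so index 0 of the result is 1 bp upstream; positions covered by no sequence are returned as NaN. The function name is hypothetical, and the toy sequences are not S. pombe data.

```python
import math
from collections import Counter
from typing import List, Sequence


def t_minus_a_profile(upstream_seqs: Sequence[str], window: int) -> List[float]:
    """Frequency of T minus frequency of A at each distance upstream of the ORF.

    Index k corresponds to the base k + 1 bp upstream of the start codon.
    Only A, C, G, and T are counted; other symbols (e.g. N) are ignored.
    """
    profile: List[float] = []
    for k in range(window):
        counts: Counter = Counter()
        for seq in upstream_seqs:
            if k < len(seq):
                counts[seq[-1 - k].upper()] += 1
        covered = sum(counts[base] for base in "ACGT")
        if covered == 0:
            profile.append(math.nan)
        else:
            profile.append((counts["T"] - counts["A"]) / covered)
    return profile


print(t_minus_a_profile(["AATT", "GCTT"], 4))  # [1.0, 1.0, -0.5, -0.5]
```

Positive values mark positions where T is more frequent than A; averaged over many ORFs, a run of positive values would correspond to the kind of bias described above.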
The high activity of Hermes in S. pombe and its ability to disrupt ORFs resulted in integration densities that were sufficient in most cases to distinguish between essential and nonessential ORFs. Recently published experiments with prokaryotic transposons demonstrated that dense integration maps can identify essential genes of bacteria (Gawronski et al. 2009; Langridge et al. 2009; van Opijnen et al. 2009; Christen et al. 2011). It is our application of the Hermes transposon that now makes this approach possible in a eukaryotic system.
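Distinguishing essential from nonessential ORFs by integration density amounts to choosing a density cutoff and comparing the resulting calls with a reference annotation. The sketch below shows that comparison; the cutoff, ORF names, and densities in the example are hypothetical, and the text does not specify which cutoff, if any, was used.

```python
import math
from typing import Dict, Set


def score_density_cutoff(densities: Dict[str, float], essential: Set[str], cutoff: float) -> Dict[str, float]:
    """Call ORFs with density below `cutoff` essential and score the calls against a reference set.

    Only ORFs present in `densities` are scored.
    """
    called = {orf for orf, d in densities.items() if d < cutoff}
    reference = essential.intersection(densities)
    tp = len(called & reference)
    fp = len(called - reference)
    fn = len(reference - called)
    tn = len(densities) - tp - fp - fn
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    specificity = tn / (tn + fp) if tn + fp else math.nan
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity}


example = {"orfA": 1.0, "orfB": 2.0, "orfC": 30.0, "orfD": 40.0}  # hypothetical densities
print(score_density_cutoff(example, {"orfA", "orfC"}, cutoff=5.0))
```

Sweeping the cutoff and recording sensitivity and specificity at each value is one simple way to see how cleanly densities separate the two classes; essential ORFs with high integration densities, such as the mitochondrial genes discussed below, would appear as false negatives.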
After 80 generations of cell division, the pool of S. pombe cells with a Hermes insertion contained few disruptions of essential ORFs. However, a set of 50 ORFs designated essential had high densities of integration. In a sample of four of these genes, we confirmed their essential status (Figure 6A). We noted that many genes in this set were important for mitochondrial function (Table S1). The abundance of mitochondrial proteins in cells suggested that these essential ORFs had many insertions because, after the genes were disrupted, cells continued to divide for many generations before the gene products were depleted. This idea was supported by our analysis of four of these deletions, in which strains continued to grow for many generations after the essential genes were removed (Figure 6B). Although large protein pools make it difficult to detect essential functions by integration profiling, the genes that express such large pools represent a small percentage of the entire S. pombe gene set.
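The depletion argument can be quantified with a simple dilution model. If a gene is disrupted and no new protein is made, a pool that starts some fold above the minimum needed for division is halved at each division, so it falls to that minimum after log2 of that fold-excess generations; degradation shortens this further. The sketch below encodes that model. It assumes symmetric division and first-order degradation, the function name and parameters are hypothetical, and it is not a model used in the text.

```python
import math


def generations_until_depleted(excess_fold: float, half_life_generations: float = math.inf) -> float:
    """Generations until a protein pool, no longer synthesized, drops to the minimum needed.

    excess_fold: initial pool size divided by the minimum pool size required per cell.
    half_life_generations: protein half-life measured in cell generations
    (math.inf means the protein is stable and is lost only by dilution).
    Assumes each division halves the pool per cell.
    """
    if excess_fold <= 0 or half_life_generations <= 0:
        raise ValueError("arguments must be positive")
    if excess_fold <= 1:
        return 0.0
    # Fraction of the per-cell pool remaining after one generation: dilution times degradation.
    remaining = 0.5 * 0.5 ** (1 / half_life_generations)
    return math.log(excess_fold) / -math.log(remaining)


print(round(generations_until_depleted(1_000), 1))  # stable protein: 10.0
print(round(generations_until_depleted(1_000, half_life_generations=1.0), 1))  # 5.0
```

Because the relationship is logarithmic, each doubling of the initial excess adds only one generation of growth for a stable protein.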
Integration profiling provides a means of determining which genes are essential that is independent of the deletion method used by the consortium (Kim et al. 2010). Among the ORFs with low densities of integration, we found that a surprising number had been designated nonessential by the consortium. PCR and/or DNA blot assays of 77 ORFs revealed that at least 40% of these strains retained a copy of the ORF targeted for deletion. This result, together with the finding that many of these genes are known to be essential in other organisms, led us to conclude that these ORFs with low integration densities are essential. ORFs targeted for deletion were more likely to be retained if they had low amounts of Hermes integration, and using a linear regression of the integration densities we project that ∼300 ORFs thought to be nonessential were not successfully deleted in the deletion collection. The consortium determined which genes are essential by deleting one copy of an ORF in a diploid and, after sporulation, testing whether haploid progeny carrying the deletion are viable. Deletion of an essential ORF creates strong selection for suppressor mutations or chromosomal rearrangements that produce an ectopic copy of the gene. We believe this has happened with many of the deletion strains, in the haploid, in the diploid, or during meiosis. An analysis of the S. cerevisiae deletion collection revealed that ∼8% of the genes deleted were nevertheless present in aneuploid or rearranged chromosomes (Hughes et al. 2000). Because such genetic alterations led to mistaken predictions of essentiality in both the S. cerevisiae and S. pombe deletion sets, there is a clear need for an independent method of establishing which genes are essential. Integration profiling is such an independent method for reliably identifying essential genes.
One significant advantage of integration profiling is that it can also identify which nonessential genes make important contributions to growth. The intermediate integration levels of ORFs that, while not essential, contributed significantly to colony growth indicate that integration densities can measure function even for nonessential genes. Because integration profiling documents intermediate contributions to growth, it provides a more accurate estimate of gene function than the binary designations of essential and nonessential.
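One way to quantify whether intermediate integration densities track contributions to growth is a rank correlation between ORF integration density and the colony size of the corresponding deletion strain. The sketch below computes Spearman's rho, with average ranks for ties, in plain Python. The choice of Spearman's rho is an assumption for illustration; the text does not state which statistic was used, and the values in the example are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation, computed as Pearson's r on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when all values in a sequence are tied")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ORF densities (inserts/kb/million) and relative colony sizes.
print(spearman_rho([1.0, 5.0, 20.0, 40.0], [0.2, 0.6, 0.9, 1.0]))  # 1.0
```

A rank statistic is a natural fit here because it asks only whether higher densities go with larger colonies, without assuming the relationship is linear.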
Although we have validated integration profiling under ideal growth conditions, the method could readily be adapted to measure the contribution of genes to a wide variety of processes, such as responses to environmental stress, repair of DNA damage, or viability during long periods of quiescence. Integration profiling can also be applied to identify genetic interactions in a “synthetic lethal” approach by conducting integration in strains that carry mutations of interest. As sequencing technology continues to improve, integration profiling will also have applications in organisms with more complex genomes, including the identification of genes that are haploinsufficient or dominant negative. Such approaches in cultured cells will have important applications in the identification of disease pathways and physiological systems.
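For the synthetic lethal application, one plausible comparison is the change in each ORF's normalized integration density between a wild-type library and a library generated in a mutant strain. The sketch below reports ORFs whose density drops by at least a chosen log2 fold. The pseudocount, the cutoff, the function name, and the example values are all hypothetical choices, and densities are assumed to be already normalized for library size (for example, as inserts/kb/million).

```python
import math
from typing import Dict, List, Tuple


def depleted_in_mutant(
    wild_type: Dict[str, float],
    mutant: Dict[str, float],
    pseudocount: float = 1.0,
    max_log2_ratio: float = -2.0,
) -> List[Tuple[str, float]]:
    """Return (ORF, log2 ratio) pairs whose density drops in the mutant background.

    The pseudocount keeps ORFs with zero insertions from producing infinite ratios.
    Results are sorted from the strongest depletion to the weakest.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    hits = []
    for orf in wild_type.keys() & mutant.keys():
        ratio = math.log2((mutant[orf] + pseudocount) / (wild_type[orf] + pseudocount))
        if ratio <= max_log2_ratio:
            hits.append((orf, ratio))
    return sorted(hits, key=lambda hit: hit[1])


wt = {"orfA": 30.0, "orfB": 30.0, "orfC": 1.0}   # hypothetical densities
mut = {"orfA": 1.0, "orfB": 28.0, "orfC": 0.0}
print([orf for orf, _ in depleted_in_mutant(wt, mut)])  # ['orfA']
```

ORFs that are already essential in the wild-type background have low densities in both libraries and cannot be scored this way; only ORFs with substantial integration in wild type can show a drop.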
We thank Jacqueline Hayles for kindly sharing data on the colony size of the strains with gene deletions. This research was supported by the Intramural Research Program of the National Institutes of Health (NIH) from the Eunice Kennedy Shriver National Institute of Child Health and Human Development and by the Intramural Program of the National Cancer Institute. The work by S.G. and N.L.C. was supported by NIH grant R01GM076425.
Communicating editor: D. Voytas
- Received April 30, 2013.
- Accepted July 20, 2013.
- Copyright © 2013 by the Genetics Society of America