RNA-Guided Nucleases: A New Era for Engineering the Genomes of Model and Nonmodel Organisms
Kent G. Golic

In 1946, H. J. Muller was awarded the Nobel Prize in Medicine for his discovery with Drosophila melanogaster, some 20 years earlier, that exposure to X-rays caused mutation (Muller 1927). This led to the identification of a large number of Drosophila mutants and chromosome rearrangements, many of which are still frequently used in fly labs. The discovery of an effective method to induce mutations made it feasible to undertake genetic studies of basic biological processes. Beadle and Tatum used X-ray mutagenesis of Neurospora crassa to identify metabolic mutations and prove the “one gene-one enzyme” hypothesis (Beadle and Tatum 1941). Subsequently, Auerbach and Robson discovered the mutagenic properties of mustard gas (Auerbach and Robson 1946), and others found additional mutagenic chemicals (Beale 1993).

Though the methods of X-ray and chemical mutagenesis have been invaluable, their effects cannot be directed to specific genes, or to genes that control specific phenotypes. Genetic screening schemes or selections must be employed to find the mutants of interest. For instance, Beadle and Tatum screened for cultures that would grow on rich medium but not on minimal medium. This served to identify mutants that were deficient in basic metabolic processes, but was very labor intensive.

Since the advent of DNA sequencing, biologists have sought ways to produce mutations in chosen sequences. Yeast researchers were the first to achieve this with the discovery that exogenous DNA could integrate into a chromosome by homologous recombination (Hinnen et al. 1978). This quickly led to the realization that, by appropriate design of the donor DNA, predetermined changes could be introduced into the chromosomal DNA sequence (Scherer and Davis 1979). An especially significant advance for the future of gene targeting was the discovery that homologous recombination between the donor and the chromosomal target sequence could be greatly stimulated by cutting the donor molecule with a restriction enzyme (Orr-Weaver et al. 1981). The use of linearized donor DNA became part of the standard protocol for gene targeting in the mouse (Mansour et al. 1988), and much later, in Drosophila (Rong and Golic 2000; Rong et al. 2002; Gong and Golic 2003).

In spite of these successes, gene targeting generally remained a lengthy and involved process in organisms other than yeast. However, there were indications that homologous recombination could be made much more efficient if a double-strand break were introduced at the target locus rather than in the donor DNA segment. For instance, mating type switching in yeast is initiated when the homing (HO) endonuclease makes a double-strand cut at the MAT locus (Strathern et al. 1982; Kostriken et al. 1983). The gene conversion that follows, in which homologous sequences from HML or HMR are copied into MAT, is extremely efficient (Hicks and Herskowitz 1976). Breaks induced by the I-SceI homing endonuclease also induce high rates of homologous recombination in yeast (Plessis et al. 1992).

In Drosophila, double-strand breaks produced by P transposon excision or I-SceI expression can be repaired by gene conversion at levels far higher than have been achieved by standard gene targeting procedures (Gloor et al. 1991; Johnson-Schlitz and Engels 1993; Rong and Golic 2003). In addition, such breaks can be repaired by copying information from engineered transgenes (Nassif et al. 1994), or from oligonucleotides (Banga and Boyd 1992) or plasmids (Keeler et al. 1996) injected into embryos. In mammalian cells, also, homologous recombination between introduced donor DNA and resident chromosomal DNA is greatly stimulated by breaks in the chromosomal sequence (Rouet et al. 1994; Choulika et al. 1995; Donoho et al. 1998).

These results suggested that if such breaks could be produced at desired locations, they might form the basis of an efficient gene targeting system. However, the problem was how to target the break to a specific location. The results from flies and mammalian cells were all based on “foreign” sequences that had been introduced to a specific site—either chance P transposon insertions, or target sites for rare-cutting homing endonucleases. What was needed was some way to generate a double-strand break in any chosen gene, in situ, without the necessity of its prior modification. Zinc finger nucleases (ZFNs) provided a possible solution. ZFNs are synthetic proteins, typically consisting of three or more zinc finger DNA binding domains coupled to the endonuclease domain of the FokI restriction enzyme (Kim et al. 1996). Each domain typically binds a nucleotide triplet and can be engineered to recognize a variety of sequences. When two ZFNs are designed to recognize opposite strands separated by 6 bp, the FokI domains can dimerize across the gap and cleave the intervening DNA.
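The geometry of a ZFN pair described above can be sketched in a few lines of Python. This is only an illustration of the layout (three fingers per ZFN, one triplet per finger, a 6-bp gap between the two binding sites); the function name, its parameters, and the example sequence are hypothetical, and the sketch says nothing about whether fingers recognizing a given triplet can actually be engineered.

```python
# Sketch of the binding-site layout for a ZFN pair (illustrative only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def zfn_pair_sites(seq: str, start: int, fingers: int = 3, gap: int = 6) -> dict:
    """Split a window of `seq` into left site, gap, and right site.

    Each finger is assumed to bind one triplet, so a site is 3 * fingers bp.
    The right-hand ZFN binds the opposite strand, so its site is reported
    as the reverse complement of the top-strand sequence.
    """
    seq = seq.upper()
    half = 3 * fingers
    end = start + 2 * half + gap
    if start < 0 or end > len(seq):
        raise ValueError("window does not fit inside the sequence")
    left = seq[start:start + half]
    spacer = seq[start + half:start + half + gap]
    right_top = seq[start + half + gap:end]
    return {
        "left_site": left,
        "gap": spacer,
        "right_site": reverse_complement(right_top),
    }


# Hypothetical 24-bp window: 9-bp site, 6-bp gap, 9-bp site.
layout = zfn_pair_sites("AAAAAAAAA" + "CCCCCC" + "GGGGGGGGG", 0)
```

The FokI domains would dimerize over the region reported as `gap`, which is where the intervening DNA is cleaved.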

Heritable genome modification with ZFNs was first achieved in D. melanogaster. When ZFNs designed to recognize a site in the yellow gene were expressed from transgenes, somatic and germline mutations of yellow were produced at impressively high rates, with inherited mutations occurring at a frequency of ∼0.5% (Bibikova et al. 2002). Since then, mutation rates on the order of 20–30% have been reported for some target genes (Beumer et al. 2006). The recovered mutations showed characteristics of breaks that had been repaired by nonhomologous end joining (NHEJ), consisting mostly of small deletions and insertions. Because ZFN mutagenesis is so efficient, new mutants can often be identified by molecular techniques alone, without relying on a visible phenotype.

A time-saving advance for Drosophila was the demonstration that injection of ZFN mRNAs into embryos can execute targeted cleavage, bypassing the need to produce animals with ZFN transgenes (Beumer et al. 2008). Co-injected donor DNA templates could also be incorporated into the cut site by gene conversion. Thus, mutations could be produced in a chosen gene in a single step.

A variant of the ZFN technology was provided by transcription activator-like effector nucleases (TALENs; Christian et al. 2010; Li et al. 2011; Miller et al. 2011). TAL effectors are DNA-binding proteins found in bacterial plant pathogens. They carry tandem repeats of ∼34 amino acids, in which each repeat specifies binding to a single nucleotide. For genome engineering, TALEN genes that encode assemblies of ∼15 repeats coupled to the FokI nuclease domain are synthesized and used in pairs to direct cutting at specific sequences. The advantage of TALENs is that the code for specifying nucleotide binding is clearly defined (Boch et al. 2009; Moscou and Bogdanove 2009). TALENs can also produce directed alterations with very high efficiency. In some cases they are more efficient than ZFNs, but, as with ZFNs, it is not possible to know in advance how well a TALEN pair will work.

TALENs and ZFNs have been used to produce a variety of target site changes, including mutations by NHEJ, deletions, insertions of marker genes, and templated gene conversion in a number of model and nonmodel species (Carroll 2011; Beumer et al. 2013a; Gaj et al. 2013; Katsuyama et al. 2013; Lo et al. 2013; Wei et al. 2013). The ZFN and TALEN methods are still limited by the need to design and construct genes that encode the sequence-specific nucleases and the need to use them in pairs. Biologists continue to employ the earlier homologous recombination targeting techniques alongside the newer ZFN and TALEN methods.

Four recent publications, three in GENETICS (Gratz et al. 2013; Kondo and Ueda 2013; Yu et al. 2013) and one in CELL REPORTS (Bassett et al. 2013), describe a new, facile and highly efficient method of producing mutants via sequence-directed double-strand breaks in Drosophila. Six additional publications describing the application in Caenorhabditis elegans will appear in upcoming issues of GENETICS (Chiu et al. 2013; Cho et al. 2013; Katic and Großhans 2013; Lo et al. 2013; Tzur et al. 2013; Waaijers et al. 2013). A flurry of papers has reported similar results in other organisms. The promise of this technique is tremendous, and it is certain to have an immediate and lasting effect on the way we approach genetic studies of many organisms.

The method builds on reports from last year in which the mechanistic basis of a bacterial biodefense system was elucidated (Gasiunas et al. 2012; Jinek et al. 2012). Many bacteria carry a CRISPR locus that encodes immunity against invading nucleic acids, such as viruses (Wiedenheft et al. 2012). There are different versions of this system in different species: Streptococcus pyogenes and S. thermophilus carry a type II system. CRISPR (clustered regularly interspaced short palindromic repeats) encodes a series of unique and repeated sequences and is transcribed as a long RNA that is processed into shorter CRISPR RNAs (crRNAs). Each crRNA consists of a unique “spacer” region linked to a repetitive region. When the spacer region is complementary to an invading double-stranded DNA, that DNA can be cleaved. This cleavage also requires the participation of an endonuclease encoded by the Cas9 gene and a second RNA, called tracrRNA (trans-activating crRNA), which pairs via base complementarity to the repeat region of crRNAs. There is an additional restriction on which sequences the crRNA/tracrRNA/Cas9 complex can cut: the target DNA strand that matches the spacer sequence of the crRNA (not its complement) must be followed at its 3′ end by the trinucleotide sequence NGG (termed the protospacer adjacent motif, or PAM). Significantly, the functions of the crRNA and tracrRNA can be combined into a single chimeric RNA (often called an sgRNA, for single guide RNA) that functions with Cas9 to cleave dsDNA (Figure 1; Jinek et al. 2012).

Figure 1. Target site cleavage by sgRNA/Cas9. The Cas9 endonuclease is guided to the chromosomal target site by homology between the spacer region of the sgRNA (usually 20 bases) and chromosomal DNA. Opening of the DNA double helix allows base pairing between the spacer and the complementary DNA strand. Each nuclease domain of Cas9 cleaves one of the target DNA strands to produce a double-strand break.

It was immediately obvious that such a system might be repurposed for genome engineering, similar to ZFNs and TALENs. The clear appeal is that it is much simpler to design and synthesize an sgRNA than it is to design and synthesize ZFNs or TALENs. This explains the appearance, in just 1 year, of dozens of articles reporting the use of sgRNA-guided Cas9 to produce mutations in a variety of organisms and cell types.

To achieve cleavage in vivo two different approaches have been used: either the transfer of genes that encode Cas9 and a suitable sgRNA, or the synthesis of Cas9 mRNA and sgRNA in vitro and their transfer into the target cells. For Drosophila, it is probably easier, and perhaps more efficient, to inject the RNAs produced by in vitro transcription.

The DNA sequence encoding an sgRNA can be designed as follows (sense strand shown):

5′ GAAATTAATACGACTCACTATAGGN18GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTT 3′.

The initial 22 bp (GAAATTAATACGACTCACTATA) is the promoter for T7 RNA polymerase used for in vitro transcription. This is followed by the sequence GGN18, representing 20 bp that matches the target (spacer) sequence. Cleavage occurs three bases inside the 3′ end of this sequence on both strands of the target DNA. The initial GG dinucleotide is not required to achieve cleavage, but is included to optimize in vitro transcription by T7 RNA polymerase. Some variation is tolerated in these first two nucleotides, but with reduced efficiency of transcription (Imburgio et al. 2000). Different constraints may apply if a transgene is used to express the sgRNA (for instance, the U6 promoter used in many reports requires a single G). The target DNA strand that has the same sense as the sgRNA should carry the triplet NGG immediately 3′ of the 20 bp that matches the spacer. NAG is also tolerated by the S. pyogenes Cas9, but at reduced efficiency (Hsu et al. 2013; Mali et al. 2013a). Thus, the target has the sequence GGN18NGG. The remaining 80 bp (from GTTTTAGAGC to the 3′ end) encodes the chimeric fusion of the crRNA and tracrRNA. Shorter sgRNA segments have worked in vitro and in vivo (Jinek et al. 2012, 2013), though they appear to be less efficient (Hsu et al. 2013). Further testing may reveal variant versions with improved efficiency (Jinek et al. 2013), but this segment already works quite well. In one case a chimeric sgRNA did not produce mutants, but dual crRNA/tracrRNA did (Lo et al. 2013).
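The assembly of this template can be sketched in Python. The promoter and the 80-bp crRNA/tracrRNA segment are copied from the sequence given above; the function names, the `require_gg` switch, and the example spacer are hypothetical and serve only to illustrate the layout.

```python
# Sketch: assemble the DNA template for in vitro transcription of an sgRNA.
T7_PROMOTER = "GAAATTAATACGACTCACTATA"  # 22 bp, from the sequence above
SCAFFOLD = (
    "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCC"
    "GTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTT"
)  # 80 bp chimeric crRNA/tracrRNA segment, from the sequence above


def build_sgrna_template(spacer: str, require_gg: bool = True) -> str:
    """Return promoter + 20-base spacer + scaffold (sense strand, 5' to 3').

    `require_gg` enforces the leading GG preferred by T7 RNA polymerase;
    the text notes that GG is not needed for cleavage itself.
    """
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be exactly 20 bases of A, C, G, T")
    if require_gg and not spacer.startswith("GG"):
        raise ValueError("spacer should start with GG for T7 transcription")
    return T7_PROMOTER + spacer + SCAFFOLD


def cut_position(site_start: int) -> int:
    """0-based index of the first base 3' of the cut on the target strand.

    The cut falls three bases inside the 3' end of the 20-base match,
    that is, between offsets 16 and 17 of the matched sequence.
    """
    return site_start + 20 - 3


# Hypothetical spacer, chosen only to show the lengths involved.
template = build_sgrna_template("GG" + "ACGTACGTACGTACGTAC")
```

With a 20-base spacer the full template is 122 bp (22 + 20 + 80).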

The production of targeted mutations through the use of sgRNAs + Cas9 has been astonishingly efficient in many instances. For example, when the ms(3)K81 gene of Drosophila was targeted, 87 of 88 tested progeny carried mutant alleles (Yu et al. 2013). For most target genes, new mutations were less frequent, but heritable mutations were still often recovered at frequencies >10% (Bassett et al. 2013; Yu et al. 2013). In zebrafish, most individuals that had been injected with sgRNA and Cas9 mRNA transmitted new mutant alleles of the target genes to >20% (and a few to 100%) of their offspring (Hwang et al. 2013a). Two spectacular examples come from the use of sgRNA/Cas9 in the mouse (Wang et al. 2013). When two genes were targeted simultaneously by injection of sgRNAs and Cas9 mRNA into embryos, 22/28 live-born pups carried two mutant alleles of each gene. (The term homozygote is avoided because the two mutant alleles of a single gene can arise from different events and have different sequence alterations.) When five genes were targeted simultaneously in embryonic stem (ES) cells, 10% of clones carried mutations in all eight alleles (two targets were Y-linked).

Several groups have demonstrated mutagenesis of target loci by sgRNA-directed Cas9, with high success rates in a number of different organisms including bacteria (Jiang et al. 2013), Saccharomyces cerevisiae (Dicarlo et al. 2013), C. elegans (Chiu et al. 2013; Cho et al. 2013; Friedland et al. 2013; Katic and Großhans 2013; Lo et al. 2013; Tzur et al. 2013; Waaijers et al. 2013), D. melanogaster (Bassett et al. 2013; Gratz et al. 2013; Kondo and Ueda 2013; Yu et al. 2013), zebrafish (Chang et al. 2013; Hwang et al. 2013a,b; Xiao et al. 2013), mouse (Cong et al. 2013; Shen et al. 2013; Wang et al. 2013), and human cells (Chang et al. 2013; Cho et al. 2013; Cong et al. 2013; Ding et al. 2013; Jinek et al. 2013; Mali et al. 2013b). In most studies, mutations were generated by NHEJ repair of the broken ends, which produced the typical small deletions and insertions. Experiments using homologous donor templates have achieved efficient repair by gene conversion (Chang et al. 2013; Cong et al. 2013; Gratz et al. 2013; Hwang et al. 2013a; Mali et al. 2013; Wang et al. 2013). Two simultaneous, separated cuts have also been used to generate deletions (Cong et al. 2013; Gratz et al. 2013; Mali et al. 2013b; Xiao et al. 2013). It is especially noteworthy that in almost all these reports, mutant alleles were readily detected purely by molecular screening, owing to the high mutation rates.

There is evidence that single-stranded DNA nicks are less prone to inaccurate repair than double-strand cuts, and yet they can still induce gene conversion (Kim et al. 2012). A Cas9 protein with one of its two nuclease domains mutated to be nonfunctional produces single-strand nicks in target DNA (Gasiunas et al. 2012; Jinek et al. 2012). Such Cas9 mutants were used to nick a chromosomal target site, and gene conversion events from a provided template were recovered with reduced incidence of small deletions or insertions (Cong et al. 2013; Hsu et al. 2013; Mali et al. 2013b). But not all cell types are susceptible to nick-induced gene conversion (Hsu et al. 2013).

Extensive studies of factors that influence repair of ZFN-generated DSBs provide useful guidance (for Drosophilists, at least) on how to improve the recovery of desired targeting events (Beumer et al. 2013b). For instance, inactivation of the lig4 DNA ligase strongly biases repair toward homologous recombination (Beumer et al. 2008; Bozas et al. 2009). Thus, if a template-specified change is desired, performing the procedure in a lig4 mutant background is likely to be beneficial.

Are there limitations to this system? Certainly, but they are not likely to interfere with rapid adoption of the technology. First, there is the requirement for the trinucleotide NGG adjacent to the target sequence. However, such sequences are expected to occur frequently (approximately every seven bases in Drosophila protein-coding regions). Similarly, when injecting RNAs made in vitro, T7 RNA polymerase's preference for GG as the first two bases of its transcript will slightly restrict target choices. However, there is some flexibility in the requirement for homology between the sgRNA spacer and the target DNA (see below).
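To illustrate how the two sequence constraints combine, the following Python sketch scans both strands of a DNA sequence for sites of the form GGN18NGG, which is the target form suited to T7-transcribed sgRNAs. The function names and the example sequence are hypothetical, and the sketch ignores the NAG alternative and any tolerance for mismatches.

```python
# Sketch: list candidate GGN18NGG target sites on both strands.
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# A lookahead is used so that overlapping sites are all reported.
SITE = re.compile(r"(?=(GG[ACGT]{18})[ACGT]GG)")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_target_sites(seq: str) -> list:
    """Return (strand, start, spacer) for every GGN18NGG site.

    For the "-" strand, `start` is the 0-based index within the
    reverse-complemented sequence, not within the input sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for match in SITE.finditer(s):
            hits.append((strand, match.start(), match.group(1)))
    return hits


# Hypothetical 27-base sequence containing a single site on the top strand.
example = "TT" + "GG" + "A" * 18 + "TGG" + "TT"
sites = find_target_sites(example)
```

Dropping the leading GG from the pattern would give the less restrictive set of sites available when the sgRNA is not transcribed by T7 RNA polymerase.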

The most serious concern is the potential for off-target cleavage and mutagenesis. In several articles investigators report screening for the production of mutations at sites that vary slightly from the chosen target and finding none (Bassett et al. 2013; Cho et al. 2013; Cong et al. 2013; Ding et al. 2013; Friedland et al. 2013; Wang et al. 2013). But since spacers made with deliberate mismatches to the target sequence can still effect mutagenesis, off-target effects may be significant in larger genomes (Cong et al. 2013).

There have been three in-depth reports on the mutagenic ability of spacers with single or multiple mismatched bases (Fu et al. 2013; Hsu et al. 2013; Mali et al. 2013a). In many cases, spacers with 1 to 2 base mismatches are just as efficient at mutagenesis of the target sequence as perfectly matched spacers; up to three mismatches may be tolerated and still give reasonably efficient mutagenesis. In general, the 12 bases closest to the PAM are less tolerant of mismatch than the more distal (5′) 8 bases. In human cells, off-target mutagenesis can be as frequent as, or more frequent than, mutagenesis of the intended target (Fu et al. 2013). Reducing the concentration of sgRNA/Cas9 has had mixed success in improving specificity (Fu et al. 2013; Hsu et al. 2013). One method to increase target specificity is to use two sgRNAs targeted to adjacent sites in concert with a mutant Cas9 that is only capable of cutting one strand. When both strands are nicked in close proximity, the result is effectively a double-strand break (Mali et al. 2013a). Off-target nicking may still be an issue with this method.
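The mismatch pattern described here suggests a simple way to triage potential off-target sites, sketched below in Python. The split into a PAM-distal region (the 5′ 8 bases) and a PAM-proximal region (the 12 bases nearest the PAM) follows the text; the function names, the default threshold of three mismatches, and the example sequences are assumptions for illustration, not a validated predictor of cleavage.

```python
# Sketch: count spacer/site mismatches by region (illustrative only).
def mismatch_profile(spacer: str, site: str) -> dict:
    """Count mismatches between a 20-base spacer and a 20-base genomic site.

    Index 0 is the 5' end; indices 0-7 are PAM-distal and indices 8-19
    are the 12 bases closest to the PAM.
    """
    spacer, site = spacer.upper(), site.upper()
    if len(spacer) != 20 or len(site) != 20:
        raise ValueError("spacer and site must both be 20 bases long")
    positions = [i for i, (a, b) in enumerate(zip(spacer, site)) if a != b]
    return {
        "total": len(positions),
        "pam_distal": sum(1 for i in positions if i < 8),
        "pam_proximal": sum(1 for i in positions if i >= 8),
    }


def is_candidate_off_target(spacer: str, site: str, max_mismatches: int = 3) -> bool:
    """Flag sites with few enough mismatches to deserve a closer look.

    The threshold of three is an assumption based on the tolerance
    reported in the text.
    """
    return mismatch_profile(spacer, site)["total"] <= max_mismatches


# Hypothetical spacer and a site differing at the first and last positions.
spacer = "GGACGTACGTACGTACGTAC"
site = "TGACGTACGTACGTACGTAG"
profile = mismatch_profile(spacer, site)
```

A site whose mismatches fall only in the PAM-distal region would, by the trend described above, be the more worrying kind of candidate.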

One worry is that off-target cryptic mutations may produce phenotypes that cause misinterpretation of results. It is standard practice after chemical mutagenesis to “clean up” a new mutant by outcrossing to remove unrelated, and possibly confounding, mutations. This could also be useful after sgRNA/Cas9 mutagenesis. To prove that a mutation in a particular gene is responsible for a phenotype, a wild-type transgene may be introduced to revert the phenotype, but that may not always be feasible. Another approach for linking gene and phenotype would be to generate multiple independent mutations, including ones generated by different sgRNAs. Heteroallelic combinations can be used to decrease the likelihood of a cryptic mutation becoming homozygous. Such precautions should not be excessively burdensome in fast-growing organisms with well-established genetics, such as Drosophila and C. elegans, but may prove more difficult with other organisms. One trick that might prove useful would be to retarget the mutant allele and, using a donor template, restore the wild-type sequence. Simultaneous reversion of the gene and phenotype could be taken as evidence that the gene in question is responsible for the phenotype. For purposes of gene therapy in humans, the possibility that off-target mutagenesis may occur at significant rates is clearly an important consideration deserving of further study.

The high efficiency of sgRNA/Cas9 mutagenesis that has been reported makes it feasible to contemplate experiments that were not previously possible. Nonmodel organisms are certain to become increasingly subject to genetic analysis. Mutant mice can be generated in a single step, rather than through lengthy breeding. When one desires to add a mutation to a complex genetic background it may be more efficient to induce new mutant alleles, rather than construct the desired genotype by crossing. Targeted cleavage of repetitive sequences could be used to generate deletion collections or other chromosome rearrangements, offering balancer chromosomes for a variety of species. It remains to be seen whether sgRNA/Cas9-mediated manipulation can fully substitute for the variety and specificity of changes that can be achieved through more conventional gene targeting technology. The speed and efficiency of the sgRNA/Cas9 method should make it relatively easy to find out.

It seems natural to assume that Muller could not have imagined that we would one day have the ability to easily mutate any chosen gene, but in fact this is not true. In his Nobel lecture Muller stated:

No one can answer the question whether some special means may not be found whereby … individual genes could be changed to order.

So far, then, we have no means, or prospect of means, of inducing given mutations at will in normal material, though the production of mutations in abundance at random may be regarded as a first step along such a path, if there is to be such a path.

(http://www.nobelprize.org/nobel_prizes/medicine/laureates/1946/muller-lecture.html).

Still, it is doubtful that Muller would have foreseen that the path would involve the extensive use of foreign and synthetic genes and functions in such a wide array of species.

Acknowledgments

I thank Dana Carroll and Mark Johnston for suggestions and criticisms. Work in the author's laboratory is supported by National Institutes of Health grant GM065604.

Note added in proof: Three articles describing the use of sgRNA/Cas9 in plant cells (Li et al. 2013; Nekrasov et al. 2013; Xie and Yang 2013), and two more articles examining off-target mutagenesis (Cradick et al. 2013; Pattanayak et al. 2013) appeared while this article was in press.

Footnotes

  • Communicating editor: M. Johnston

Literature Cited