## Abstract

CRISPR/Cas9 gene drive (CGD) promises to be a highly adaptable approach for spreading genetically engineered alleles throughout a species, even if those alleles impair reproductive success. CGD has been shown to be effective in laboratory crosses of insects, yet it remains unclear to what extent potential resistance mechanisms will affect the dynamics of this process in large natural populations. Here we develop a comprehensive population genetic framework for modeling CGD dynamics, which incorporates potential resistance mechanisms as well as random genetic drift. Using this framework, we calculate the probability that resistance against CGD evolves from standing genetic variation, *de novo* mutation of wild-type alleles, or cleavage repair by nonhomologous end joining (NHEJ)—a likely by-product of CGD itself. We show that resistance to standard CGD approaches should evolve almost inevitably in most natural populations, unless repair of CGD-induced cleavage via NHEJ can be effectively suppressed, or resistance costs are on par with those of the driver. The key factor determining the probability that resistance evolves is the overall rate at which resistance alleles arise at the population level by mutation or NHEJ. By contrast, the conversion efficiency of the driver, its fitness cost, and its introduction frequency have only minor impact. Our results shed light on strategies that could facilitate the engineering of drivers with lower resistance potential, and motivate the possibility to embrace resistance as a possible mechanism for controlling a CGD approach. This study highlights the need for careful modeling of the population dynamics of CGD prior to the actual release of a driver construct into the wild.

THE prospect of driving genetically modified alleles to fixation in a population has enticed scientists for >40 years (Curtis 1968; Foster *et al.* 1972). Potential applications are broad and ambitious, including the eradication of vector-borne diseases such as malaria, dengue, and Zika (Burt 2003; Esvelt *et al.* 2014; Champer *et al.* 2016). For example, mosquitoes could be genetically altered such that they can no longer transmit *Plasmodium* parasites. If these altered alleles could then be spread in a wild population, we could effectively eradicate malaria in humans. Similarly, a gene drive could be used to reverse insecticide resistance in an agricultural pest, or to spread a deleterious allele in an invasive species to drive it toward extinction.

While various strategies for implementing a gene drive have been discussed since the 1970s, all previously proposed mechanisms have faced significant obstacles. Ongoing efforts to transfer the *Medea* system from flour beetles to other insects have, thus far, fallen short (Chen *et al.* 2007; Akbari *et al.* 2014). Underdominance systems have also been developed (Altrock *et al.* 2010; Akbari *et al.* 2013; Reeves *et al.* 2014), but these are slow-spreading systems that require multiple releases of large numbers of male transgenic insects. The use of *Wolbachia*-mediated strategies have perhaps been most successful (Lambrechts *et al.* 2015).

The recently proposed CRISPR/Cas9 gene drive (CGD) system now promises a highly adaptable approach for driving genetically engineered alleles into a population, even when those alleles carry substantial fitness costs (Gantz and Bier 2015; Gantz *et al.* 2015). CGD falls into the category of homing endonuclease gene drives, first proposed by Burt (2003), which consists of a genetic construct (driver allele) encoding an endonuclease that can cleave a genomic target site and then insert itself into that site. CGD uses a Cas9 endonuclease for cleavage that can be engineered to target virtually any 20-bp-long nucleotide sequence in a genome, using a target-specific guide RNA (gRNA). The driver construct is flanked by two homology arms to facilitate its incorporation at the cleavage site during homology-directed repair. In addition, arbitrary payload sequences can be included in the construct that will be transmitted alongside the driver. By converting heterozygotes for the driver allele into homozygotes (Figure 1A), a CGD construct can “cheat” the rules of Mendelian inheritance, allowing for rapid spread of the driver in the population (Burt 2003; Unckless *et al.* 2015).

As a proof of principle, Gantz and Bier (2015) showed that CGD was highly effective at converting heterozygotes for a driver allele into homozygotes in a laboratory cross of *Drosophila melanogaster*. In their experiment, insects homozygous for a CGD construct that disrupts the *X*-linked *yellow* gene were crossed to wild-type insects, resulting in 254 of 261 daughters with the disrupted yellow phenotype (Gantz and Bier 2015). Since this phenotype is recessive, CGD must have disrupted the wild-type allele in these females. In another study, a similar construct conferring resistance to *Plasmodium falciparum* was successfully introduced into a laboratory population of *Anopheles* mosquitoes (Gantz *et al.* 2015). A third study recently generated a CGD targeting a mosquito fertility gene for population suppression (Hammond *et al.* 2016). Several theoretical studies have further shown that gene drives can, in principle, spread very rapidly in a population, even when the driver allele carries a substantial fitness cost (Burt 2003; Sinkins and Gould 2006; Deredec *et al.* 2008; Gould 2008; North *et al.* 2013; Burt 2014; Unckless *et al.* 2015; Drury *et al.* 2016).

Despite these initial results, it is still unclear whether CGD can actually lead to fixation of the driver allele in a large natural population because of several factors that might influence its population dynamics. Arguably the most critical problem is the evolution of resistance against the drive (Burt 2003; Deredec *et al.* 2008, 2011; Esvelt *et al.* 2014; Bull 2016; Noble *et al.* 2016b). Such resistance could arise, for instance, through alleles at the target locus that cannot be converted by CGD because they are not recognized by the driver’s gRNA. In fact, a single nucleotide mutation in the 3-bp-long PAM motif of the target sequence will typically suffice to create such a resistance allele. Most indels within the target will likely create a resistance allele as well. Importantly, the driver itself is expected to produce its own resistance alleles when cleavage is repaired by end joining mechanisms such as nonhomologous end joining (NHEJ) or microhomology-mediated end joining, rather than homologous recombination, and such resistance alleles have already been observed in several experiments (Gantz and Bier 2015; Gantz *et al.* 2015; Hammond *et al.* 2016; Wang *et al.* 2016).

If resistance alleles are less deleterious than the driver allele—as will likely be the case for many types of resistance alleles—natural selection will favor these alleles over the driver. As a result, they will rise in frequency and should eventually replace the driver. Thus, even though a driver may initially spread to high frequency in the population, its ultimate fate will depend on whether resistance alleles have emerged during this process. While several strategies have been proposed for reducing resistance potential; including the use of multiple gRNAs, the targeting of essential genes, daisy chains, or poison-antidote systems (Esvelt *et al.* 2014; Champer *et al.* 2016; Noble *et al.* 2016a); it remains to be seen how well these approaches can actually work in practice.

It is clear that any informed decision about the potential consequences and risks of releasing a CGD construct into a wild population requires that we first understand the dynamics of this process on the population level, including potential resistance mechanisms. Here, we build on previous theoretical results to devise a comprehensive population genetic model of CGD, which allows us to quantitatively study the population dynamics of such systems, calculate the probability that resistance evolves, and predict how resistance alleles will interfere with the spread of a driver construct.

## Results

### Population genetic model of CGD

Our model for studying the population dynamics of CGD is based on the original drive system proposed by Gantz and Bier (2015), which hearkens back to Burt (2003). We consider a single locus with three types of alleles: wild type (0); driver (d); and resistance (r), where we define a resistance allele to be any allele that cannot drive and also cannot be converted into a functioning driver by the CGD machinery (complete resistance). The driver cleaves the wild-type allele in heterozygous driver/wild-type embryos at rate In a fraction of cases, cleavage repair creates a resistance allele when NHEJ introduces an indel at the target site. In reality, such resistance alleles can be created by any repair mechanism that changes the target sequence without incorporating the functional drive machinery (*e.g.*, microhomology-mediated end joining or the incorporation of a loss-of-function driver construct) (Champer *et al.* 2016), but we refer to this entire class as NHEJ for simplicity. Resistance alleles can also arise by *de novo* mutation in wild-type alleles at rate specifying an effective rate that includes all possible mutations creating resistance. We set the fitness of wild-type homozygotes to whereas all other genotypes can carry arbitrary fitness cost, The driver is initially introduced into the population in generation at frequency We model a large, panmictic population with variance effective population size and census population size

Note that there could generally also exist other types of alleles at the locus. For example, resistance alleles that arise by *de novo* mutation could be distinguishable from those that arise by NHEJ, or resistance alleles that were already present as natural variation in the population. Depending on the specific drive scenario, these different resistance alleles could carry different costs. For simplicity, we combine all resistance alleles into a single class with the same fitness costs in our model.

We will typically perform our numerical analyses using a “standard” model, for which we vary individual parameters independently, while keeping the other parameters constant. This will allow us to dissect how each individual parameter affects the general behavior of the model. Our standard model assumes a scenario in which cleavage is efficient, as one would likely aim to design a driver this way. We further assume that resistance alleles carry no cost, which we will relax later. Driver alleles, by contrast, carry a substantial cost that is codominant, In those analyses where both the driver and the resistance allele carry a cost, we will assume that these costs are codominant to each other, We further set and Both are likely conservative choices with respect to the actual rates at which resistance alleles arise in many systems of interest, such as insects. In fact, NHEJ rates are probably several orders of magnitude higher in most systems, in which case the evolution of resistance from NHEJ will be practically assured (as we will show below). We still choose a very low value of for our standard model both to be able to study systems in which NHEJ has been intentionally suppressed and because it is the parameter space where resistance is not inevitable. We finally set for simplicity, and introduce driver alleles into the population at frequency Table 1 provides a summary of the different parameters and their default values in our standard model.

### Deterministic frequency dynamics

In our model, we can express the expectation values of allele frequencies and as functions of their frequencies in the previous generation; the fitness costs of the different genotypes; and the rates and We initially assume that and such that the generation of new resistance alleles does not noticeably affect wild-type and driver allele frequencies (we will relax this assumption below). Given allele frequencies and in generation the expected values of these frequencies in generation are then(1)(2)(3)The contributions to the numerator in Equation 1 are as follows: The first term specifies the contributions of driver/wild-type heterozygotes that either successfully converted into driver homozygotes and thus now have fitness [fraction of driver/wild-type heterozygotes], or failed to convert and thus remain heterozygotes with fitness [fraction of driver/wild-type heterozygotes]. The second and third terms specify the contributions from driver/resistance heterozygotes and driver homozygotes respectively. In Equation 2, the first term in the numerator specifies the contribution from resistance/wild-type heterozygotes, the second term the contributions from resistance/driver heterozygotes, and the third term the contribution from resistance homozygotes. Equation 3 follows from the condition that allele frequencies have to sum to one. The mean population fitness in generation is given by(4)These equations fully describe the dynamics of allele frequencies from a given set of starting frequencies in a deterministic scenario, which will be appropriate when allele frequencies are sufficiently large that genetic drift can be ignored. Additionally, these equations can be incorporated into a Wright–Fisher framework that explicitly takes drift into account.

We and others have previously shown that these dynamics can already produce rich behavior in the absence of resistance (Burt 2003; Deredec *et al.* 2008; Unckless *et al.* 2015). Possible outcomes can include the fixation of the driver, loss of the driver, and both stable or unstable equilibria, depending on the cleavage rate, the fitness costs of the driver allele, and and the initial frequency at which the driver is introduced into the population. Briefly, in scenarios with high cleavage rate and small fitness costs, the driver will usually fix even when introduced at low frequency, whereas for low cleavage rates and high fitness costs the driver may not be able to invade at all. Intermediate scenarios can give rise to equilibria that are usually unstable, but can be stable if cleavage rates are small and fitness costs are low and recessive. For the remainder of this study, we will focus only on cases where, in the absence of resistance, the driver allele can go to fixation.

### Origin of resistance alleles

We distinguish three mechanisms by which resistance alleles can originate: (i) standing genetic variation (SGV), (ii) *de novo* mutation of wild-type alleles after the driver is introduced (specified by *μ*), and (iii) cleavage repair by NHEJ (specified by *δ*). Initially we consider each of these mechanisms independently, assuming the others do not occur. To quantify the expected supply of resistance alleles from SGV, we assume mutation-selection-drift balance prior to the introduction of the driver, which is determined by the fitness costs of resistance alleles in the absence of the driver and the population-level mutation rate toward resistance alleles, (Hermisson and Pennings 2005). Once the driver is introduced, *de novo* mutation of wild-type alleles will create resistance alleles at rate(5)This rate is proportional to the number of wild-type alleles and thus decreases as the driver spreads. NHEJ during break repair will create resistance alleles at rate(6)which is proportional to the number of wild-type/driver heterozygotes and thus is highest when the driver is at intermediate frequency.

### Establishment of resistance alleles

The overall likelihood that resistance evolves will depend on the supply of resistance alleles by the individual mechanisms, and the probability that a new resistance allele is not lost to drift when it is still rare (its so-called establishment probability). Before we discuss the individual contributions of resistance from SGV, *de novo* mutation, and NHEJ, we will first calculate these establishment probabilities in our model.

As long as resistance alleles are still rare, they should be present mostly in heterozygotes, and we can therefore neglect resistance homozygotes at this stage. If a resistance allele is present at frequency in generation its expected frequency in the next generation will be(7)which can be rearranged into(8)This “effective” selection coefficient specifies the expected change in frequency of the resistance allele in generation Note that is time dependent because it is a function of the driver frequency (Figure 1B). If resistance alleles carry a cost themselves, then could initially be negative and turn positive only after the driver has reached a certain frequency in the population.

To calculate in a given generation we need to know the frequency of the driver allele in that generation. Since we assume that resistance alleles are still at very low frequency, such that the dynamics from Equations 1–4 simplifies to(9)Assuming a deterministic model for the driver frequency, we can then calculate recursively from its starting frequency

We define establishment probability as the probability that a resistance allele present in a single copy in generation successfully survives in the population (*i.e.*, is not lost by drift while it is still rare). Uecker and Hermisson (2011) showed how the establishment probability of a new mutation can be calculated when its selection coefficient is time dependent. Their theory can be directly applied to our scenario if we consider a resistance allele to be a new mutation with time-dependent selection coefficient Defining and we obtain (Appendix A)(10)Both of these expressions break down when which could be the case if the fitness cost of the driver is completely dominant, such that resistance alleles would not provide any fitness advantage in driver/resistance heterozygotes. We will discuss this particular scenario below. For now, we assume that Note that as the driver spreads, the supply rates of resistance alleles from *de novo* mutation and NHEJ, the selective advantage of being resistant, and the probability that resistance escapes stochastic loss are all dynamic quantities (Figure 1B).

Given an estimate for we can directly calculate the probability that a resistance allele will successfully establish in the population when it is initially present in copies at time (11)Figure 1C shows how scales for our standard model with the initial copy number of resistance alleles present in the SGV at the time the driver is introduced.

### Resistance from the standing genetic variation

The probability that resistance evolves from alleles present as SGV will depend on the expected frequencies of such alleles in the population in generation and their establishment probabilities provided by Equation 11. We can use existing theory for the well-studied problem of evolutionary rescue from SGV (Hermisson and Pennings 2005; Orr and Unckless 2008) for calculating this probability. In particular, we will use the analytical framework of Hermisson and Pennings (2005) and consider the introduction of the driver to be the environmental shift, which then renders a previously neutral or deleterious resistance allele as beneficial as mean population fitness declines when the driver spreads. We assume that prior to introduction of the driver, resistance alleles arise at rate per generation per haploid wild-type genome, and that they are evolving under mutation-selection-drift balance, specified by fitness costs and We define to be twice the population-level mutation rate toward resistance alleles. After introduction of the driver, the fitness effects of resistance alleles are given by as defined in Equation 8.

Let denote the probability that resistance successfully evolves from any resistance allele present in the SGV at the time the driver is introduced, assuming that SGV is the only possible source of resistance alleles (which we can assure in our model by setting *μ* and *δ* to zero once the driver is introduced). For this case, we obtain(12)The derivation follows Hermisson and Pennings (2005) and involves integrating establishment probabilities over the expected frequency distribution of resistance mutations under mutation-selection-drift balance. Our result for the codominant case follows directly from Equation 8 in Hermisson and Pennings (2005), where we can identify (the heterozygous fitness cost of resistance alleles) with in our model. The factor in this equation is the establishment probability of a mutation present in a single copy in generation zero, which corresponds to in our model. The result for recessive resistance costs follows from the discussion provided after equation A11 in Hermisson and Pennings (2005), where they show that the factor in their equation 8 needs to be replaced by in the recessive case. The factor in this expression specifies the homozygous fitness cost of the mutation, which corresponds to in our model.

Figure 2 shows as a function of in our standard model, while simultaneously varying the cleavage rate (Figure 2A), the fitness cost of the driver allele (Figure 2B), or the fitness cost of resistance (Figure 2, C and D). Interestingly, as the cleavage rate increases, so does the probability of resistance from SGV. This is because higher leads to a faster spread of the driver, and resistance alleles present in the SGV will thus experience their net fitness advantage sooner; increasing their chances of surviving drift. Generally, as long as resistance alleles provide a net fitness advantage over the driver, evolution of resistance from the SGV is practically assured whenever whereas it remains unlikely for This is consistent with the general result that the probability of adaptation to an environmental shift using alleles from the SGV should depend only weakly on the selective advantage of these alleles in the new environment, but should be mostly determined by how many such alleles were present when the environment changed, which is a function of and the selective disadvantage of these alleles prior to the change (Hermisson and Pennings 2005).

If resistance alleles carry a fitness cost themselves, this lowers the probability that resistance evolves from the SGV. However, this effect is much more pronounced when fitness costs are codominant than when they are recessive, consistent with the fact that even a deleterious allele can still reach a noticeable frequency in the SGV as long as its costs are only recessive. In our model, even for substantial recessive costs we still find to be practically indistinguishable from the cost-free scenario.

### Resistance from de novo mutation

Any given resistance allele that arises from *de novo* mutation as a single copy in generation will successfully establish with probability assuming that resistance does not arise from any other mechanism. Thus, the overall probability that at least one allele that arose by *de novo* mutation during the time interval [1,*t*] establishes in the population is given by(13)The factor here specifies the probability that no resistance allele from generation successfully establishes in the population.

Note that is defined in terms of the census population size because the overall supply of resistance mutations depends on how many wild-type genomes are present in the population overall, not the effective population size. If the effective population size differs from the census size, this will not affect but will change the establishment probability of a new resistance allele initially present in just a single copy in the population, specified by Equation A1. However, this change is often well approximated by where technically refers to the so-called “fixation” effective population size (Otto and Whitlock 1997). In that case, simply cancels out in the products We always set in our standard model for simplicity.

Resistance alleles from a *de novo* mutation can only arise as long as the driver has not yet fixed in the population. Afterward there will no longer be wild-type alleles present that could mutate into resistance alleles: thus, The overall probability that resistance establishes from a *de novo* mutation arising after introduction of the driver is therefore(14)Figure 3, A–C, shows how depends on the parameters in our standard model. Again, we find that the key parameter determining the likelihood of resistance is However, in contrast to resistance from SGV, both the cleavage rate of the drive and the cost of the driver allele can noticeably affect as well. Specifically, higher cleavage rates decrease (the opposite effect they had on *P*_{SGV}). This is because higher cleavage rates speed up the spread of the driver, providing less time for resistance alleles to emerge by *de novo* mutation before the driver becomes fixed. Higher fitness costs of the driver, on the other hand, increase by increasing the establishment probability of resistance alleles, as resistance alleles will have a higher net fitness advantage over the driver. When resistance alleles carry (codominant) fitness costs themselves, this has only a marginal effect on as long as they still provide a net fitness advantage. Since we assume that resistance homozygotes are irrelevant for the establishment probability of a new resistance allele, recessive costs have no noticeable effect on in our model.

When comparing with for the same set of model parameters, we find that is generally higher than Hence, resistance is more likely to evolve from resistance alleles already present when the driver is introduced, than from *de novo* mutation after its introduction. The only exception would be a scenario of a very inefficient drive (small *c*), in which case it would take a long time until the driver becomes prevalent in the population, reducing while at the same time providing more time for *de novo* resistance alleles to emerge.

### Resistance from NHEJ

Similar to *de novo* mutation, any given resistance allele that arises in generation by NHEJ will successfully establish with probability assuming that resistance does not arise from any other mechanism. The overall probability that at least one such allele that arose during the time interval [1,*t*] establishes in the population is then given by(15)This equation is analogous to Equation 13 for the *de novo* mutation scenario, except for being exchanged by Given that will be zero once the driver has fixed, we can again use this result to calculate the overall probability that at least one resistance allele from this mechanism establishes in the population:(16)Figure 3, D–F, shows how depends on the parameters in our standard model. To ensure that there is no resistance from the SGV or *de novo* mutation, we set We then varied by varying while keeping constant. As we discussed above, this is likely to be a very low estimate for the rate at which cleavage repair generates resistance alleles through NHEJ in many systems, which may be difficult to achieve in practice. Higher rates will always increase making this estimate conservative with regard to the probability that resistance evolves by this mechanism.

Given the similarity between the NHEJ and *de novo* mutation scenarios, it is not very surprising that the key parameter determining the likelihood of resistance in the NHEJ scenario is the product similar to the parameter in the *de novo* scenario. Whenever becomes on the order of one or larger in our model, evolution of resistance from NHEJ becomes likely.

In contrast to the *de novo* mutation scenario, the cleavage rate has very little impact on This can be understood from the fact that the overall number of resistance alleles that arise by NHEJ before driver fixation is proportional to the overall number of cleavage events that occur, which is independent of the actual cleavage rate (as long as selection against the driver does not yet have a strong effect on its spread). Successful fixation, on average, always requires conversion events throughout this process, regardless of the actual conversion rate. Fitness costs of the driver, however, can have a strong impact on with higher fitness costs leading to lower by reducing — the same effect they had in the *de novo* mutation scenario.

### Total resistance probability

Given the individual probabilities and we can calculate the total probability that resistance evolves by any of the three mechanisms(17)Figure 4 shows for a wide range of parameters and limiting cases in our standard model. Consistent with our results for the individual resistance mechanisms, we find that the total resistance probability depends primarily on three parameters: and Specifically, whenever resistance is likely to evolve from the SGV, as long as resistance alleles provide a net fitness advantage over the driver. This result is largely independent of the conversion rate of the driver, as well as the absolute fitness cost of the driver. Resistance alleles that arise from *de novo* mutation after introduction of the driver are generally less likely to contribute to resistance than alleles from the SGV. NHEJ produces resistance whenever unless fitness costs of the driver are very small.

### Relaxing model assumptions

#### Recessive and dominant driver costs:

So far, we have assumed that driver fitness costs are codominant. If they are recessive, the driver can initially rise in frequency faster, as driver heterozyogtes carry no costs. Conversely, if they are dominant, this will slow down the initial spread of the driver. In our standard model, the differences in driver frequency trajectories between recessive, codominant, and dominant driver fitness costs are only marginal (Figure 5A). This is because driver dynamics are dominated by conversion, rather than selection, which only becomes important once driver fitness costs are of the same order as the cleavage rate

Dominance of the driver can nevertheless have a strong impact on the probability that resistance evolves, due to its effects on the fitness of driver/resistance heterozygotes. Resistance alleles that are still rare will predominantly be present in driver/resistance heterozygotes once a driver is frequent. If driver costs are recessive, will then be larger than in the codominant case, increasing and thus and By contrast, if driver costs are dominant, driver/resistance heterozygotes will have no fitness advantage over driver homozygotes. Once wild-type alleles have been completely displaced by the driver, only resistance homozygotes would then have a fitness advantage. Unfortunately our approach for calculating establishment probabilities based on Equation 10 fails here because it does not take resistance homozygotes into account.

We can still estimate whether resistance will likely evolve in this case by mapping the problem onto the analogous problem of whether a beneficial mutation can establish in a population when its fitness effects are completely recessive. Such a recessive beneficial mutation with fitness in homozygotes and fitness in heterozygotes has an establishment probability of approximately when initially present in a single copy (Kimura 1962), which we can extend to the probability that the mutation will successfully establish when it is initially present at frequency (18)In our standard scenario with cost-free resistance, the net fitness advantage of resistance homozygotes in this case will be assuming that the driver is completely dominant and close to fixation. Figure 5B shows that for resistance to evolve in this scenario, resistance alleles need to be present at substantial initial frequency For example, when the driver has a dominant fitness cost of resistance alleles would have to be present at ∼ frequency to have a chance of successfully establishing. Note that this would likely be the case in scenarios where is no longer very small, as we will show next in our treatment of the frequent NHEJ regime.

#### Frequent NHEJ regime:

Our dynamical model described by Equations 1–4 assumes that is small enough that we can effectively neglect the contribution of NHEJ events to changes in allele frequencies between generations. This may no longer be the case if NHEJ becomes more frequent, say Extending our model to such a scenario is straightforward (Appendix B). Evolution of resistance is practically ensured in this regime as long as resistance alleles carry any net fitness advantage relative to driver alleles, given that there will always be ample supply of resistance alleles. As long as allele frequency trajectories are still dominated by conversion, rather than selection, the ratio of resistance allele frequency over driver allele frequency should then simply be(19)This is because every time CGD-induced cleavage in a driver/wild-type heterozygote is repaired, in a fraction of cases a resistance allele will be added to the population, whereas in a faction of cases a driver allele will be added. Fitness differences between driver and resistance alleles will change their relative frequencies over time, but this can be slow, unless these fitness differences become large. For a driver with high conversion rate and moderate fitness costs, we would expect that at the time the wild-type allele is lost from the population, the population frequency of resistance alleles should still be close to the ratio given in Equation 19.

#### High introduction frequency:

Our simulations show that varying the introduction frequency of the driver has almost no noticeable effect on the total probability that resistance evolves, which is due to two competing effects: Increasing the introduction frequency will increase and thus because resistance alleles present in the SGV will experience their fitness advantage faster. At the same time, this will also lead to faster fixation of the driver, leaving less time for *de novo* resistance mutations to occur, which will decrease The two effects approximately cancel out. will generally not depend much on the introduction frequency, as long as because most resistance alleles produced by NHEJ arise when the driver is at intermediate frequency. Note that changing can still have a noticeable impact on the fixation time of the driver. For example, while it takes the driver generations to reach frequency 0.5 in our standard model with this only takes generations when initially introduced at frequency

#### Varying population size:

We have so far assumed that population size remains constant over time. This assumption is likely violated in many systems. Insect populations, for instance, can show dramatic fluctuations in population size over just a few generations (Berryman 2002). In temperate areas, winters can result in population crashes, followed by rapid increase during the growing season. Management practices for pests, such as insecticide application, can also change population sizes dramatically, as could CGD itself if the goal is to spread a harmful allele in the population.

Relaxing the assumption of constant population size can affect three aspects of our analytical framework: (i) the supply of resistance alleles in the SGV, (ii) the resistance allele creation rates and and (iii) the establishment probability of a resistance allele. The framework by Uecker and Hermisson (2011) we adopted for calculating does in fact allow for arbitrary changes in population size over time. Given functions and Equation A1 then becomes(20)which can be solved numerically for any given demography in the same way we did for the constant population size scenario. Both and are already functions of in our model. Under varying population size, these rates become:(21)(22)Given and rates and the resistance probabilities and can then be calculated according to Equations 14 and 16 the same way we did for the constant population size scenario.

To our knowledge, there are no general analytical solutions for mutation-selection-drift balance under arbitrary demography scenarios. Calculation of will therefore typically rely on numerical simulations to infer the expected frequency of resistance alleles in the SGV. However, such distributions could be easily obtained from Wright–Fisher simulations under any given demographic model, allowing for numerical estimation of

### Data availability

Our theory for modeling CGD dynamics and calculating resistance probabilities is implemented in a C++ command line program. This program takes as input the parameter values of the given model: For the given parameters, it then calculates the driver allele frequency trajectory under a deterministic model, specified by Equations 1–4 in the absence of resistance; the rate at which resistance alleles are expected to arise by *de novo* mutation; the rate at which they are expected to arise by NHEJ; the effective selection coefficient of resistance alleles; and the establishment probability of a resistance allele arising in a single copy in generation Results are provided for each generation The program also calculates the individual resistance probabilities and Executables, source code, and documentation for this program are available at http://messerlab.org/resources/.

## Discussion

The prospect of using CGD for genetic manipulation of entire populations has sparked both optimism and deep concern. While this technology could help us tackle some of the most pressing public health challenges, such as vector-borne diseases, there are also serious doubts regarding our ability to control a gene drive once released into a wild population. In this study, we showed that resistance will almost inevitably evolve against standard CGD approaches, most likely due to resistance alleles produced by the driver itself when cleavage is repaired by NHEJ-type mechanisms.

We focused in our analyses on population parameters we feel are appropriate for many proposed target systems of CGD, such as insect populations. Estimates of single nucleotide mutation rates tend to be in the range of in such organisms (Haag-Liautard *et al.* 2007; Keightley *et al.* 2009, 2014), and a single mutation in the PAM motif probably already suffices to create a resistance allele against CGD (Cong *et al.* 2013). In addition, indels should typically occur within the 20-bp-long target sequence of a CGD construct at rates comparable to single nucleotide mutation rates (Schrider *et al.* 2013). In our model, we showed that resistance from SGV and *de novo* mutation is then likely whenever compatible with estimates of effective population sizes in many insect populations (Leffler *et al.* 2012). In fact, the values of relevant for CGD in natural insect populations may often be much larger than these values of long-term obtained from levels of neutral diversity, which tend to be dominated by historical bottlenecks and seasonal population crashes (*e.g.*, caused by the winters in a temperate region) (Hartl and Clark 1997; Berryman 2002; Karasov *et al.* 2010; Messer and Petrov 2013). During warm seasons, insect population sizes can easily reach billions and more, and resistance alleles will inevitably arise during these times when the supply of resistance alleles is not limited by mutation. If enough generations occur before the next population collapse, resistance alleles could reach a sufficiently high frequency to persist throughout the collapse (Wilson *et al.* 2014). In such scenarios, resistance probabilities may then actually depend on the specific time during the year a driver is released.

Yet even in much smaller populations, resistance is still likely to evolve from the repair of driver-induced cleavage events by NHEJ, unless the NHEJ rate can be limited to below This poses a very low boundary for many natural populations, which should be quite difficult to achieve in practice, given that NHEJ rates observed in previous CGD experiments were typically on the order of 1% or larger (Gantz *et al.* 2015). In conclusion, our results suggest that resistance should evolve almost inevitably against standard CGD approaches in most natural populations, unless NHEJ can be effectively suppressed, fitness costs of the driver are completely dominant, or these fitness costs are on par with those of resistance alleles.

Our modeling framework also allows us to study and compare resistance probabilities under specific CGD strategies (Esvelt *et al.* 2014). One proposed strategy is to disrupt an existing gene (*e.g.*, a gene involved in insecticide resistance). In this case, cleavage repair by NHEJ could still achieve the intended result—a disrupted copy of the gene. Since these alleles should carry similar fitness costs as the driver, they are not likely to rise in frequency. However, resistance may still evolve if some mutations change the target sequence without actually disrupting the gene, or if the ability to drive itself is already associated with some fitness costs, as will likely be the case if off-target effects of the driver have not been completely suppressed (Fu *et al.* 2013; Anderson *et al.* 2015; Zhang *et al.* 2015). Another proposed strategy is the insertion of a new gene (*e.g.*, a gene that prevents mosquitoes from transmitting malaria). If the target site lies in a nonfunctional region, NHEJ could then create cost-free resistance alleles, rendering the evolution of resistance highly probable if the driver carries a fitness cost.

Several CGD strategies have already been proposed specifically to combat resistance (Burt 2003; Deredec *et al.* 2008, 2011; Esvelt *et al.* 2014; Noble *et al.* 2016a,b). One such strategy is to include multiple gRNAs in the construct to target different sites. The probability that resistance evolves should then remain very low in our model, due to the large number of mutations required for an allele that is resistant to all gRNAs. Yet in practice such a strategy will likely add complexities that require more sophisticated modeling, such as allele classes with intermediate levels of resistance and possibly intermediate fitness costs. Resistance could then still evolve via a succession of intermediate steps. Another approach, often proposed in concert with multiple gRNAs, is the targeting of a conserved region of an essential gene (Esvelt *et al.* 2014). The idea being that individuals carrying NHEJ alleles will then have very low fitness—even as heterozygotes—and therefore will be quickly purged from the population. However, this strategy may not be appropriate in all cases. For example, if the goal is to knock out an insecticide-resistance gene, then the target gene will be predetermined. Furthermore, NHEJ may still occasionally repair sequences in such a way that renders them resistant to further drive, but without introducing significant fitness consequence. Our analytical framework can accommodate such an approach if we create two categories of NHEJ alleles: lethal and viable. The lethal NHEJ alleles can simply be ignored, while the viable NHEJ alleles can be modeled the same way as in our original model, but arising at a lower effective rate. Note that even if the proportion of such viable NHEJ among all NHEJ alleles is only 0.1–1%, as long as overall NHEJ rates are high enough, resistance will still likely evolve in our model.

All of these approaches are based on the paradigm that resistance is always an impediment for practical CGD applications, which needs to be averted using more elaborate CGD approaches. Yet maybe this is not always true. In fact, there may be scenarios in which the ensured evolution of resistance could actually provide a valuable safety mechanism for controlling a driver, while still allowing the approach to effectively achieve its goal. This is because even when resistance alleles will ultimately replace the driver allele, this replacement will be driven by natural selection, which can be much slower than the initial rise of the driver. In our standard model, for instance, it takes >100 generations until resistance alleles reach 50% frequency even when they were already present when the driver was introduced, whereas the driver reaches 99% frequency in <30 generations (Figure 5C). This lag time depends on the net selective advantage of the resistance allele over the driver and will be even longer for lower driver costs (Figure 5D). Short-term population transformations can thus still be attained. Subsequent CGD constructs could be designed and released that specifically target resistance alleles, based on regular surveys of genetic variation at the target site—a process that may be more effective and safe than the use of the CGD system with multiple gRNAs (Esvelt *et al.* 2014). Our model provides a quantitative framework for predicting the population dynamics of driver and resistance alleles in such scenarios and can shed light on how drive parameters need to be tuned to orchestrate key features of these dynamics, such as the initial rate of spread of the driver, the maximum frequency it can attain in the population, and the time it will take until resistance alleles will replace the driver. Note that our theory also allows the assessment of strategies involving the intentional release of resistance alleles for controlling a CGD approach.

It is clear that there is a need for more detailed modeling of CGD on several fronts. For example, we assumed that resistance is a binary trait, yet resistance levels could depend on number, type, and location of mutations in the target site (Cong *et al.* 2013). Expression of the driver in the germline, instead of embryos, could alleviate some of the recessive fitness costs of the driver in converted individuals (Deredec *et al.* 2008). Furthermore, resistance alleles may be created if homology-directed repair inserts the driver construct, but introduces errors that prevent it from driving. We also limited our study to resistance at the target site, even though *trans*-resistance might be common. Target species may harbor natural variation for Cas9 expression levels or may produce peptides or RNA that silences the CRISPR machinery (Bondy-Denomy *et al.* 2013). The possibility of all these other resistance mechanisms suggests that our estimates for the probability that resistance evolves are likely to be conservative. Finally, we have not addressed CGD strategies in which an allele is spread with the intention to actively diminish a population. While our model does provide a framework for analyzing such strategies when provided with an explicit model for how the spread of the suppression allele will affect population size, we still know very little about how an actual population would react to such an approach.

Throughout this analysis we remained purposefully agnostic to the potential benefits and risks of this potent biological technology. We do, however, acknowledge the need for extensive discussion among scientists, policy makers, and the public before any release of a CGD into a wild population is ever considered (Esvelt *et al.* 2014; Oye *et al.* 2014; Akbari *et al.* 2015). It is our hope that this work facilitates such discussion.

## Acknowledgments

We thank Anthony Long for suggesting the nonhomologous end joining resistance concern and Jackson Champer, Brittny Smith, and members of the Messer and Clark laboratories for helpful discussions. This work was supported by the National Institutes of Health grants R01 GM-064590 to A.G.C. and K99 GM-114714 to R.L.U, and funds from the College of Agriculture and Life Sciences at Cornell University to P.W.M.

## Appendix A: Calculation of Establishment Probabilities

Uecker and Hermisson (2011) developed a framework for calculating the establishment probability of a new mutation when its selection coefficient is time dependent. Their theory can be directly applied to our scenario if we consider a resistance allele to be a new mutation with time-dependent selection coefficient According to equation 16 in Uecker and Hermisson (2011), the establishment probability of such a resistance allele when initially present in a single copy at time is then given by

To calculate the improper integral in the denominator of Equation A1 for we can split it into two components, prior to and after fixation of the driver allele(A2)The integral in the exponent of the second summand still extends back to time but we can again partition it into a component prior to and afterward, and then make use of the fact that will no longer depend on in the latter component. This yields(A3)For we obtain(A4)Both of these integrals break down when which could be the case if the fitness cost of the driver is completely dominant and resistance alleles would therefore not provide any fitness advantage in driver/resistance heterozygotes. Here we assume that

Importantly, all the above integrals are defined over continuous time, whereas in our Wright–Fisher-type model, generations are discrete and values for are only defined for Before we can estimate establishment probabilities in our discrete model, we first have to map these integrals onto sums over discrete generations. In the following, we will always use to denote discrete variables (generations), whereas will denote continuous variables.

For mapping onto continuous time, we will extend it to a piecewise constant function with for We can then associate the integrals inside the exponents in Equation A2, estimated between two discrete generations *t* < *t* with sums of the form(A5)Discretization of the outer integral in Equation A2 can be achieved by partitioning it into the individual integrals between subsequent generations, yielding(A6)Both and are now successfully expressed in terms of only the values of estimated in discrete generations Combining all the above results, we obtain for the establishment probability of a resistance allele arising in generation in a single copy in our discrete model

## Appendix B: Frequent NHEJ Regime

Our dynamical model for the allele frequencies described by Equations 1–4 assumed that the rates at which resistance alleles are created by mutation and NHEJ are small enough that we can neglect their contribution to changes in allele frequencies between generations. This may no longer be the case if NHEJ becomes more frequent. Extending our dynamical model to such a scenario is straightforward. In this case, the dynamics described by Equations 1–4 become

## Footnotes

*Communicating editor: J. Hermisson*

- Received October 27, 2016.
- Accepted December 1, 2016.

- Copyright © 2017 by the Genetics Society of America