## Abstract

One of the most useful models in population genetics is that of a selective sweep and the consequent hitch-hiking of linked neutral alleles. While variations on this model typically assume constant population size, many instances of strong selection and rapid adaptation in nature may co-occur with complex demography. Here, we extend the hitch-hiking model to evolutionary rescue, where adaptation and demography not only co-occur but are intimately entwined. Our results show how this feedback between demography and evolution determines—and restricts—the genetic signatures of evolutionary rescue, and how these differ from the signatures of sweeps in populations of constant size. In particular, we find rescue to harden sweeps from standing variance or new mutation (but not from migration), reduce genetic diversity both at the selected site and genome-wide, and increase the range of observed Tajima’s *D* values. For a given initial rate of population decline, the feedback between demography and evolution makes all of these differences more dramatic under weaker selection, where bottlenecks are prolonged. Nevertheless, it is likely difficult to infer the co-incident timing of the sweep and bottleneck from these simple signatures, never mind a feedback between them. Temporal samples spanning contemporary rescue events may offer one way forward.

THE simple models used to predict the genetic signatures of selective sweeps have been incredibly helpful in understanding and identifying population genetic signals of adaptation (*e.g.*, Maynard Smith and Haigh 1974; Kaplan *et al.* 1989; reviewed in Stephan 2019). These models are usually based on constant-sized, Wright–Fisher populations. Meanwhile, many instances of adaptation—and thus selective sweeps—in nature will co-occur with complex demography. In fact, many of the most well-known examples of selective sweeps have arisen following a rather extreme and sudden change in the environment (*e.g.*, after the application of insecticides, Sedghifar *et al.* 2016, or antimalarial drugs, Nair *et al.* 2003), which could have simultaneously imposed sharp demographic declines. Attempts to capture such complex demographic scenarios typically impose qualitatively appropriate changes in population size (*e.g.*, Hermisson and Pennings 2005). Indeed, a number of studies have explored the genetic signatures of selective sweeps during demographic bottlenecks (*e.g.*, Innan and Kim 2004; Teshima *et al.* 2006; Wilson *et al.* 2014). However, these demographies are nearly always chosen in the absence of an explicit population model and independently of evolution.

Here, we model selective sweeps in a scenario where demography and adaptive evolution are not independent. In particular, we model an instance of evolutionary rescue (Gomulkiewicz and Holt 1995; reviewed in Bell 2017), where a sudden environmental change causes population decline that is reverted by a selective sweep. Under this framework, rescue is a simultaneous demographic bottleneck and selective sweep, where each affects the other. First, because the mean absolute fitness of the population changes with the beneficial allele’s frequency, the depth and duration of the bottleneck depends on the dynamics of the selective sweep, *i.e.*, evolution affects demography. Second, the probability that the beneficial allele establishes depends on its growth rate and thus the rate of population decline, *i.e.*, demography affects evolution. Together, this feedback between demography and evolution restricts the range of dynamics that are possible, and therefore also restricts the range of genetic signatures we should expect to observe. Our goal here is to describe the range of genetic signatures that (this model of) evolutionary rescue allows, to help elucidate how rescue may obscure inferences of past selection and demography and to identify patterns that could be used to infer rescue in nature.

Most theory on evolutionary rescue to date (reviewed in Alexander *et al.* 2014) has focused on the probability of rescue. Recently, however, some attention has been given to the dynamics of population size (Orr and Unckless 2014), the probability of soft sweeps (Wilson *et al.* 2017), and the genetic basis of adaptation (Osmond *et al.* 2020) given rescue in haploid or asexual populations. Here, we extend this line of thinking to three modes of rescue in diploid, sexual populations, and use coalescent theory and simulations of whole chromosomes to examine the genetic signatures at linked, neutral loci. Our focus is on three common genetic signatures: the number of unique lineages of the beneficial allele that establish (*i.e.*, the softness of the sweep), the pattern of nucleotide diversity around the selected site (*i.e.*, the dip in diversity), and the pattern of Tajima’s *D* around the selected site (*i.e.*, skews in the site-frequency spectrum). We explore three modes of rescue, where the beneficial allele arises from either standing genetic variance, recurrent *de novo* mutation, or migration, and compare to selective sweeps arising in populations of constant size. Qualitatively, we find that for a given selection coefficient rescue causes faster, harder sweeps that produce wider, deeper dips in diversity, and more extreme values of Tajima’s *D*. Due to the feedback between demography and evolution, the effect of rescue on the signatures of selective sweeps becomes more pronounced as the selection coefficient, and, thus, the probability of rescue, gets smaller.

## Materials and Methods

### Population dynamics

#### Deterministic trajectories:

Consider a population of size *N*(*t*) with a beneficial allele, *A*, at frequency *p*(*t*) and an ancestral allele, *a*, at frequency 1−*p*(*t*). Assume nonoverlapping generations, and let *W _{AA}*,

*W*, and

_{Aa}*W*be the absolute fitness (expected number of offspring) of each genotype. We are interested in the scenario where a population composed of primarily

_{aa}*aa*genotypes is declining at some rate

*d*,

*i.e.*,

*W*= 1 −

_{aa}*d*. The beneficial allele,

*A*, is then assumed to act multiplicatively with the fitness of the ancestral background, such that and . Throughout the text, we assume random mating, weak selection, and additivity at the selected locus,

*h*= 1/2, such that the allele frequency [

*cf*. equation 5.3.12 in Crow and Kimura (1970)] and population dynamics [ with the population mean fitness] can be approximated by(1)Equation 1 allows for unbounded population growth; to prevent this we impose a simple carrying capacity at

*N*(0) by setting

*N*(

*t*) to be the minimum of

*N*(0) and the value given in Equation 1.

#### Effective initial allele frequency:

Equation 1 considers only the deterministic dynamics. We are, however, primarily concerned with a stochastic event: the establishment of a (potentially rare) beneficial allele. In rescue, we are particularly interested in only those instances where the beneficial allele establishes—otherwise the population is extinct. Conditioning on establishment creates a bias away from the deterministic trajectory. As shown in Maynard Smith (1971) [see Orr and Unckless (2014) for an application to evolutionary rescue], we can approximate this bias by assuming that a single copy of an allele that establishes with probability *P*_{est} instantaneously increases from an initial frequency of to . In essence, dividing the initial allele frequency by the probability of establishment implies that an allele escaping random loss when rare will quickly increase to a larger frequency than otherwise expected before settling into its deterministic trajectory. As we will see below, we can combine this logic with the waiting time distribution for an establishing mutation to arrive to derive an effective initial allele frequency, *p*_{0}. Using the deterministic prediction (Equation 1) with *p*(0) = *p*_{0} then approximates the forward-time trajectory of an allele that is destined to fixation.

We derive the probability of establishment, *P*_{est}, assuming alleles do not interact (*i.e.*, a branching process). Under this assumption, a single copy of an allele that is expected to leave copies in the next generation with a variance of *v* has an establishment probability of [Allen (2010), p. 172, see also Feller (1951), equation 5.7, for a derivation from a diffusion approximation]:(2)When *d* ≤ 0, we ignore density-dependence, and the growth rate of the establishing heterozygote then depends only on its absolute fitness, The variance, *v*, depends on particulars of the lifecycle (see section *Genetic drift in the simulated lifecycle* in the Supplementary material for details). When *h* = 1 and *v* = 1, we retrieve the probability of establishment of a weakly beneficial mutation in an exponentially growing or declining haploid population with a Poisson offspring distribution, (Otto and Whitlock 1997).

#### Effective final allele frequency:

Above, we have stitched together the initial stochastic phase of the allele frequency and population dynamics with the deterministic phase by using We next incorporate the final phase, fixation, which is also stochastic. Incorporating this final phase is important here (as opposed to studies that predict only the forward-time dynamics; *e.g.*, Orr and Unckless 2014) because in looking at genetic signatures we are concerned primarily with the dynamics backward in time from the point of fixation, and this third phase determines how fast fixation occurs.

As shown in Martin and Lambert (2015), we can treat the backward-in-time dynamics from fixation as we did the forward-in-time dynamics at establishment. In particular, the probability the heterozygote establishes (backward in time) in a population of mutant homozygotes is also given by Equation 2, but with the growth rate of a rare heterozygote in this population, *ϵ*, replaced by (let’s call this probability ). For instance, if the fitness of the heterozygote is and the fitness of the mutant homozygote is and the population is at carrying capacity, then the growth rate of the heterozygote depends only on its relative fitness, which is independent of demography, *d*.

Given the heterozygote establishes (backward-in-time), it will quickly increase to copies, implying that, in a population of size *N*, fixation effectively occurs when the deterministic trajectory reaches a frequency of Below, we will explore scenarios where fixation is expected to occur while the population is at carrying capacity, *N*(0). Thus, in all cases, we have the same effective final allele frequency,(3)where *P*_{est} is given by Equation 2, with and *v* is as given in the section *Genetic drift in the simulated lifecycle* in the Supplemental text.

#### Time to fixation:

We can estimate the time to fixation in the additive model as the time it takes the deterministic approximation (Equation 1) to go from the effective initial frequency, *p*_{0}, to the effective final frequency, *p _{f}*, which is [equation 5.3.13 in Crow and Kimura (1970)](4)

#### Backward-time dynamics:

Setting in Equation 1 gives an approximation of the dynamics of allele frequency and population size backward in time,(5)from the point of fixation, , to the time of establishment, Note that the backward-time allele frequency dynamics do not depend on the initial frequency, *p*_{0}, or demography, *d*, except in determining the maximum value of τ, *t _{f}*. Thus the backward-time allele frequency dynamics are expected to be the same in rescue as in populations of constant size with the same carrying capacity (such that the final frequency,

*p*, is the same). Meanwhile, the backward-time population size dynamics depend heavily on the initial frequency as the time at which the sweep establishes determines the depth and duration of the bottleneck.

_{f}### The structured coalescent

#### Event rates:

To explore the genetic patterns created by evolutionary rescue we next consider a random sample of chromosomes at the time the beneficial allele fixes. Focusing on a neutral locus that is recombination distance *r* from the selected site, we are interested in calculating the rate of coalescence, the rates of recombination and mutation off the selected background, and the rate of migration out of the population. If our sample of alleles has *k* distinct ancestors on the selected background *τ* generations before fixation, these rates are approximately [see Table 1 in Hudson and Kaplan (1988); equation 16 in Pennings and Hermisson (2006a); see section *Deriving the structured coalescent for derivations* in the Supplemental text](6)where is the effective population size *τ* generations before fixation (see section *Genetic drift in the simulated lifecycle* in the Supplemental text for details).

#### Event timing:

We now use the instantaneous event rates (Equation 6) to calculate the probability that the most recent event is *τ* generations before fixation, and is either coalescence, recombination, mutation, or migration. Letting the probability that *i* is the most recent event and occurs *τ* generations before fixation is (*cf*. equation 6 in Pennings and Hermisson (2006b))(7)*i.e.*, the waiting time for an inhomogeneous exponential random variable. The approximation assumes the are small enough such that, at most, one event happens each generation, with small probability, and the changes in the from one generation to the next are small enough that we can approximate a sum across *τ* with an integral. As a technical aside, to speed computation we analytically solve the integrals in Equation 7 under the assumption of unbounded population growth (Equation 1).

### Genetic signatures at linked neutral loci

#### Pairwise diversity:

One classic signature of a selective sweep is a dip in genetic diversity around the selected site (Maynard Smith and Haigh 1974; Kaplan *et al.* 1989). Here, we consider the average number of nucleotide differences between two randomly sampled sequences, *π* (Tajima 1983), focusing on sequences of length 1 (*i.e.*, heterozygosity).

We first consider our expectation for *π* at a site that is far enough away from the selected site to be unaffected by hitchhiking, which will provide us with an expectation for genome-wide average diversity. While not directly impacted by the sweep, these loci are indirectly impacted as the sweep dictates the severity of the population bottleneck. Our expectation for *π* at such a site in a population of constant effective size, *N _{e}*, is simply (Watterson 1975), with

*U*the per base per generation mutation rate at neutral loci. Ignoring neutral mutation input during the sweep, the

*π*at a sufficiently loosely linked site in a population of changing size is this neutral expectation times the probability a sample of size two does not coalesce during the sweep (

*cf*. equation 4 in Slatkin and Hudson (1991) and equation 7 in Griffiths and Tavaré (1994)),(8)

We next consider sites that are close enough to the selected locus that they are directly affected by the selective sweep through hitchhiking. To keep the analysis simple, we assume that, if recombination or mutation moves one of the sampled alleles to the ancestral background before the two samples coalesce, then it is as if both samples were on the ancestral background from the start and therefore coalesce with each other as if they were at an unlinked locus (Equation 8). This assumption ignores the time it takes for the two samples to arrive on the ancestral background, and is therefore expected to underestimate diversity at moderately linked sites. There is also the possibility that no events occur in the history of the sample during the sweep. Then, if the sweep arose from mutation or migration, or from a single copy of the beneficial allele , the sample must coalesce. If we instead start with more than one beneficial copy, , it is possible that the two samples had ancestors that were linked to distinct copies of the beneficial allele within the standing variation. While the coalescent naturally tells us the probability that two lineages do not coalesce during the sweep (since distinct copies of the beneficial allele are exchangeable; *e.g.*, Tavaré 1984), our continuous-time approximation of the coalescent (Equation 7) implicitly assumes a large number of copies of the beneficial allele at any one time. To account for the fact that *κ* is finite, and often small, we assume that, if, under the continuous-time approximation of the coalescent, there remain two lineages on the beneficial background at the beginning of the sweep, then they coalesce with probability [see the “instantaneous” approximation in Anderson and Slatkin (2007) for a similar approach]. Finally, we assume that each copy of the beneficial allele in the standing variance has a recent and unique mutational origin (implying the beneficial allele was sufficiently deleterious before the environmental change; *cf*. Prezeworski *et al.* 2005; Berg and Coop 2015), so that two distinct ancestral lineages are independent draws from a neutral population. Ignoring neutral mutations during the sweep, a simple approximation for diversity at any location in the genome is then(9)where is the probability of recombination or mutation before coalescence during the sweep and is the probability that no events have occurred in the history of the sample during the sweep . Equation 9 is a function of recombination distance, *r*, and approaches as *r* gets large.

#### Tajima’s D:

As a second genetic signature at linked neutral sites we consider Tajima’s *D* statistic (Tajima 1989), which measures the excess (positive *D*) or deficiency (negative *D*) of intermediate frequency polymorphisms relative to the standard neutral model. Quantitative predictions of Tajima’s *D* require one to consider samples of size greater than two, which quickly becomes complicated with selection and nonequilibrial demography. Instead, here we discuss the expected qualitative patterns based on intuition from the analysis presented above.

First, hard selective sweeps tend to produce star-like gene genealogies, with most samples coalescing near the beginning of the sweep, and recombination allowing a few samples to coalesce much further back in time (Kaplan *et al.* 1989). Hard sweeps therefore produce an excess of low and high frequency polymorphisms (Wakeley 2009, p. 120), leading to negative *D* (Braverman *et al.* 1995). Larger selection coefficients create more star-like genealogies (as there is then less time for recombination off the sweep), and, thus, more negative *D* when conditioned on a hard sweep. However, with sufficient standing genetic variation (SGV) or rates of recurrent mutation or migration, larger selection coefficients will tend to cause softer selective sweeps as they increase the probability any one copy establishes (Equation 2). Soft selective sweeps (by definition) allow some samples to coalesce further back in time, before the start of the sweep, even at the selected site. Soft sweeps therefore tend to have less effect on neutral genealogies, and, hence, on *D*, although sufficiently soft sweeps can actually cause positive *D*, by allowing intermediate-sized groups of samples to descend from different ancestors containing the beneficial allele (Pennings and Hermisson 2006b).

As linkage to the selected site decreases, so too does this skew in genealogies. In the case of a constant population size, *D* should asymptote to the neutral expectation of zero. In the case of rescue, however, the bottleneck will cause an excess of intermediate frequency polymorphisms (Wakeley 2009, p. 120), and, therefore, *D* should asymptote at some positive value (more positive with more severe bottlenecks).

### Simulations

The simulation details are described in full in the section S*imulation details* in the Supplemental text. Briefly, the lifecycle described in *Simulated lifecycle* was simulated in SLiM 3 (Haller and Messer 2019) with tree-sequence recording (Haller *et al.* 2019). We simulated a 20 Mb segment of a chromosome, with all but one of the center loci neutral, with a per base recombination rate of and per base mutation rate at neutral loci of *U* = 6 × 10^{−9} (both inspired by *Drosophila* estimates; Mackay *et al.* 2012; Haag-Liautard *et al.* 2007). A simulation was considered successful and ended when the beneficial mutation was fixed and the population size had recovered to its initial size, *N*(0). Successful simulations were recapitated with Hudson’s coalescent algorithm (Hudson 1983, 2002), implemented in msprime (Kelleher *et al.* 2016), using (see *Genetic drift in the simulated lifecycle*). Pairwise diversity, Tajima’s *D*, site frequency spectra, and linkage disequilibrium were then calculated directly from the tree sequences using tskit (Kelleher *et al.* 2018), for a random sample of 100 contemporary chromosomes. We chose to simulate a population with initial census size *N*(0) = 10^{4} declining at rate *d* = 0.05, where the beneficial allele had selection coefficient *s* = 0.13 or *s* = 0.20. This describes a relatively small population that is expected to go extinct in ≈200 generations in the absence of a sweep at a locus under strong selection (as might be expected following a severe environmental change).

### Data availability statement

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article and Supplementary Material. Code used to derive and plot all results (Python scripts and Mathematica notebook) is available at https://github.com/mmosmond/rescueCoalescent.git and on figshare. File S1 refers to the Mathematica notebook, which is also provided in freely accessible forms (*e.g.*, PDF). Supplemental material available at figshare: https://doi.org/10.25386/genetics.12284324.

## Results

### Rescue from standing genetic variation

We first consider rescue arising from genetic variation that is present at the time of the environmental change and ignore further mutations and migration. To avoid complicating the presentation, we assume the initial number of beneficial alleles is given; treating the initial number as a random variable requires conditioning the distribution on a successful sweep (Hermisson and Pennings 2017).

#### The probability of rescue and soft selective sweeps:

Given there are initially copies of the beneficial allele, the number that establish is roughly binomially distributed with *κ* trials and success probability *P*_{est}. The probability of rescue is the probability that at least one establishes (*cf*. equation 2 in Orr and Unckless 2014),(10)Conditioning on rescue, the expected number of establishing copies is(11)Note that when the unconditioned expected number of establishing copies, , is small, the probability of rescue is roughly . Equation 11 then implies that we expect only one of the initial copies to establish, *i.e.*, we expect a hard sweep. For larger values of rescue will often occur by a soft selective sweep (Hermisson and Pennings 2005), where multiple initial copies establish. The probability that multiple copies establish is the probability of rescue minus the probability of a hard sweep, Given rescue occurs, the probability it is due to a soft sweep can therefore be written as(12)As expected, the probability of a soft sweep given rescue is 0 when we start with a single copy, and approaches 1 as the number of copies becomes large. Between these two extremes, we find that Equation 12 provides reasonable estimates for small *d* or large *s* but underestimates the probability of a soft sweep otherwise (Figure 1A), when beneficial alleles can persist at low numbers long enough to establish with some non-negligible probability of experiencing some selection as homozygotes (thus increasing their probability of establishment). Using the probability of establishment observed in the simulations corrects this error (see File S1). Because the expected number of copies that establishes given rescue (Equation 11) is nearly independent of *P*_{est} (as long as it is small), Equation 11 more closely matches simulations (Figure 1B).

#### Effective initial allele frequency and the backward-time dynamics:

As each of the expected establishing copies given rescue is expected to rapidly reach 1/*P*_{est} copies (see section *Effective initial allele frequency*), it is as if the sweep began deterministically (Equation 1) with initial frequency (*cf*. equation S1.4 in Orr and Unckless 2014)(13)This same conditioning applies in a population of constant size (*i.e.*, *d* = 0). As the decline rate, *d*, increases, the probability a copy establishes, *P*_{est}, and, hence, the probability of rescue, , declines, making the conditioning stronger (*i.e.*, increasing the difference between the effective and true initial allele frequencies). This causes selective sweeps to get started faster as *d* increases, implying that, all else equal, rescue sweeps are expected to be shorter than those in populations of constant size.

Using as *p*_{0} in Equations 4 and 5 gives our semideterministic prediction of the backward-time dynamics. As shown in Figure 2, this does a reasonably good job of describing the simulation results. Looking closer, we find that our predictions do an excellent job of approximating the mean allele frequency as it departs from 1, but typically begin to underestimate the mean allele frequency as frequencies drop lower (see File S1 for more detail). Consequently, we tend to underestimate the mean time to fixation (compare arrows and stars in Figure 2) and overestimate mean population size. This error arises even under a true branching process (*e.g.*, under clonal reproduction) in a population of constant size, and becomes worse as selection becomes weaker (see File S1). The reason for this error is that one minus the effective final allele frequency, 1 −*p _{f}* =

*q*is exponentially distributed with expectation (Martin and Lambert 2015), where is the backward-time establishment probability of the heterozygote in a population of mutant homozygotes and we have assumed fixation occurs when the population size is

_{f}*N*(0) (which is expected to be true in all of our numerical examples). The variance in the effective final allele frequency therefore increases like as

*s*declines, meaning that, under weaker selection, there is much more variance in the time spent in the stochastic phase near fixation. Combining this distribution of waiting times to leave the stochastic phase with the nonlinearities of the subsequent deterministic phase causes our semideterministic approximation to generally underestimate allele frequency. Despite this error, we forge ahead and use our simple semideterministic approximations in the structured coalescent (Equations 6 and 7).

Before moving on to the coalescent, however, a number of biological insights can be gleaned from Figure 2 that will help explain downstream results: (1) the backward-time allele frequency dynamics are the same in rescue as they are in populations of constant size, since they depend only on the effective final allele frequency, *p _{f}*, which does not depend on

*d*(Equation 3), (2) fixation times are shorter in rescue as, forward-in-time, those sweeps tend to get started faster (

*i.e.*, the effective initial frequency increases with

*d*; Equation 13), (3) weaker selection causes larger population bottlenecks as the deterministic portion of the sweep is then slower and a given frequency change has less impact on population mean fitness (Equation 1), and (4) both the allele frequency and population size dynamics depend only weakly on the initial number of mutants, despite large differences in the probability of rescue, because sweeps that are less likely to occur get started faster (Equation 13).

#### The structured coalescent:

For a linked neutral allele in a sample of size 2, Figure 3 compares the predicted probability of recombination off the selective sweep (opaque dashed curves) and coalescence on the beneficial background (opaque solid curves) with the mean probabilities given the allele frequency and population size dynamics observed in simulations (transparent curves). We find that our approximations do relatively well near fixation, where our predictions of allele frequency are better (Figure 2), but that our underestimates of allele frequency at later times cause us to generally overestimate rates of recombination and coalescence on the beneficial background near the beginning of the sweep. Despite this error, we capture the qualitative dynamics and can draw out a number of interesting biological consequences: (1) the timing of the bottleneck during rescue pushes coalescent times toward the present, so that the distributions of coalescence and recombination times overlap more than in populations of constant size, (2) the bottleneck also increases the overall probability of coalescence in rescue, which reduces the probability of recombination off the sweep (compare areas under dashed curves), and (3) the difference in the structured coalescent between rescue and a population of constant size is larger under weaker selection. This latter point nicely illustrates the coupling between demography and evolution in rescue; while weaker selection creates a slower sweep, and, hence, more time for recombination off the beneficial background, it also slows population recovery, leading to longer and deeper bottlenecks that increase coalescence enough to counteract the additional time provided for recombination.

#### Genetic diversity:

Figure 4 compares our predictions of relative pairwise diversity, (Equations 8 and 9), against simulations, . Despite the errors in the predictions of the dynamics above (Figure 2 and Figure 3), our approximations capture the relative rates of recombination and coalescence. Two main conclusions emerge: (1) rescue tends to deepen dips in diversity when soft sweeps are possible, *i.e.*, it hardens soft sweeps because the bottleneck increases the probability of coalescence [decreasing at the selected site, *r* = 0; Equation 9], and (2) rescue generally produces wider dips in diversity due to excess, and earlier, coalescence as well shorter sweeps (the magnitude and timing of the bottleneck, as well as lower establishment probabilities, decreases at a given *r* > 0; Equation 9). Note that when the probability of rescue is very small (Figure 4D), we predict the diversity at the selected site to actually be higher under rescue than in a population of constant size. This is driven by our prediction of fixation time; low probabilities of rescue imply high effective initial allele frequencies, leaving less time for coalescence to occur (compare solid curves in Figure 3D). This is prediction is not, however, borne out in the observed simulations, as we tend to overestimate coalescence in populations of constant size and underestimate it in rescue (Figure 3D).

If the population was sampled both before and after the selective sweep (or we have very good estimates of its mutation rate and long-term effective population size), the absolute amount of pairwise diversity contains additional information. In the Supplemental Material, Figure S1 compares our predictions of (Equation 9) against simulations, showing that the bottleneck decreases background levels of diversity as well. This decrease is more evident under weaker selection, where bottlenecks are more pronounced. While our predictions qualitatively match the mean simulation results, our tendency to underestimate allele frequencies, and, thus, overestimate harmonic mean population sizes during rescue (Figure 2) causes us to generally overestimate background diversity in these cases. To correct for this, we can replace our prediction for background diversity, (Equation 8), in the rescue scenario with the observed genome-wide average diversity (dashed curves in Figure S1). A very similar result is achieved by using effective population sizes observed in simulations in Equation 8 (results not shown). Using the observed mean diversity level may be justified by the fact that genome-wide diversity can be measured directly from data and is highly variable across populations (Tajima 1983).

#### Tajima’s *D*:

Figure 5 shows the Tajima’s *D* values observed in simulations. There are two main take-aways: (1) the bottleneck during rescue causes positive background *D* and (2) rescue sweeps tend to be harder and thus cause a greater decrease (or smaller increase) of *D* at the selected site. Together, these patterns cause rescue to “stretch out” (*e.g.*, panel D) or even invert (*e.g.*, panel C) the pattern of *D* observed in populations of constant size.

#### A simultaneous, but independent, bottleneck and sweep:

It will of course be much harder to differentiate rescue from a simultaneous, but independent, bottleneck and sweep. In *A simultaneous, but independent, bottleneck and sweep from SGV* (Supplemental Material) we compare rescue, with expected effective population size *N _{r}* (from Equation 1), to sweeps that occur in populations suddenly bottlenecked to size . Because rescue has higher population sizes at the beginning and end of the sweep the sweeps take longer, leaving more time for coalescence, which leads to harder sweeps and lower genome-wide diversity. To see if the slightly different timings of recombination relative to coalescence are detectable, we examine the full site frequency spectrum as well as linkage disequilibrium. There is perhaps a slightly more uniform spectrum across intermediate frequency mutations at tightly linked sites under rescue, as expected given coalescence then overlaps more with recombination (similar to the effect of recurrent mutation, Pennings and Hermisson 2006b). Linkage disequilibrium is elevated under rescue, largely mirroring the patterns we see in absolute diversity.

### Rescue by *de novo* mutation

#### The probability of rescue and soft selective sweeps:

When there are few copies of the beneficial allele at the time of environmental change, rescue may depend on mutations arising *de novo* at the selected site during population decline. To predict the allele frequency and population size dynamics in this scenario, we then need to derive the waiting time until a rescue mutation successfully establishes. Ignoring unsuccessful mutations, the first successful rescue mutation arrives according to a time-inhomogeneous Poisson process with rate, where describes the decline in the number of copies of the ancestral allele and *u* is rate at which ancestral alleles mutate to rescue alleles. Thus, the probability that a rescue mutation has established by time *T* is Taking the limit as time goes to infinity then gives the probability of rescue [*cf*. equation 10 in Orr and Unckless (2008)](14)

We can also calculate the probability of a soft sweep during rescue from recurrent mutation. Taking into account the beneficial alleles present at time *t*, the rate at which additional copies arise and establish is where is the number of ancestral alleles at time *t*, and *P*_{est} (*t*) is the probability of establishment at time *t*, which changes with allele frequency, and thus time, because allele frequency influences the genotypes the rescue allele experiences (*i.e.*, its marginal fitness). Thus the number of mutations that arise and fix is Poisson with rate To gain intuition, we make a very rough approximation, assuming while mutations are arriving [*i.e.*, while *N*(*t*) is still large], so that and we get the same Poisson rate we derived above for the first successful mutation, The resulting probability distribution for the number of mutations that establish is analogous to the result in a model with haploid selection [*cf*. equation 7 in Wilson *et al.* (2017)]. Dividing the expected number of establishing mutations by the probability of rescue, the expected number that establish given rescue is(15)With these same approximations, the probability of a soft selective sweep given rescue (*i.e.*, the probability that more than one copy establishes) is(16)as in a haploid population [equation 8 in Wilson *et al.* (2017)].

Both of these approximations tends to be underestimates (Figure 6), especially when selection is weak. This is partly because we also tend to underestimate the probability of rescue, which is elevated by increases in the frequency of the beneficial allele, which increases the marginal fitness of the ancestral allele whenever *h* > 0 (prolonging its persistence and, therefore, creating more opportunity for mutation) and increases the marginal fitness of the mutant allele (increasing the establishment probability). However, using the observed probability of rescue in Equations 15 and 16 still results in underestimates (see File S1) because ignoring increases in the beneficial allele frequency becomes less reasonable as more copies are established.

#### Effective initial allele frequency and the backward-time dynamics:

Following Orr and Unckless (2014) (see their Text S2 and our File S1 for more detail), taking the derivative of the cumulative distribution of waiting times for the first successful mutation, *F*(*T*), and dividing by the probability of rescue gives the probability distribution function for the arrival time of the first establishing rescue mutation given rescue, *f*(*t*). As discussed above, while the first establishing mutation is still rare it will grow exponentially at rate and conditioned on its establishment, will very quickly reach copies. Integrating over the distribution of arrival times then gives the expected number of copies of this successful mutation at time *t* since the environmental change given rescue, Solving this integral, evaluating at *t* = 0, and dividing by the total number of alleles at the time of environmental change, 2*N*(0), it is therefore as if the successful mutation was present at the time of environmental change, with frequency(17)where is the gamma function [Equation 6.1.1 in Abramowitz and Stegun (1972)] and is the incomplete gamma function [equation 6.5.3 in Abramowitz and Stegun (1972)]. The factor increases the effective initial frequency, because we have conditioned on establishment, while the last factor decreases the effective initial frequency, because we must wait for the mutation to arise. When the unconditioned expected number of rescue mutations, is small, this cancels out and the last factor in Equation 17 becomes approximately , which is independent of the mutation rate (as in the haploid case; Orr and Unckless 2014). That is, conditioning on unlikely rescue, rescue mutations arise earlier in populations that decline faster.

With constant population size, the time of establishment of the first successful mutation is irrelevant for the backward-in-time dynamics, and, therefore, for the resulting genetic signatures (in contrast, when *d* > 0 we need to predict the time of establishment to predict the population size dynamics; Equation 5). For populations of constant size, we need only determine the effective allele frequency at the time the sweep begins, which is simply , independent of the mutation rate.

Figure S2 compares our analytical approximations of the dynamics (using Equations 5 and 17) against individual-based simulations. As with rescue from standing genetic variance (Figure 2), the nonlinearities of the deterministic phase cause our semideterministic approximation to underestimate mean allele frequencies and overestimate mean population sizes as the allele frequency declines from fixation. We again see that rescue sweeps tend to be shorter than those in populations of constant size (due to lower establishment probabilities) and that the bottleneck sizes and sweep times given rescue depend little on the probability of rescue.

#### The structured coalescent:

Figure S3 shows the timing of coalescence, recombination, and mutation for a sample of size two at a linked neutral locus. The patterns of coalescence and recombination are essentially the same as observed during sweeps from standing genetic variation (Figure 3). The timing of mutation is qualitatively similar to that of coalescence, peaking near establishment (since both depend on the inverse of allele frequency). The main effect of the excess coalescence during rescue is a reduced probability of mutating off the sweep (compare areas under dotted curves), which is exacerbated by rescue sweeps being shorter.

#### Genetic signatures at linked neutral loci:

Figure S4 shows relative pairwise diversity around the selected site. The patterns here are nearly identical to those found after sweeps from standing genetic variation (Figure 4), although the difference between rescue and the constant population size case are larger here, perhaps due partly to the reduction in rates of mutation off the selected sweep (and also because our parameter choice causes the bottlenecks to be deeper when rescue occurs by mutation). Figure S5 shows Tajima’s *D* values around the selected site, which are nearly identical to those observed after sweeps from standing genetic variation (Figure 5).

On a related note, Wilson *et al.* (2017) reasoned that, because soft sweeps from recurrent mutation are expected when rescue is likely while hard sweeps are expected when rescue is rare (Equation 16), population bottlenecks will tend to be more extreme when rescue occurs by a hard selective sweep. From this they argued it might actually be easier to detect soft sweeps from patterns at linked neutral loci, as bottlenecks are expected to obscure the signal. Here, we show the importance of conditioning on rescue, which roughly equalizes bottleneck sizes across scenarios with very different probabilities of rescue (*e.g.*, due to differences in mutation rate; Figure S2), potentially making harder sweeps easier to detect due to their greater effect on local gene genealogies (Figures S3–S5).

### Rescue by migrant alleles

#### The probability of rescue and soft selective sweeps:

Rescue can also arise from beneficial alleles that arrive via migration. Assuming that the number of migrant alleles that replace a resident allele each generation is Poisson with mean *m*, the waiting time until the first successful migrant is exponential with rate *λ*= m*P*_{est}. Given the population is expected to persist in the absence of beneficial alleles for generations, the probability of rescue is therefore roughly the probability the first successful migrant arrives by then,(18)

Under rescue from standing variation or new mutation we derived the probability and extent of soft selective sweeps from the forward-time process. In contrast, under rescue from migration we use the coalescent. In particular, Equation 6 shows that the per generation probability of migration and coalescence depend on population size and beneficial allele frequency in the same way. This similar form implies that the relative rates at which lineages coalesce and migrate at the selected site does not depend on the population size and allele frequency. Pennings and Hermisson (2006a) used this fact to show that, in an ideal population of constant size, the number of unique migrant haplotypes contributing to a present-day sample, as well as their proportions, is described by Ewens’ sampling formula (pp. 334*ff* in Ewens 2004) when we replace *θ* with . Powerfully, this results holds even in nonideal populations of changing size (as briefly noted by Pennings and Hermisson 2006a, pp. 1081–1082)—including during evolutionary rescue—as long as the relationship between the effective and census population sizes remains the same [*i.e.*, if is a constant; in which case we now replace *θ* with in Ewens’ sampling formula]. Thus the softness of a sweep from migration depends only on the migration rate and variance in gamete numbers (σ^{2}, which determines ; see section *Genetic drift in the simulated lifecycle* in the Supplemental text), and is the same during rescue as it is in a population of constant size (*i.e.*, it is independent of *d*). The analogous result for rescue by *de novo* mutation does not hold (as it does for a population of constant size, Pennings and Hermisson 2006a), since the rate of mutation is not inversely proportional to population size (Equation 6).

Here, we use just two properties of Ewens’ sampling formula, the expected number of unique migrants among a sample of size *n* (p. 336 in Ewens 2004),(19)and the probability that this is >2 (equation 10.9 in Ewens 2004),(20)Figure 7 shows that these formulas perform very well, even when we sample the entire population [*n* = 2*N*(0)].

#### Population dynamics and the coalescent:

In the section *Population dynamics and the coalescent under rescue by migration* in the Supplemental text, we derive the expected backward-time dynamics and structured coalescent under rescue by migration. The results are closely analogous to rescue from mutation. We do not explore genetic signatures at linked neutral sites in this case as these depend on the demographic history of the metapopulation. Previous work has explored some potential signatures of migrant sweeps in populations of constant size (*e.g.*, Setter *et al.* 2019).

## Discussion

Here, we have explored genetic signatures of evolutionary rescue by a selective sweep. By allowing demography to depend on the absolute fitness of the genotypes that comprise the population, we explicitly invoke a feedback between demography and evolution. This feedback restricts the range of dynamics, and, thus, the signatures, that one should expect to observe. We find that, because the probability an allele with a given selective advantage establishes is reduced in declining populations (Equation 2; see also Otto and Whitlock 1997), selective sweeps causing rescue are expected to be harder than those in populations of constant size when sweeps arise from standing genetic variance or recurrent mutation (Figures 1 and 6; consistent with Wilson *et al.* 2014, 2017). Further from the selected locus, the demographic bottleneck experienced during rescue increases the rate of coalescence relative to mutation and recombination (Figure 3 and Figure S3), creating wider dips in relative diversity (Figure 4 and Figure S4) and lower absolute diversity genome-wide (Figure S1; consistent with Innan and Kim 2004). Like absolute diversity, Tajima’s *D* captures both the hardening of the sweep and the demographic bottleneck, causing *D* to often reach both higher and lower values under rescue (Figure 5 and Figure S5). These differences between evolutionary rescue and standard sweeps all become larger under weaker selection (where the heterozygote has a smaller growth rate, ) as the slower sweeps that result imply deeper, longer bottlenecks during rescue. If, instead, we compare rescue to populations bottlenecked to the same effective population size during the sweep, we find that the remaining signatures of rescue are harder sweeps that take longer to complete, allowing more time for coalescence, and, hence, lower genome-wide diversity and elevated linkage disequilibrium (Figure A3). The subtle difference in the timing of recombination between the two scenarios, with recombination overlapping more with coalescence during rescue, may also increase variation in the number of derived mutations at a site (as expected following soft sweeps from recurrent mutation in populations of constant size; Pennings and Hermisson 2006b), giving rise to site frequency spectra with perhaps slightly “flatter bottoms” (Figure A2).

In contrast to standing variance or mutation, when sweeps arise from a constant rate of migration demography has no affect on the number of beneficial alleles that establish (as briefly noted by Pennings and Hermisson 2006a), and, thus, rescue has no affect on the hardness of the sweep (Figure 7). Further, because the rates of coalescence and migration are both inversely proportional to the number of beneficial alleles [, Equation 6, Figure A5], the distribution of the number and frequency of migrant haplotypes spanning the selected site is given by Ewens’ sampling formula (Ewens 1972), with mutation replaced by migration (Pennings and Hermisson 2006a). The signatures of rescue by migration at linked neutral loci depend on the demographic and selective history of the metapopulation, which we do not explore here.

Evolutionary rescue has been explored theoretically (*e.g.*, Gomulkiewicz and Holt 1995; Uecker and Hermisson 2016; Anciaux *et al.* 2018) and observed repeatedly in both experiments (*e.g.*, Bell and Gonzalez 2009; Lindsey *et al.* 2013; Ramsayer *et al.* 2013) and in host–pathogen systems in nature (*e.g.*, Wei *et al.* 1995; Feder *et al.* 2016). More recently, a number of studies have examined genetic data following purported evolutionary rescue in the wild, including bats altering hibernation to survive white-nose syndrome (Gignoux-Wolfsohn *et al.* 2018), killifish deleting receptors to tolerate pollution (Oziolor *et al.* 2019), and tall waterhemp evolving herbicide resistance (Kreiner *et al.* 2019). In the latter two cases, there is strong evidence of a recent selective sweep by a very beneficial allele (in one of these cases the evidence includes reduced nucleotide diversity at the selected site; Kreiner *et al.* 2019). Genetic evidence for a putatively recent demographic bottleneck was presented in only one of these studies (Oziolor *et al.* 2019), being detected from genome-wide reductions in nucleotide diversity and increases in Tajima’s *D* relative to populations that did not experience the same environmental change. Developing a statistical model to fit our theory to such data in a more quantitative way could lead to parameter estimates, which could then inform how likely rescue may have been (or if persistence was assured regardless, *d* ≈ 0). To do so one would also have to incorporate the time since fixation, as the site frequency spectra and linkage disequilibrium will begin a return to their neutral expectations following the sweep and bottleneck (*e.g.*, the sweep signatures in Tajima’s *D* and linkage disequilibrium decay within generations; Przeworski 2002). Ignoring the time since fixation would therefore cause underestimates of *s* and *d*, making rescue look more like a weaker sweep in a constant population. In contrast, the deeper and wider dips in genetic diversity observed under rescue would likely inflate estimates of selection if demography was ignored.

A lingering question that helped motivate this study is whether one could ever infer evolutionary rescue from genetic data alone. Such a possibility would greatly help assess the relevance of rescue in nature. In the empirical examples just discussed, we implied that evidence for a sweep and a bottleneck is consistent with rescue. Yet stronger support for rescue would come from inferring the coincident timing of the sweep and bottleneck. Our analysis suggests that detecting coincident sweeps and bottlenecks from site frequency spectra and linkage disequilibrium will be difficult as the relative timing of coalescence and recombination leaves only subtle signatures. Consistent with this, explicit estimates of the timing of a sweep (*e.g.*, Ormond *et al.* 2016) or a bottleneck (reviewed in Beichman *et al.* 2018) given a genetic sample collected from a single time point come with relatively large amounts of uncertainty. Sampling before and after the potential rescue event may provide one way forward, as it would help tighten the bounds on the timing of the bottleneck and sweep. However, it should be noted that, even if the sweep and bottleneck appear to have co-occurred, this correlation in timing does not imply it was caused by a feedback between demography and evolution. It is, of course, difficult to say in any case—without observing replicate populations go extinct or performing experiments—whether extinction would have occurred (or will occur) without adaptive evolution, as required by the strict definition of evolutionary rescue. More experiments (such as in Rêgo *et al.* 2019) that explore the genetic consequences of verified rescue may help develop a robust genetic signal of the feedback between demography and evolution during rescue, or at least determine if such a feedback can be detected in real genomes.

A strength of the above analysis is that we have explicitly modeled a feedback between demography and evolution, restricting the range of genetic signatures we consequently expect to observe. To take a recent example, Harris *et al.* (2018) have claimed that the lower reductions in genetic diversity within HIV populations adapting to less efficient drugs (as observed by Feder *et al.* 2016) could be due to weaker bottlenecks or slower sweeps rather than sweeps being softer, *i.e.*, arising from multiple mutations. Fortunately, in this case genetic time-series data were available to show that the ability of HIV to reliably adapt on a short time-scale necessitates mutation rates and selection coefficients that imply adaptation by soft sweeps is likely (Feder *et al.* 2018). Formally modeling a feedback between demography and evolution also helps narrow the relevant parameter range. For example, under a haploid version of the model explored here (as is applicable to HIV) the minimum population size during rescue by new mutations is [equation 22 in Orr and Unckless (2014)]. Thus, for a given initial population size, *N*(0), and selection coefficient, *s*, the smallest minimum population size is where implying that the smallest minimum population size consistent with the model is roughly proportional to 1/*s*. The imposed feedback between demography and evolution therefore precludes simultaneously slow sweeps and large bottlenecks [for example, negating the two smallest bottleneck sizes in Figure 3A of Harris *et al.* (2018) and constraining one to the upper right portion of Figure 3A,B in Feder *et al.* (2018)]. While it is very likely in this case that soft sweeps are indeed the cause of the pattern (Feder *et al.* 2018), incorporating an explicit model of how demography and evolution interact could help focus future debates.

The model presented here is but one model of evolutionary rescue, which involved a number of assumptions. One of these is that the beneficial allele acts multiplicatively with the ancestral background, so that the absolute fitness its carriers, and —and thus the probability it establishes—are affected by the decline rate of the ancestral genotype, *d*. Meanwhile, this same assumption caused the relative fitness of the mutants, 1 + *sh* and 1 + *s*—and thus the allele frequency dynamics during the sweep, once started—to be independent of the initial decline rate (Equation 1). If, instead, we made the absolute fitness of the heterozygote and mutant homozygote independent of the initial decline rate, say 1 + *sh* and 1 + *s*, then the reverse would be true; the relative fitness of the mutant would depend on the initial rate of population decline, *d*, while its absolute fitness while establishing would not. We expect these effects would, however, largely cancel out (higher establishment probabilities will cause longer sweeps and more coalescence, but slower changes in allele frequency will allow more recombination off the sweep). In any case, because our results depend primarily on the absolute and relative fitness of the heterozygote, the alternative model just described may closely match the model analyzed in detail here when *sh* is replaced by

We have also assumed that the beneficial allele acts additively with the ancestral allele at that locus This assumption has been made for convenience; it yields a simple explicit approximation of the allele frequency and population dynamics (Equation 5) that allows closed-form solutions of the structured coalescent (Equation 7), greatly accelerating our computation of the coalescent and pairwise diversity (Equation 9). Alternative forms of dominance are, however, likely and will impact our results. At one extreme, a completely recessive beneficial allele (*h* = 0) is much less likely to establish when compared to additivity, even in a constant population [compare Equation 2 with *v* = 1 and to equation 15 in Kimura (1962)], making rescue nearly impossible in outcrossing populations (Uecker 2017). In fact, any will cause the heterozygote to have a negative growth rate (*i.e.*, be subcritical), meaning that establishment will rely on the stochastic persistence of subcritical lineages until they create a supercritical mutant homozygote, analogous to the fixation of underdominant alleles or chromosomal rearrangements. As it is very unlikely that such alleles will establish in time to rescue the population (compared to supercritical heterozygotes), we have neglected this parameter range in order to focus on parameter values that are more likely to be reflected in empirical systems where rescue has occurred. It may, however, be possible to combine our approach with that taken for sweeps with arbitrary dominance in populations of constant size (Ewing *et al.* 2011) and the probability of establishment of underdominant alleles [*e.g.*, equation 3 in Lande (1979)]. At the other extreme, complete dominance (*h* = 1) will greatly increase the probability of establishment and rescue (Uecker 2017), as well as population mean fitness and, thus, population size. All else equal, we therefore expect rescue to have less effect on the signatures of selective sweeps relative to those in populations of constant size when the rescuing allele is more dominant. Given that the marginal fitness of the beneficial allele will not depend on allele frequency under complete dominance, we expect dynamics much like the haploid model (Orr and Unckless 2014), where simple predictions of allele frequency and population size are more accurate.

Finally, it is of course possible to model rescue under much more complex lifecycles and population structure (*e.g.*, as expected for the evolution of malarial drug resistance; Kim *et al.* 2014), at least using simulations. More complex lifecycles, such as those of parasites like *Plasmodium* and HIV, could cause the bottleneck to have additional impacts on the resulting genetic signature. For example, our populations are obligate sexual outcrossers, meaning that the probability of recombination does not depend on the population size (*cf*. Equation 6) as everyone must mate with someone. However, with selfing and/or facultative sex (genetic exchange), rates of recombination could be lower at lower population densities (as expected for HIV), which would increase the impact of bottlenecks on resulting genetic signatures.

Evolutionary rescue is only one example of myriad processes where demography and evolution feedback on one another. Our approach—combining forward-time eco-evolutionary models with coalescent theory to predict genetic signatures—could be used in many other scenarios. For instance, adaptive colonization of new habitat (a.k.a., adaptive niche expansion) is a closely related process for which a similar approach has already been taken (Kim and Gulisija 2010). As in the case of rescue, explicitly modeling the feedback between demography and evolution in adaptive niche expansion changes the expected signatures left behind by selective sweeps as compared to (bottlenecked) Wright-Fisher populations. Such an approach is interesting from a conceptual point of view, improving our understanding of how eco-evolutionary dynamics affect genetic signatures. But further, given the computational power and simulation platforms available today, it is no longer necessary to restrict oneself to Wright–Fisher populations—researchers may now simulate under much more ecologically realistic models. As our models become more complex, developing simple approximations will become increasingly important for intuition.

## Acknowledgments

We thank Pleuni Pennings and Alison Feder for helpful discussions, Ben Haller for comments on our SLiM code, and Joachim Hermisson and the anonymous reviewers for insightful comments. Financial support was provided by the Center for Population Biology at the University of California - Davis (fellowship to M.M.O.), Banting (Canada; fellowship to M.M.O.), and the National Institute of General Medical Sciences of the National Institutes of Health (NIH R01 GM108779, awarded to G.C.).

## Footnotes

Supplemental material available at figshare: https://doi.org/10.25386/genetics.12284324.

*Communicating editor: J. Hermisson*

- Received March 8, 2020.
- Accepted May 6, 2020.

- Copyright © 2020 by the Genetics Society of America