## Abstract

Purifying selection reduces genetic diversity, both at sites under direct selection and at linked neutral sites. This process, known as background selection, is thought to play an important role in shaping genomic diversity in natural populations. Yet despite its importance, the effects of background selection are not fully understood. Previous theoretical analyses of this process have taken a backward-time approach based on the structured coalescent. While they provide some insight, these methods are either limited to very small samples or are computationally prohibitive. Here, we present a new forward-time analysis of the trajectories of both neutral and deleterious mutations at a nonrecombining locus. We find that strong purifying selection leads to remarkably rich dynamics: neutral mutations can exhibit sweep-like behavior, and deleterious mutations can reach substantial frequencies even when they are guaranteed to eventually go extinct. Our analysis of these dynamics allows us to calculate analytical expressions for the full site frequency spectrum. We find that whenever background selection is strong enough to lead to a reduction in genetic diversity, it also results in substantial distortions to the site frequency spectrum, which can mimic the effects of population expansions or positive selection. Because these distortions are most pronounced in the low and high frequency ends of the spectrum, they become particularly important in larger samples, but may have small effects in smaller samples. We also apply our forward-time framework to calculate other quantities, such as the ultimate fates of polymorphisms or the fitnesses of their ancestral backgrounds.

Purifying selection against newly arising deleterious mutations is essential to preserving biological function. It is ubiquitous across all natural populations and is responsible for genomic sequence conservation across long evolutionary timescales. In addition to preserving function at directly selected sites, negative selection also leaves signatures in patterns of diversity at linked neutral sites, which have been observed in a wide range of organisms (Begun and Aquadro 1992; Charlesworth 1996; Cutter and Payseur 2003; McVicker *et al.* 2009; Flowers *et al.* 2012; Comeron 2014; Elyashiv *et al.* 2016). This process is known as background selection, and understanding its effects is essential for characterizing the evolutionary pressures that have shaped a population, as well as for distinguishing its effects from less ubiquitous events such as population expansions or the positive selection of new adaptive traits.

At a qualitative level, the effects of background selection are well known: it reduces linked neutral diversity by reducing the number of individuals that are able to contribute descendants in the long run. Since individuals that carry strongly deleterious mutations cannot leave descendants on long timescales, all diversity that persists in the population must have arisen in individuals that were free of deleterious mutations. Since all of these individuals are equivalent in fitness, this suggests that diversity should resemble that expected in a neutral population of a smaller size—specifically, with a size equal to the number of mutation-free individuals (Charlesworth *et al.* 1993).

However, an extensive body of work has shown that this intuition is not correct and that background selection against strongly deleterious mutations can lead to nonneutral distortions in diversity statistics (Charlesworth *et al.* 1993, 1995; Hudson and Kaplan 1994; Tachida 2000; Gordo *et al.* 2002; Williamson and Orive 2002; O’Fallon *et al.* 2010; Nicolaisen and Desai 2012; Walczak *et al.* 2012; Good *et al.* 2014). The reason for this is simple: even strong selection cannot purge deleterious alleles instantly. Instead, deleterious haplotypes persist in the population on short timescales, allowing neutral variants that arise on their backgrounds to reach modest frequencies. This is most readily apparent in statistics based on the site frequency spectrum [the number, $p(f)$, of polymorphisms that are at frequency *f* in the population], such as the number of singletons or Tajima’s D (Tajima 1989). As we show below, even when deleterious mutations have a strong effect on fitness, the site frequency spectrum shows an enormous excess of rare variants compared to the expectation for a neutral population of reduced effective size.

These signatures in genetic diversity are qualitatively similar to those we expect from population expansions and positive selection (Slatkin and Hudson 1991; Sawyer and Hartl 1992; Rannala 1997; Keinan and Clark 2012). A detailed quantitative understanding of background selection is therefore essential if we are to disentangle its signatures from those of other evolutionary processes.

The traditional approach to analyzing the effects of purifying selection has been to use backward-time approaches based on the structured coalescent (Hudson and Kaplan 1988, 1994). This offers an approximate framework to model how background selection affects the statistics of genealogical histories of a sample, and hence the expected patterns of genetic diversity. The approximations underlying this method are valid when selection is sufficiently strong that deleterious mutations rarely fix (Neher and Shraiman 2012), the same regime we will consider in this work. However, while these backward-time structured coalescent methods make it possible to rapidly simulate genealogies, they are essentially numerical methods and do not lead to analytical predictions. Furthermore, they give limited intuition as to the conditions under which their approximations are valid. A more technical but crucial limitation is that they rapidly become very computationally demanding in larger samples. This is becoming an increasingly important problem as advances in sequencing technology now make it possible to study sample sizes of thousands (or even hundreds of thousands) of individuals. The poor scaling of coalescent methods with sample size is of particular importance in studying background selection: since purifying selection is expected to result in an excess of rare variants, its effects increase in magnitude as sample size increases. This can reveal deviations from neutrality in large samples that are not seen in smaller samples.

Here, we use an alternative, forward-time approach to analyze how purifying selection affects patterns of genetic variation at a nonrecombining genomic segment. Our method is based on the observation that to predict single-locus statistics, such as the site frequency spectrum, it is not necessary to model the entire genealogy. Instead, we model the frequency of the lineage descended from a single mutation as it changes over time due to the combined forces of selection and genetic drift, and as it accumulates additional deleterious mutations. We then use these allele frequency trajectories to predict the site frequency spectrum, from which any other single-site statistic of interest can then be calculated (note, however, that multi-site statistics such as linkage disequilibrium or correlations between allele frequencies at different sites cannot be calculated from the site frequency spectrum).

We show that background selection creates large distortions in the frequency spectrum at linked neutral sites whenever there is significant fitness variation in the population. These distortions are concentrated in the high- and low-frequency ends of the frequency spectrum, and hence are particularly important in large samples. We provide analytical expressions for the frequencies at which these distortions occur and we can therefore predict at what sample sizes they can be seen in data.

Aside from single time-point statistics such as the site frequency spectrum, we also obtain analytical forms for the statistics of allele frequency trajectories. These trajectories have a very nonneutral character which reflects the underlying linked selection. Our approach offers an intuitive explanation for how these nonneutral behaviors arise in the presence of substantial linked fitness variation, which explains the origins of the distortions in the site frequency spectrum.

The statistics of allele frequency trajectories can also be used to calculate any time-dependent, single-site statistic. For example, we analyze how the future trajectory of a mutation can be predicted from the frequency at which we initially observe it, and we discuss the extent to which the observed frequency of a polymorphism can inform us about the fitness of the background on which it arose.

We emphasize that we focus throughout on modeling a perfectly linked genomic region. In the presence of recombination, our results offer insights about the effects of linked selection on diversity within regions that are effectively fully linked on the relevant timescales. In the *Discussion*, we discuss how our results can be used to provide a lower bound on the length of these segments, and therefore on the amount of linked selection relevant in sexually reproducing populations, and we comment on possible future extensions of our analysis to include recombination explicitly.

We begin in the next section by providing an intuitive explanation for the origins of the distortions in the site frequency spectrum in the presence of strong background selection, and explain why these distortions always accompany a reduction in diversity. This section summarizes the importance of correctly accounting for background selection, particularly when analyzing large samples, and should be accessible to all readers. We next define a specific model of background selection and summarize our main quantitative results.

We then present the analysis of our model. We begin by reviewing how dynamical aspects of allele frequency trajectories can be related to site frequency spectra, using the trajectories of isolated loci as an example. Readers already familiar with this intuition may choose to skip ahead, but those less interested in the technical details may find that this section provides useful intuition for the calculations in a simpler context. We then explain how this approach must be modified to account for linkage between multiple selected sites and present an intuitive description of the key features of allele frequency trajectories. These sections may be of interest to readers who wish to understand the intuitive origins of nonneutral behaviors of alleles in the presence of strong background selection. Finally, in the *Analysis*, we turn to a formal stochastic treatment of the trajectories of neutral and deleterious mutations. In the last section, we use these trajectories to calculate the site frequency spectrum and other statistics describing genetic diversity within the population.

## Strong Background Selection Distorts the Site Frequency Spectrum

We begin by presenting a more detailed description of the effects of background selection on linked neutral alleles. We focus on analyzing the allele frequency spectrum, defined as the expected number, $p(f)$, of mutations that are present at frequency *f* within the population in steady state. This allele frequency spectrum contains all relevant information about single-site statistics: any such statistic of interest can be calculated by subsampling appropriately from $p(f)$.
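As a minimal sketch of what "subsampling appropriately" can look like, the snippet below converts a population-level spectrum into the expected spectrum of a sample of *n* individuals, assuming standard binomial sampling of derived alleles. The function name `sample_sfs`, the integration grid, and the neutral test spectrum `theta / f` are illustrative assumptions and are not taken from the text.

```python
from math import comb


def sample_sfs(p, n, grid=20_000):
    """Expected number of sites with k = 1..n-1 derived copies in a sample of n.

    `p` is a population-level site frequency spectrum (a function of f).
    Assumes binomial sampling: a site at population frequency f shows k derived
    copies with probability C(n, k) f^k (1 - f)^(n - k). The integral over f is
    approximated with the midpoint rule on a uniform grid.
    """
    df = 1.0 / grid
    binom = [comb(n, k) for k in range(n + 1)]
    expected = [0.0] * (n + 1)
    for i in range(grid):
        f = (i + 0.5) * df          # midpoint of the i-th frequency bin
        weight = p(f) * df          # expected number of sites in this bin
        for k in range(1, n):
            expected[k] += weight * binom[k] * f**k * (1.0 - f) ** (n - k)
    return expected[1:n]            # entries for k = 1, ..., n - 1


if __name__ == "__main__":
    theta = 2.0  # hypothetical scaled mutation rate, for illustration only
    for k, value in enumerate(sample_sfs(lambda f: theta / f, n=6), start=1):
        print(k, round(value, 4))
```

For a spectrum proportional to 1/*f*, the binomial integral evaluates to `theta / k`, which gives a quick check on the numerics. Any other population-level spectrum can be passed in the same way.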

In Figure 1A, we show an example of the site frequency spectrum of neutral mutations at a locus experiencing strong background selection, generated by Wright–Fisher forward-time simulations. This example shows several key generic features of background selection. First, at intermediate frequencies the site frequency spectrum has a neutral shape, with the total number of such intermediate-frequency polymorphisms consistent with the simple reduced “effective population size” prediction (Charlesworth *et al.* 1993). However, at both low and high frequencies, the spectrum is significantly distorted. At low frequencies, we see an enormous excess of rare alleles, qualitatively similar to what we expect in expanding populations (Slatkin and Hudson 1991; Rannala 1997). We also see a large excess of very high frequency variants, leading to a nonmonotonic site frequency spectrum. This is reminiscent of the nonmonotonicity seen in the presence of positive selection (Sawyer and Hartl 1992). Notably, these distortions at both high and low frequencies arise in populations of constant size in which all variation is either neutral or deleterious.

The excess of rare derived alleles arises because selection takes a finite amount of time to purge deleterious genotypes. Thus we expect that there can be substantial neutral variation linked to deleterious alleles that, although doomed to be eventually purged from the population, can still reach modest frequencies. At the very lowest frequencies, we expect that neutral mutations arising in all individuals in the population (independent of the number of deleterious mutations they carry) can contribute. Thus, at the lowest frequencies, the site frequency spectrum should be unaffected by selection and should agree with the neutral site frequency spectrum of a population of size *N*. On the other hand, as argued above, the total number of common alleles must reflect the (much smaller) number of deleterious-mutation-free individuals, because only neutral mutations arising in such individuals can reach such high frequencies. Since the overall number of very rare alleles is proportional to the census population size *N*, and the number of common alleles reflects a much smaller deleterious-mutation-free subpopulation, there must be a transition between these two: between these extremes the site frequency spectrum must fall off more rapidly than the neutral $1/f$ prediction. This transition reflects the fact that as frequency increases, the effect of selection will be more strongly felt, and neutral mutations arising in genotypes of increasingly lower fitnesses will become increasingly unlikely.

As the frequency increases even further, we see from our simulations that the total number of polymorphisms increases again until, at very high frequencies, it matches the prediction for a neutral population of size equal to the census size *N*. Note that, at these frequencies, the total number of backgrounds contributing to the diversity is constant (*i.e.*, all mutations reaching these frequencies must arise in the small subpopulation of mutation-free individuals). This suggests that fundamentally nonneutral behaviors must be dominating the dynamics of these high frequency neutral polymorphisms. To understand this, as well as the details of the rapid falloff at very low frequencies, we will need to develop a more detailed description of the trajectories of neutral alleles in the population; we analyze this in quantitative detail in a later section.

However, a simple argument can explain the agreement with the neutral prediction at the highest frequencies. Polymorphisms observed at these very high frequencies correspond to neutral variants that have almost reached fixation. The ancestral allele is still present in the population, but at a very low frequency. In principle, the dynamics of the derived and ancestral alleles should depend on the fitnesses of their backgrounds. However, once the frequency of the ancestral allele is sufficiently low, the effects of drift will once again dominate over the effects of selection. Thus, at extremely high frequencies of the derived allele, its dynamics must become neutral. In addition to having neutral dynamics, the overall rate at which neutral mutations enter this high-frequency regime also agrees with the rate in a neutral population at the census population size. This is because, at steady state, the total rate at which neutral mutations fix is equal to the product of the rate at which they enter the population at any point in time ($N U_n$, where $U_n$ is the neutral mutation rate at the locus) and their fixation probability, $1/N$ (Birky and Walsh 1988). Thus, since the total rate at which alleles enter this high-frequency regime is unaffected by selection, and since their dynamics within this regime are neutral, we expect that the site frequency spectrum should also agree with the neutral prediction for a population of size *N*.

Although these simple arguments do not provide a full quantitative explanation of the site frequency spectrum, they already offer some intuition about the presence and magnitude of the distortions due to background selection. First, these distortions arise in part as a result of the difference in the number of backgrounds on which mutations that remain at the lowest frequencies and mutations that reach substantial frequencies can arise. Thus, they will always occur when background selection is strong enough to cause a substantial reduction in the effective population size: if the pairwise diversity π is at all reduced compared to the neutral expectation [that is, in terms of McVicker’s *B* statistic, $B < 1$ (McVicker *et al.* 2009)], these distortions exist (see Figure 1A). Second, because the distortions from the neutral shape are limited to high and low ends of the frequency spectrum, they will have limited effect on site frequency spectra of small samples, but will have dramatic consequences as the sample size increases (see Figure 1, B and C). On a practical level, this means that extrapolating conclusions from small samples about the effects of background selection can be grossly misleading.

### Data availability

Code used to generate the simulated data is available at: https://github.com/icvijovic/background-selection. Supplemental material is available at Figshare: https://doi.org/10.25386/genetics.6167591.

## Model and Results

In the next few sections, we will analyze the dynamics of neutral mutations under background selection in detail. We focus on the simplest possible model of purifying selection at a perfectly linked genetic locus in a population of *N* individuals. We assume neutral mutations occur at a per-locus, per-generation rate $U_n$ and deleterious mutations occur at rate $U_d$. Throughout the bulk of the analysis, we will assume that all deleterious mutations reduce the (log) fitness of the individual by the same amount *s*, although we analyze the effects of relaxing this assumption in a later section. We assume that $s \ll 1$, since this is the interesting case for biologically relevant mutation rates, although we also consider the effects of more strongly deleterious (or lethal) mutations in the *Discussion*. We neglect epistasis throughout, so that the (log) fitness of an individual with *k* deleterious mutations at this locus is $-ks$. For simplicity we consider haploid individuals, but our analysis also applies to diploids in the case of semidominance ($h = 1/2$). We assume that selection is sufficiently strong that alleles carrying deleterious mutations cannot fix in the population. The opposite case, in which deleterious mutations are weak enough to routinely fix, has been the subject of earlier work (Good and Desai 2013; Neher and Hallatschek 2013; Good *et al.* 2014). In the *Discussion*, we comment on the connection between these earlier weak-selection results and the strong-selection case we study here.

Our model is equivalent to the nonepistatic case of the model formulated by Kimura and Maruyama (1966) and Haigh (1978), as well as to the nonrecombining case of the model considered by Charlesworth *et al.* (1993) and Hudson and Kaplan (1994), and later studied by many other authors (Gordo *et al.* 2002; Seger *et al.* 2010; Nicolaisen and Desai 2012; Walczak *et al.* 2012). However, instead of modeling the genealogies of a sample of individuals from the population backwards in time, we offer a forward-time analysis of this model in which we analyze the full frequency trajectory of alleles.

In the presence of strongly selected deleterious mutations, we find that the magnitude of the effects of background selection critically depends on the ratio, λ, of the deleterious mutation rate, $U_d$, to the selective cost of each deleterious mutation, *s*: $\lambda = U_d/s$ (Figure 2). This ratio controls the overall variance in the number of deleterious mutations carried by individuals in the population, which is equal to λ (Kimura and Maruyama 1966). Whenever $\lambda \ll 1$, both the overall genetic diversity and the full neutral site frequency spectrum are unaffected by background selection, and the site frequency spectrum is to leading order equal to that of a neutral population of size *N* (Equation 1). This prediction agrees with the results of forward-time simulations (see Figure 2). The intuition behind this result is simple: in the limit that $\lambda \ll 1$, a majority of individuals in the population are free of deleterious mutations; neutral alleles are therefore rarely linked to deleterious mutations. This results in a neutral site frequency spectrum.

However, we will show that the site frequency spectrum of neutral mutations follows a very different form when $\lambda \gg 1$ (Equation 2), where $\sigma = s\sqrt{\lambda}$ represents the standard deviation in fitness in the population, and line 2 in Equation 2 is valid up to a constant factor (see *Contribution from the Peaks of Trajectories* in Appendix I for details). Comparisons between Equation 2 and simulations of the model are shown in Figure 2. We note that this spectrum matches the site frequency spectrum of a neutral population with a smaller effective population size at intermediate frequencies, but deviates strongly outside this frequency range. This implies that summary statistics based on the site frequency spectrum (*e.g.*, the average minor allele frequency) will start to deviate from the neutral expectation once samples are large enough to resolve frequencies outside this range, but not in smaller samples (Figure 1, B and C).

Our results also offer an intuitive interpretation of the origins of these distortions, which are summarized in Figure 3. When $\lambda \gg 1$, a large majority of individuals in the population will carry some deleterious mutations at the locus, which results in substantial fitness variation within the population. However, the majority of neutral alleles are present on backgrounds that are within about one standard deviation of the mean of the fitness distribution. Thus, at the lowest and the highest frequencies, the effects of genetic drift dominate over any effects of linked selection for the majority of neutral alleles. At these frequencies, the site frequency spectrum agrees with that of a neutral population of size *N* (see Figure 3).

In contrast, the effects of linked selection have a crucial impact on allele frequency trajectories at frequencies between these two extremes. As we show in a later section, this region of the site frequency spectrum is dominated by alleles that arise on unusually fit backgrounds [with fitness exceeding the population mean by more than about one standard deviation]. For these alleles, a crucial distinction arises between their short-term and long-term behavior: although genotypes that carry *any* polymorphic strongly deleterious variants are guaranteed to be eventually purged from the population, those that contain *fewer than average* deleterious mutations are still positively selected on shorter timescales. This results in strong nonneutral features in the frequency trajectories of these alleles. Their trajectories are characterized by rapid initial expansions, followed by a peak, and eventual exponential decline (Figure 4). These deterministic aspects of allele frequency trajectories are similar to those seen by Neher and Shraiman (2011) in models of linked selection in large facultatively sexual populations. We describe them in detail in the section titled *Key features of lineage trajectories*. A part of the rapid falloff in the site frequency spectrum at low frequencies results from these deterministic effects: alleles arising on backgrounds with more deleterious variants can reach more limited frequencies than alleles arising on backgrounds with fewer deleterious variants. Thus, the number of backgrounds on which neutral alleles could have arisen declines with the frequency, leading to a falloff of the site frequency spectrum.
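The expansion, peak, and decline described above can be illustrated with a purely deterministic sketch. It assumes no genetic drift, and a population at mutation–selection balance whose mean log fitness is $-U_d$, so that the part of a lineage carrying *j* deleterious mutations obeys $dn_j/dt = -js\,n_j + U_d\,n_{j-1}$. Under those assumptions, a lineage founded on a background with *k* mutations has total relative size $\exp[-kst + \lambda(1 - e^{-st})]$, which peaks at $t^* = \ln(\lambda/k)/s$ when $0 < k < \lambda$. The function names and parameter values are illustrative, not taken from the text.

```python
import math


def lineage_size(k, t, s, U_d):
    """Closed-form deterministic size (relative to the founder) at time t
    of a lineage founded on a background with k deleterious mutations."""
    lam = U_d / s
    return math.exp(-k * s * t + lam * (1.0 - math.exp(-s * t)))


def peak_time(k, s, U_d):
    """Time at which the deterministic lineage size peaks (needs 0 < k < U_d/s)."""
    return math.log((U_d / s) / k) / s


def integrate_lineage(k, t_max, s, U_d, dt=0.001, extra_classes=40):
    """Forward-Euler check of dn_j/dt = -j*s*n_j + U_d*n_{j-1}.

    n[i] holds the part of the lineage that has accumulated i additional
    mutations (k + i in total); classes beyond `extra_classes` are truncated.
    """
    n = [0.0] * (extra_classes + 1)
    n[0] = 1.0  # one founder in class k
    for _ in range(int(round(t_max / dt))):
        new = n[:]
        for i in range(extra_classes + 1):
            inflow = U_d * n[i - 1] if i > 0 else 0.0
            new[i] += dt * (-(k + i) * s * n[i] + inflow)
        n = new
    return sum(n)


if __name__ == "__main__":
    s, U_d, k = 0.05, 0.25, 2  # hypothetical values giving lambda = 5
    print("peak time:", peak_time(k, s, U_d))
    for t in (0.0, 10.0, 20.0, 40.0, 80.0):
        print(t, lineage_size(k, t, s, U_d))
```

Stochastic fluctuations are not captured here; the sketch only shows why backgrounds with fewer than average mutations expand transiently before declining.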

However, these deterministic aspects of the allele frequency trajectory are not sufficient to produce the site frequency spectrum in Equation 2, even if stochastic effects in the early phase of the trajectory are taken into account (*i.e.*, during “establishment”; see Desai and Fisher 2007 and Neher and Shraiman 2011). This is because fluctuations in the numbers of most-fit individuals that occur after establishment continue to drive fluctuations in the overall allele frequency. This is closely related to the fluctuations in the population fitness distribution studied by Neher and Shraiman (2012) in an analysis of Muller’s ratchet.

In the *Analysis*, we quantify how these fluctuations propagate to shape the statistics of allele frequency trajectories, finding that fluctuations in the number of most-fit individuals that happen on sufficiently short timescales are smoothed out due to the finite timescale on which selection can respond. In contrast, fluctuations that happen on longer timescales are faithfully reproduced in the allele frequency trajectory, which leads to quasi-neutral statistics of allele frequency trajectories at intermediate frequencies (see Figure 3). The smoothing of fluctuations on a finite timescale introduces an additional fundamentally nonneutral feature in the total allele frequency trajectory. This distorts the site frequency spectrum at lower frequencies, above and beyond what would be predicted if we asserted a simple frequency-dependent effective population size equal to the number of backgrounds that can contribute to a given frequency.

Finally, we will demonstrate that the nonmonotonicity in the site frequency spectrum at high frequencies arises as a result of sweep-like behaviors of neutral alleles that have fixed among the most-fit individuals in the population (see Figure 3). Because these derived alleles carry, on average, fewer deleterious mutations than the wild type, they are positively selected despite having no inherent benefit. We will show that this difference in the average number of linked deleterious mutations gives rise to an effective frequency-dependent selection coefficient (Equation 3). This selection coefficient changes with the frequency *f* of the mutation as high-fitness, wild-type individuals ratchet to extinction.

In the next sections, we derive the form of the site frequency spectrum in Equation 2 and explain these effects in more detail. We begin by presenting background necessary for understanding these results. We first revisit the intuition behind the shape of the site frequency spectra of isolated loci (Ewens 1963; Sawyer and Hartl 1992). We show that, in the absence of linkage between multiple selected sites, background selection does not lead to a site frequency spectrum of the form in Equation 2. Next, we explain how linkage between multiple selected sites modifies allele frequency trajectories. We revisit the key deterministic aspects of allele frequency trajectories in the presence of background selection, previously studied by Etheridge *et al.* (2009) and others, and extend these results to identify the key timescales important for understanding this problem. Finally, we turn to a full stochastic treatment of allele frequency trajectories in the *Analysis*, where we also derive the expressions for the site frequency spectra of neutral and deleterious mutations. In the *Discussion*, we comment on the practical implications of our results, as well as on connections to previous work and other models.

## Background

### Isolated loci

To gain insight into the more complicated case of linked selection, we first review the simplest case of a single locus isolated from any other selected loci. The probability, $p(f, t)$, that an allele at that locus with fitness effect $s$ is present at frequency *f* at time *t* is described by the diffusion equation:

$$
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial f}\left[ s f (1-f)\, p \right] + \frac{1}{2} \frac{\partial^2}{\partial f^2}\left[ \frac{f(1-f)}{N}\, p \right]. \tag{4}
$$

Ewens (1963) showed that the expected site frequency spectrum can be obtained from this forward-time description of the allele frequency trajectory: because mutations are arising uniformly in time and the time at which a mutation is observed is random, the site frequency spectrum is proportional to the average time an allele is expected to spend in a given frequency window.

In this section, we show that the low- and high-frequency ends of the site frequency spectrum of isolated loci can be obtained from a simple heuristic argument that emphasizes this connection between allele frequency trajectories and the site frequency spectrum. These calculations are not intended to be exact [resulting frequency spectra are only valid up to $\mathcal{O}(1)$ factors], but they provide intuition for the origins of key features of the site frequency spectrum that we will return to more formally below.

Consider the simplest case of isolated, purely neutral loci. Neutral mutations will arise in the population at rate $N U_n$, where $U_n$ is the per-locus neutral mutation rate. In the absence of selection, the trajectories of these mutations are governed by genetic drift. At steady state, the number of mutations we expect to see at frequency *f* is simply proportional to the number of mutations that reach that frequency and the typical time each of these mutations spends at that frequency before fixing or going extinct. In the absence of selection, a new mutation that arises at initial frequency $1/N$ will reach frequency *f* before going extinct with probability $1/(Nf)$. Standard branching process calculations (Fisher 2007) show that, given that it reaches frequency *f*, the mutation will spend about $Nf$ generations around that frequency [defined as the frequency not changing by more than an $\mathcal{O}(1)$ factor], provided that *f* is small ($f \ll 1$).

By combining these results, we can calculate the expected site frequency spectrum, $p(f)$, for small *f*. The rate at which new mutations reach frequency *f* is the product of the supply rate of neutral mutations, $N U_n$, and the probability of reaching *f*: $N U_n \cdot 1/(Nf) = U_n/f$. Those that do will remain around *f* (in the sense defined above) for about $Nf$ generations. Thus the total number of neutral mutations within an $\mathcal{O}(1)$ factor of frequency *f* is $f\,p(f) \sim (U_n/f) \cdot Nf = N U_n$. In other words, we have

$$
p(f) \sim \frac{N U_n}{f}. \tag{5}
$$

This argument is valid when *f* is small, but will start to break down at intermediate frequencies. However, because the wild type is rare when the mutant approaches fixation, an analogous argument can be used to describe the site frequency spectrum at high frequencies. The mutant trajectory still reaches frequency *f* with probability $1/(Nf)$. It will then spend roughly $N(1-f)$ generations around this frequency [*i.e.*, with $1-f$ changing by no more than an $\mathcal{O}(1)$ factor]. This gives $p(f) \sim N U_n$ in the high-frequency end of the spectrum. This simple forward-time heuristic argument reproduces a well-known result of coalescent theory (Wakeley 2009) and agrees with the more formal calculation of sojourn times in the Wright–Fisher process (Ewens 1963).
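The link between sojourn times and the 1/*f* shape can be checked numerically in a closely related setting. The sketch below uses a neutral Moran model (one birth and one death per time step) rather than the Wright–Fisher model of the text, because its transition probabilities are simple enough to propagate exactly. It computes the expected number of time steps a lineage starting from a single copy spends at each copy number *j*. The function name and the choice of model are assumptions made for illustration.

```python
def moran_sojourn_times(N, tol=1e-12):
    """Expected number of time steps spent at each copy number j = 1..N-1
    by a neutral lineage that starts from a single copy (Moran model).

    The probability distribution over copy numbers is propagated forward
    step by step until essentially all mass is absorbed at 0 or N.
    Returns a list indexed by copy number (entries 0 and N stay at 0.0).
    """
    prob = [0.0] * (N + 1)
    prob[1] = 1.0
    sojourn = [0.0] * (N + 1)
    while sum(prob[1:N]) > tol:
        new = [0.0] * (N + 1)
        for j in range(1, N):
            sojourn[j] += prob[j]           # one more step spent at j
            move = j * (N - j) / N**2       # chance of +1 (and, equally, of -1)
            new[j - 1] += prob[j] * move
            new[j + 1] += prob[j] * move
            new[j] += prob[j] * (1.0 - 2.0 * move)
        prob = new                          # mass at 0 and N is no longer tracked
    return sojourn


if __name__ == "__main__":
    N = 20
    times = moran_sojourn_times(N)
    for j in (1, 2, 5, 10, 19):
        print(j, times[j], N / j)
```

In this model the expected sojourn time at *j* copies is *N*/*j* time steps, so the time spent at frequency *f* = *j*/*N* falls off as 1/*f*, in line with the heuristic argument above.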

We can use a similar argument to calculate the frequency spectrum of strongly selected deleterious mutations with fitness effect $-s$ (with $Ns \gg 1$) that occur at a locus that is isolated from any other selected locus. Provided that the deleterious mutation is rare (below the “drift barrier” frequency, $1/(Ns)$), its trajectory is dominated by drift. Thus for $f \ll 1/(Ns)$ the mutation trajectory will be the same as for a neutral mutation and the frequency spectrum will therefore be neutral. In contrast, at frequencies larger than $1/(Ns)$ selection is stronger than drift, which prevents the mutation from exceeding this frequency. Combining these two expressions, we find that the frequency spectrum of an isolated deleterious mutation is, to a rough approximation, given by

$$
p(f) \sim \begin{cases} \dfrac{N U_d}{f}, & f \ll \dfrac{1}{Ns}, \\[6pt] 0, & f \gg \dfrac{1}{Ns}. \end{cases} \tag{6}
$$

For completeness, we also show how a similar argument can be used to obtain the frequency spectrum of beneficial mutations. Although it is not immediately obvious that this is relevant to background selection, we will later see how similar trajectories emerge in the case of strong purifying selection. Just like deleterious alleles, strongly beneficial alleles with fitness effect *s* (with $Ns \gg 1$) will not feel the effects of selection as long as they do not exceed the drift barrier ($f \ll 1/(Ns)$). Their trajectory and frequency spectrum will therefore be neutral below the drift barrier. As a result, only a small fraction *s* of beneficial mutations will reach frequency $1/(Ns)$. However, those that do will be destined to fix since, at frequencies larger than $1/(Ns)$, selection dominates over drift. Above this threshold, selection will cause the frequency of the mutation to grow logistically at rate *s* [$df/dt = s f (1-f)$], spending $1/[s f (1-f)]$ generations near frequency *f*. This is valid as long as $1 - f \gg 1/(Ns)$, at which point the effects of drift become dominant due to the wild type being rare, and the trajectory of the mutant is once again the same as the trajectory of a neutral mutation. Combining these expressions, we obtain a rough approximation for the frequency spectrum of an isolated beneficial mutation (Equation 7).
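As a hedged sketch, the heuristic steps for beneficial mutations can be assembled into a piecewise form. It assumes a drift barrier at $1/(Ns)$ and a beneficial mutation rate written as $U_b$ (a symbol introduced here for illustration only), and each line is meant to hold only up to $\mathcal{O}(1)$ factors:

$$
p(f) \sim \begin{cases}
\dfrac{N U_b}{f}, & f \ll \dfrac{1}{Ns}, \\[6pt]
\dfrac{N U_b}{f(1-f)}, & \dfrac{1}{Ns} \ll f \ll 1 - \dfrac{1}{Ns}, \\[6pt]
N^2 U_b s, & 1 - f \ll \dfrac{1}{Ns}.
\end{cases}
$$

The first line is the neutral result below the drift barrier. The second is the rate at which mutations cross the barrier, $N U_b s$, multiplied by the time spent per unit frequency during logistic growth, $1/[s f(1-f)]$. The third is the same rate multiplied by the neutral sojourn time per unit frequency near fixation, which is of order *N* generations. The three pieces match at the two crossover frequencies.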

### Linked loci under background selection

We now turn to the analysis of background selection. Since we assume that all mutations have the same effect on fitness, the population can be partitioned into discrete fitness classes according to the number of deleterious mutations each individual carries at the locus. When the fitness effect of each mutation is sufficiently strong, the population assumes a steady-state fitness distribution in which the expected fraction of individuals with *k* deleterious mutations, $h_k$, follows a Poisson distribution with mean $\lambda = U_d/s$ (Kimura and Maruyama 1966; Haigh 1978):

$$h_k = \frac{\lambda^k e^{-\lambda}}{k!}. \tag{8}$$

A new allele in such a population will arise on a background with *k* existing mutations with probability $h_k$.
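As a concrete illustration of the steady-state distribution, the sketch below tabulates the Poisson class fractions for a given λ. It assumes the standard mutation–selection balance result with $\lambda = U_d/s$ (Haigh 1978); the function name and the truncation at `k_max` are choices made here for illustration.

```python
import math


def class_fractions(lam, k_max):
    """Expected fraction of individuals carrying k = 0..k_max deleterious mutations."""
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_max + 1)]


if __name__ == "__main__":
    for lam in (0.1, 3.0):
        h = class_fractions(lam, 60)
        # For small lam nearly everyone is mutation-free; for large lam almost no one is.
        print(lam, round(h[0], 4), round(1.0 - h[0], 4))
```

For λ = 0.1 the mutation-free class holds about 90% of the population, whereas for λ = 3 it holds about 5%, which is the distinction between the two regimes discussed next.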

From the form of $h_k$ we see that, depending on the value of λ, the population can be in one of two regimes. In the first regime, the rate at which mutations are generated is smaller than the rate at which selection can purge them ($\lambda \ll 1$). In this case, the majority of individuals in the population carry no deleterious mutations ($h_0 \approx 1$), with only a small proportion, $1 - h_0 \approx \lambda$, of backgrounds in the population carrying some deleterious variants. To leading order in λ, all new neutral mutations will arise in a mutation-free background and will remain at the same fitness as the founding genotype. Their trajectories are thus the same as the trajectories of mutants at isolated genetic loci of the same fitness as the founding genotype (see Appendix D for details). This means that the full site frequency spectrum can be calculated by summing the contributions of site frequency spectra of isolated loci that we calculated above. The neutral and deleterious site frequency spectra are, to leading order in λ, given by Equations 5 and 6, respectively (see Appendix H for details). Thus, background selection has a negligible impact on mutational trajectories and diversity when $\lambda \ll 1$.

In the opposite regime, where mutations are generated faster than selection can purge them ($\lambda \gg 1$), there will be substantial fitness variation at the locus. Consider a new allele (*i.e.*, a new mutation at some site within the locus) that arises in this population. A short time after arising, individuals that carry this allele will accumulate newer deleterious mutations, which will lead the allele to spread through the fitness distribution. The fundamental difficulty in calculating the frequency trajectory of this allele stems from the fact that these individuals will have accumulated different numbers of newer deleterious mutations. The total strength of selection against the allele depends on the average number of deleterious mutations carried by the individuals that carry the allele. This will change over time in a complicated stochastic way as the lineage purges old deleterious mutations, accumulates new ones, and changes in frequency due to drift and selection. To calculate the distribution of allele frequency trajectories in this regime, we will need to model these changes in the fitness distribution of individuals carrying the allele. Although we will formally be treating λ as a large parameter, in practice our results will also adequately describe allele frequency trajectories in the cases of moderate λ (see Figure 2).

To make progress, we classify individuals carrying this allele (the “labeled lineage”) according to the number of deleterious mutations they have at the locus. We denote the total frequency of the labeled individuals that have *i* deleterious mutations as $f_i$, so that the total frequency of the lineage, $f$, is given by

$$f = \sum_i f_i. \tag{9}$$

The time evolution of the allele frequency in a Wright–Fisher process is commonly described by a diffusion equation for the probability density of the allele frequency (Ewens 2004). Instead, for our purposes, it will be more convenient to consider the equivalent Langevin equation (Van Kampen 2007):

$$\frac{df_i}{dt} = -s\left(i - \bar{k}(t)\right) f_i - U_d f_i + U_d f_{i-1} + C_i(t). \tag{10}$$

Here, $C_i(t)$ is a noise term with a complicated correlation structure that is necessary to keep the total size of the population fixed (see Good and Desai 2013 for details), and $\bar{k}(t)$ is the mean number of mutations per individual in the entire population at time *t*. In the strong selection limit that we are interested in here ($Nse^{-\lambda} \gg 1$), fluctuations in the mean of the fitness distribution of the population are small and $\bar{k}(t) \approx \lambda$ (Neher and Shraiman 2012).

### Key features of lineage trajectories

Before turning to a detailed analysis of Equation 10, it is helpful to consider some of the key features of lineage trajectories that we will model more formally below. To begin, imagine a lineage founded by a neutral mutation in an individual with *k* deleterious mutations. Let the lineage comprise $n_k$ individuals at some time shortly after arising, all of which carry *k* deleterious mutations (see blue inset in Figure 4A). At this time, the relative fitness of this lineage is simply $(\lambda - k)s$. Thus, lineages founded in classes with $k > \lambda$ will tend to decline in size. The more interesting case arises if $k < \lambda$, since these lineages will tend to increase in size.

However, although the overall number of individuals that carry the allele will tend to increase when $k < \lambda$, the part of the lineage in the founding class *k* (the “founding genotype”) will tend to decline in size because it loses individuals through new deleterious mutations (at per-individual rate $U_d$). As a result, the founding genotype feels an effective selection pressure of $(\lambda - k)s - U_d = -ks$, which is negative for all $k > 0$ and 0 for $k = 0$. This means that the lineage will increase in frequency, not through an increase in size of the founding genotype, but rather through the appearance of a large number of deleterious descendants in classes of lower fitness. The lineage must therefore decline in fitness as it increases in size.

In the absence of genetic drift, we can calculate how the size and fitness of the lineage change in time by dropping the stochastic terms in Equation 10 [subject to the initial condition $f_k(0) = n_k/N$ and $f_i(0) = 0$ for all $i \neq k$]. These deterministic dynamics of the lineage have been analyzed previously by Etheridge *et al.* (2009), who showed that the number of additional mutations that an individual in the lineage carries at some later time *t* is Poisson distributed with mean $\lambda(1 - e^{-st})$. Thus the average number of additional deleterious mutations eventually approaches λ after $\sim 1/s$ generations. At this point, the lineage has reached its own mutation–selection balance: the fitness distribution of the lineage has the same shape as the distribution of the population [*i.e.*, $f_{k+i}/f \approx h_i$] but is shifted by $-ks$ compared to the distribution of the population (see red inset in Figure 4A).
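The deterministic dynamics can be checked numerically. The sketch below integrates the drift-free class equations for a lineage founded in class *k* and compares the mean number of additional mutations with the Poisson mean $\lambda(1 - e^{-st})$. It assumes the class equations take the form $dn_i/dt = s(\lambda - i)n_i - U_d n_i + U_d n_{i-1}$ with $U_d = \lambda s$ and a population mean fixed at λ; the fixed-step RK4 integrator, the truncation at `extra` classes, and all names are illustrative choices.

```python
import math


def integrate_lineage(k, lam, s, t_end, dt=0.05, extra=60):
    """Drift-free lineage sizes in classes k, k+1, ..., k+extra at time t_end.

    Assumes dn_i/dt = s*(lam - i)*n_i - U_d*n_i + U_d*n_{i-1}, with n_k(0) = 1.
    """
    U_d = lam * s
    n = [0.0] * (extra + 1)
    n[0] = 1.0  # one founder in class k

    def deriv(state):
        out = []
        for j, nj in enumerate(state):
            i = k + j                                   # total deleterious mutations
            inflow = U_d * state[j - 1] if j > 0 else 0.0
            out.append(s * (lam - i) * nj - U_d * nj + inflow)
        return out

    for _ in range(int(round(t_end / dt))):             # classical RK4 steps
        k1 = deriv(n)
        k2 = deriv([x + 0.5 * dt * d for x, d in zip(n, k1)])
        k3 = deriv([x + 0.5 * dt * d for x, d in zip(n, k2)])
        k4 = deriv([x + dt * d for x, d in zip(n, k3)])
        n = [x + dt * (a + 2 * b + 2 * c + d) / 6.0
             for x, a, b, c, d in zip(n, k1, k2, k3, k4)]
    return n


if __name__ == "__main__":
    k, lam, s, t = 2, 4.0, 0.05, 30.0
    n = integrate_lineage(k, lam, s, t)
    mean_extra = sum(j * x for j, x in enumerate(n)) / sum(n)
    print(mean_extra, lam * (1.0 - math.exp(-s * t)))   # the two values should agree closely
```

The same run also gives the total lineage size, `sum(n)`, which can be compared with the closed-form growth factor $\exp[-kst + \lambda(1 - e^{-st})]$.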

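The average relative fitness of individuals in the lineage (Figure 4A) is therefore equal to

$$x_k(t) = s\left(\lambda e^{-st} - k\right), \tag{11}$$

and the total number of individuals in the lineage is simply $n(t) = n_k(0)\,g_k(t)$, where we have defined

$$g_k(t) = \exp\left[-kst + \lambda\left(1 - e^{-st}\right)\right]. \tag{12}$$

Thus, we can see from Equations 11 and 12 that lineages founded in the 0-class will, on average, steadily increase in size at a declining rate until they asymptote at a total size equal to $n_0(0)e^{\lambda}$ roughly $1/s$ generations later (see Figure 4B). In contrast, lineages founded in the *k*-class (with $0 < k < \lambda$) will increase in size for only

$$t_d^{(k)} = \frac{1}{s}\ln\left(\frac{\lambda}{k}\right) \tag{13}$$

generations, when they peak at a size of $n_k(0)\,g_k$ individuals (see Figure 4C), where we have defined

$$g_k = g_k\left(t_d^{(k)}\right) = e^{\lambda - k}\left(\frac{k}{\lambda}\right)^{k}. \tag{14}$$

The lineages remain near this peak size for about

$$\Delta t^{(k)} \sim \frac{1}{s\sqrt{k}} \tag{15}$$

generations (Figure 4C). At longer times, they exponentially decline at rate $ks$ (Figure 4C).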
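The peak time and peak size follow from maximizing the deterministic growth factor of the lineage. The sketch below assumes that growth factor is $g_k(t) = \exp[-kst + \lambda(1 - e^{-st})]$, which is the integral of the relative fitness $s(\lambda e^{-st} - k)$; the function names are illustrative, and the closed forms are checked against direct evaluation rather than taken on trust.

```python
import math


def lineage_growth(t, k, lam, s):
    """Expected size at time t of a lineage founded by one individual in class k."""
    return math.exp(-k * s * t + lam * (1.0 - math.exp(-s * t)))


def peak_time(k, lam, s):
    """Time at which the relative fitness s*(lam*exp(-s*t) - k) crosses zero (0 < k < lam)."""
    return math.log(lam / k) / s


def peak_size(k, lam):
    """Growth factor evaluated at the peak time."""
    return math.exp(lam - k) * (k / lam) ** k


if __name__ == "__main__":
    k, lam, s = 2, 10.0, 0.01
    t_peak = peak_time(k, lam, s)
    print(t_peak, lineage_growth(t_peak, k, lam, s), peak_size(k, lam))
    # A lineage founded in the 0-class has no peak: it saturates at exp(lam).
    print(lineage_growth(20.0 / s, 0, lam, s), math.exp(lam))
```

Expanding $\ln g_k(t)$ to second order around the peak gives a curvature of $-s^2 k$, which is where a peak width of order $1/(s\sqrt{k})$ comes from.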

These simple deterministic calculations capture the average behavior of an allele and show that all alleles founded in classes with are likely to be extinct on timescales much longer than whereas sufficiently large lineages founded in the 0-class should simply reflect the frequency in the founding class about generations earlier: This is the forward-time analog of the intuition presented by Charlesworth *et al.* (1993).

Of course, this deterministic solution neglects the effects of genetic drift, which will be crucial, particularly because drift in each class propagates to affect the frequency of the lineage in all lower fitness classes (for a more detailed heuristic describing why drift can never be ignored, see *The Importance of Genetic Drift in the Founding Class* in Appendix B). Although these effects are complex, there is a hierarchy in the fluctuation terms which we can exploit to gain some intuition. From the deterministic solution above, we can see that a fluctuation of size in class *i* will, on average, eventually cause a change in the total size of the lineage proportional to after a time delay Thus, the fluctuations that have the largest effect on the total size of the lineage are those that occur in the class of highest fitness (*i.e.*, the founding class *k*). These fluctuations will turn out to be the most important in describing the frequency trajectory of the entire allele, although fluctuations in classes of lower fitness will still matter in lineages of a small enough size.

One could imagine that this result means that fluctuations in the total size of the lineage simply mirror the fluctuations in the founding class, amplified by a factor and after a time delay If fluctuations in the founding class are sufficiently slow, this is indeed the case. However, this is not true for fluctuations that occur on shorter timescales. Consider, for example, the case where a neutral mutation is founded in the mutation-free () class. Imagine that the frequency of the allele in the founding class changes by a small amount from to as a result of genetic drift (shown in the first panel of Figure 5). Based on the deterministic solution, this fluctuation will lead to a proportional change in the frequency of the portion of the lineage in the 1-class, and this change will take place over generations (see Appendix A for details). During this time, the change in the 1-class begins to lead to a shift in the frequency in the 2-class, which will mirror the change in the 0-class a further generations later (see Figure 5). This change will then propagate, in turn, to lower classes and ultimately results in a proportional change in the total allele frequency a total of generations later (see Figure 5).

Now consider what happens if there is another change in the frequency in the founding class. If this change occurs within the initial generations, it will influence the 1-class simultaneously with the first fluctuation, and thus the effect of these two fluctuations on the overall lineage frequency will be “smoothed” out. In contrast, if the changes are separated by more than generations, they will propagate sequentially through the fitness distribution and are ultimately mirrored in the total allele frequency. Similar arguments apply to lineages founded in other fitness classes, though the relevant timescales and scale of amplification are different.

Together, these arguments suggest that fluctuations in the founding class will have the largest impact on overall fluctuations in the lineage frequency, and these overall fluctuations will represent an amplified but smoothed-out mirror of the fluctuations in the founding class. This smoothing will be crucial: the size of the lineage in the founding class will typically fluctuate neutrally, but the smoothed-out and amplified versions will have nonneutral statistics. As we will see below, this smoothing ultimately leads to distortions in the site frequency spectrum at low frequencies ().

## Analysis

Formally, we analyze all of the effects described above by computing the distribution of the frequency trajectories of the allele, from Equation 10 for an allele arising in class *k*. This process is complicated by the correlation structure in the terms required to keep the population size constant. These correlations are important once the lineage reaches a high frequency and, in the presence of strong selection, they result in a complicated hierarchy of the moments of *f*, which do not close (Higgs and Woodcock 1995; Good and Desai 2013). However, we can simplify the problem by considering low-, high-, and intermediate-frequency lineages separately. First, at sufficiently low frequencies (), the in Equation 10 reduce to simple uncorrelated white noise. At these low frequencies, Equation 10 thus simplifies to (16)where the noise terms have and covariances and should be interpreted in the Itô sense. At very high frequencies (), a similar simplification arises. In this case, the wild-type lineage is at low frequency and we can model the wild-type frequency using an analogous coupled branching process with uncorrelated white noise terms. Finally, at intermediate frequencies, we cannot simplify the noise terms in this way. Fortunately, for the case of strong selection we consider here, we will show that for lineage trajectories have neutral statistics on relevant timescales. As we will see below, these low-, intermediate-, and high-frequency solutions can then be asymptotically matched, giving us allele frequency trajectories and site frequency spectra at all frequencies.

In the next several subsections, we focus on the analysis of the distribution of trajectories at low and high frequencies ( or ), where Equation 16 is valid. We then return in a later subsection to the analysis of trajectories at intermediate frequencies.

### The dynamics of the lineage within each fitness class

To obtain the distribution of trajectories of the allele at low frequencies () from Equation 16, we will first compute the generating function of This generating function is defined as (17)where angle brackets denote the expectation over the probability distribution of the frequency trajectory is simply the Laplace transform of the probability distribution of and it therefore contains all of the relevant information about the probability distribution of

As we have already anticipated from our discussion above, the time evolution of depends on the distribution of the lineage among different fitness classes. To understand how this distribution changes under the influence of drift, mutation, and selection in these classes, we can consider the joint generating function for the (18)The generating function for the total allele frequency can then be obtained from this joint generating function by setting We will use this relationship between the two generating functions to evaluate the importance of drift, mutation, and selection within each of the fitness classes on the total allele frequency.

By taking a time derivative of Equation 18 and substituting the time derivatives from Equation 16 (where the stochastic terms should be interpreted in the Itô sense, see Appendix C), we can obtain a partial differential equation (PDE) describing the evolution of the joint generating function: (19)We see from Equation 19 that the joint generating function is constant along the characteristics defined by (20)Thus, the joint generating function can be obtained by integrating along the characteristic backward in time from to subject to the boundary condition Note that the linear terms in the characteristic equations arise from selection and mutation out of the *i*-class and that the nonlinear term arises from drift in class *i*.

In *Large Lineages Arising on Unusually Fit Backgrounds* in Appendix E, we show that when considering the distribution of trajectories at frequencies the nonlinear terms in Equation 20 are of negligible magnitude uniformly in time in all classes containing *i* or more deleterious mutations per individual, as long as and Here, represents the peak of the expected number of individuals in a lineage founded by a single individual in class *i* (see Equation 14 and Figure 4C). Thus, when the effect of genetic drift is negligible in classes with *i* or more deleterious mutations. Conversely, when genetic drift in the class with *i* deleterious mutations does affect the overall allele frequency.

Since drift is negligible in classes with *i* or more mutations, total allele frequencies of require that This threshold is reminiscent of the drift barrier, but its origin for classes below the founding class () is more subtle. We offer an intuitive explanation for this threshold in *The Importance of Genetic Drift in Classes Below the Founding Class* in Appendix B. Thus, drift in class *i* has an important impact on the overall frequency trajectory as long as However, once exceeds the effect of genetic drift in that class, as well as in all classes below *i*, becomes negligible because the frequencies of the parts of the lineage in all classes below *i* are then also guaranteed to exceed the corresponding thresholds. Note that the frequency of the founding genotype is exponentially unlikely to substantially exceed This is because, as we explained earlier, the frequency trajectory of the founding genotype has the same statistics as the trajectory of a mutation of fitness at an isolated locus (see Equation 16 and Appendix F). Thus, because is unlikely to exceed the overall allele frequency *f* of an allele founded in class *k* is exponentially unlikely to substantially exceed .

In summary, by analyzing the generating function for the components of the lineage in different fitness classes, we have found that there is a clear separation between high-fitness classes in which mutation and drift are the primary forces, and classes of lower relative fitness in which mutation and selection dominate. The boundary between the stochastic and deterministic classes can be determined from the total allele frequency, allowing us to reduce a complicated problem involving a large number of coupled stochastic terms to what we will see is a small number of stochastic terms feeding an otherwise deterministic population.

### Statistics of trajectories with

At this point, we are in a position to calculate a piecewise form for the generating function valid near any frequency *f*. For example, consider the allele frequency trajectory in the vicinity of some frequency As we have explained above, at these frequencies contributions from mutations arising in class are exponentially small, since they would require the frequency of the lineage in that class to substantially exceed which happens only exponentially rarely. Thus, in this frequency range we will only see mutations arising in the mutation-free class (). In addition to this, we have shown that at these frequencies genetic drift can be neglected in all classes but the 0-class. To obtain the generating function at these frequencies, we can therefore integrate the characteristic equations by dropping the nonlinear terms in Equation 20 for all [see *Large Lineages Arising on Unusually Fit Backgrounds* in Appendix E for details]. This yields the generating function for the frequency of the labeled lineage: (21)where the average is taken over all possible realizations of the trajectory in the founding class

As before, represents the expected number of individuals descended from an individual present in the 1-class generations earlier (see Equation 12). Thus, the two terms in the exponent in Equation 21 represent the frequency of the lineage in the founding class and the total frequency of the deleterious descendants of that lineage. The latter are seeded into the 1-class at rate and each of these deleterious descendants founds a lineage that generations later contains individuals, so that the total frequency of the allele is simply (22)Thus, we have obtained a simple expression for the frequency of the entire allele in which all of the stochastic effects have been reduced to a single stochastic component, Furthermore, the stochastic dynamics of are those of a simple, isolated, neutral mutation (see schematic of such a trajectory in Figure 6B). Note, however, that the statistics of the fluctuations in are not necessarily the same as the statistics of the trajectory in the founding class (see Figure 6A). This is because depends on an *integral* of (see Equation 22) and therefore has different stochastic properties than itself.
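The structure described here, a founding-class trajectory plus the time-delayed contribution of its deleterious descendants, amounts to a convolution. The sketch below assumes Equation 22 has the form $f(t) = f_0(t) + U_d \int f_0(t')\, g_1(t - t')\, dt'$ with $g_1(t) = \exp[-st + \lambda(1 - e^{-st})]$, where $f_0$ is the frequency in the founding 0-class. This is a reading of the verbal description, not a quoted formula, and the discretization and names are illustrative.

```python
import math


def total_frequency(f0, lam, s, dt=1.0):
    """Total allele frequency from a sampled founding-class trajectory f0.

    Assumed form: f(t) = f0(t) + U_d * integral of f0(t') * g_1(t - t') dt',
    approximated by a left Riemann sum with step dt (in generations).
    """
    U_d = lam * s
    # g_1(tau): expected descendants, tau generations later, of one individual in the 1-class
    kernel = [math.exp(-s * j * dt + lam * (1.0 - math.exp(-s * j * dt)))
              for j in range(len(f0))]
    total = []
    for t in range(len(f0)):
        descendants = U_d * dt * sum(f0[u] * kernel[t - u] for u in range(t + 1))
        total.append(f0[t] + descendants)
    return total


if __name__ == "__main__":
    lam, s = 3.0, 0.05
    f0 = [1e-4] * 400                      # founding class held at a constant frequency
    f = total_frequency(f0, lam, s)
    print(f[-1] / f0[-1], math.exp(lam))   # amplification approaches exp(lam)
```

For a founding class held at constant frequency the integral evaluates to $e^{\lambda} - 1$, so the total frequency saturates at $e^{\lambda} f_0$. A fluctuating `f0` is instead smoothed by the kernel, which is why the total frequency need not share the statistics of the founding-class trajectory.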

From Equation 22, we can see that the frequency trajectory of the allele still has the same qualitative features as those we have seen in the deterministic behavior of mutations. Shortly after being founded, the lineage will become dominated by the deleterious descendants of the founding class, which are captured by the second term in Equation 22 (see left inset in Figure 6A). At early times [], the total allele frequency must rapidly grow as the lineage spreads through the fitness distribution and approaches mutation–selection balance (see Figure 6A). About generations after founding, the peak phase of the trajectory begins (see Figure 6A). During this phase, the average fitness of the lineage is approximately zero and the allele traces out a smoothed-out and amplified version of the trajectory in the founding class (Figure 6B). Finally, generations after the descendants of the last individuals present in the founding class have peaked, the average fitness of the lineage will fall significantly below zero and the extinction phase of the trajectory begins.

As we show in Appendix I, the peak phase of the trajectory is the most important for understanding the site frequency spectrum. This is also the phase during which the trajectory of the mutation spends the longest time near a given frequency. In contrast, the spreading phase (see Figure 6A) has a negligible effect on the site frequency spectrum: by this we mean that the site frequency spectrum at a given frequency will always be dominated by the peak phase of trajectories that peak around that frequency, and will not be influenced by the spreading phase of trajectories that peak at much higher frequencies. We will therefore not consider the spreading phase in the main text, but discuss it in *Contribution from the Spreading Stage of Trajectories* in Appendix I. The extinction phase of the trajectory can also be neglected for a similar reason, except when considering the very highest frequencies: (see *Contribution from the Extinction Stage of Trajectories* in Appendix I). At these frequencies, the wild-type frequency is small and the mutant is in the process of fixation. To analyze the allele frequency trajectory at these frequencies, we model the wild type using the coupled branching process in Equation 16 and hence describe these trajectories by the extinction phase of the wild type.

To calculate the distribution of in the peak phase, we need to calculate the distribution of the time integral of in Equation 22. We can simplify this integral by observing that is highly peaked in time between and where and are given by Equations 13 and 15 and are annotated in Figure 4C. In other words, starting at times around generations after the lineage reaches a substantial frequency in the founding class, the labeled lineage is dominated by the deleterious descendants of individuals extant in the founding class between and generations earlier, with individuals extant in the founding class at other times having exponentially smaller contributions [see *Large Lineages Arising on Unusually Fit Backgrounds* in Appendix E for details]. Thus, the total size of the lineage will be proportional not to the frequency in the founding class generations earlier, but to the total time-integrated frequency within some window of width centered around that time. We call this quantity the “weight” and denote it by where (23)The total allele frequency in the peak phase is therefore equal to (24)Thus, to calculate the distribution of the allele trajectory, we only need to calculate the distribution of the weight in the founding class over a window of specified width, It is informative to consider the time-integrated form of the distribution of this weight, since this form is also directly relevant to the site frequency spectrum [for a discussion of the time-dependent distribution see Appendix F]. In Appendix F we show that is given by (25)This distribution has a form that can be simply understood in terms of the trajectory in the founding class. 
Since genetic drift takes order generations to change substantially, drift will not change significantly within generations when the frequency in the founding class exceeds As a result, the weight, will be approximately equal to Therefore, at these large frequencies, the weight simply traces the founding class frequency and the two quantities have the same distributions. At lower frequencies, the founding genotype will typically have arisen and gone extinct in a time of order generations (where is the maximal frequency the lineage reaches over the course of its lifetime). By assumption, this time is much shorter than Thus, the weight in a window of width that contains this trajectory is simply This large a trajectory is obtained with probability from which it follows (by a change of variable) that the distribution of weights in the founding class scales as
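The two limits of the weight distribution can be illustrated with a sliding-window time integral. The sketch below assumes the weight is the integral of the founding-class frequency over a window of fixed width centered on the time of interest; the window is specified in samples, and the function name and example trajectories are hypothetical.

```python
def windowed_weight(traj, width, dt=1.0):
    """Time-integrated frequency in a window of `width` samples centered on each time point."""
    half = width // 2
    weights = []
    for center in range(len(traj)):
        lo = max(0, center - half)
        hi = min(len(traj), center + half + 1)
        weights.append(sum(traj[lo:hi]) * dt)
    return weights


if __name__ == "__main__":
    # Slowly varying founding class: the weight tracks (window width) x (frequency).
    slow = [0.1] * 100
    print(windowed_weight(slow, 21)[50])

    # Short-lived founding genotype: any window containing it returns its lifetime weight.
    blip = [0.0] * 50 + [0.2] * 3 + [0.0] * 50
    print(windowed_weight(blip, 21)[51])
```

In the first case the weight is simply proportional to the current frequency, so the two quantities share a distribution. In the second the weight records only the total area under the founding-class trajectory, which is the quantity whose distribution falls off faster than neutrally.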

As we anticipated in our discussion of the propagation of fluctuations of the founding genotype through the fitness distribution (Figure 5), we have found that the trajectory of the allele in the peak phase looks like a smoothed-out, time-delayed, and amplified version of the trajectory in the founding class (Figure 6). At sufficiently high frequencies, the timescale of the smoothing is shorter than the typical timescale of the fluctuations in the founding class. At these frequencies, the statistics of the fluctuations of the allele simply mirror the statistics of the fluctuations in the founding class, with a time delay equal to

At lower frequencies, the timescale of smoothing is much longer than the typical lifetime of the founding genotype. As a result, the deleterious descendants of the entire original genotype rise and fall simultaneously and fluctuations in the founding class are not reproduced in detail. Instead, the peak phase of the allele frequency trajectory consists of a single peak with size proportional to the total lifetime weight of the founding genotype, As we calculated above, the distribution of these peak sizes falls off more rapidly than neutrally. This gives us a complete description of the statistics of the peaks of allele frequency trajectories in the frequency range

### Statistics of trajectories with

So far, we have only considered trajectories of lineages that reach a maximal allele frequency larger than all of which must have arisen in the mutation-free class. At lower frequencies, the effects of genetic drift in class $i = 1$ must also be considered, but the behavior in classes with $i \geq 2$ is deterministic. In this case, by repeating our earlier procedure, we obtain a slightly different form for the generating function (26) so that the total allele frequency is (27) The total allele frequency is once again dominated by the last term, which represents the bulk of the deleterious descendants. Thus, by an analogous argument, the peak size of the lineage is proportional to the weight in the 1-class in a window of width (28) There are two types of trajectories that can reach these frequencies: trajectories that arise in the 1-class and reach a sufficiently large frequency in their founding class ( see Appendix G); and trajectories that arise in the 0-class and reach a smaller frequency in their founding class (), but still leave behind enough deleterious descendants in the 1-class that the overall frequency in that class exceeds By the argument that we outlined before, this ensures that genetic drift will be negligible in classes of lower fitness (*i.e.*, for $i \geq 2$) and is guaranteed to happen if (see Appendix G).

The trajectories of the former type are simple to understand since, in this case, the trajectory is that of a simple, isolated, deleterious locus with fitness [and at all times]. By repeating the same procedure as above, we find that the time-integrated distribution of the weights in the 1-class is (29)Note that since the trajectory of a mutation in the founding 1-class is longer than generations only exponentially rarely, a window of length nearly always contains the entire founding class trajectory (see Appendix F). This is reflected in the form of the weight distribution in Equation 29, which falls as with an exponential cutoff at Thus, the frequency trajectory of an allele that arises in the 1-class will not mirror the fluctuations in the founding genotype. Instead, the peak phase of the allele frequency trajectory will nearly always consist of a single peak, just as we have seen in the case of alleles peaking at frequencies

We now return to the other type of trajectory that can peak in this range: alleles arising in the 0-class, but reaching a small enough frequency that the effects of genetic drift in the 1-class cannot be ignored (). Because the trajectory of these alleles in the 1-class represents the combined trajectory of multiple clonal “sublineages,” each founded by a mutational event in the 0-class, the distribution of weights in the 1-class will be different [ see Appendix G], which leads to a different distribution of overall allele frequencies *f*. However, as we show in *Contribution from the Peaks of Trajectories* in Appendix I, these trajectories have a negligible impact on the site frequency spectrum: because the overall number of mutations arising in class 1 is substantially larger than the overall number of mutations arising in class 0, trajectories that arise in class 0 and peak in the same frequency range as mutations originating in class 1 are less frequent by a large factor ( see Appendix G).

Similarly, at even lower frequencies in the range we will see the peaks of trajectories arising on backgrounds with *i* or fewer deleterious mutations. These trajectories all have a single peak of width equal to The maximal peak sizes are, once again, proportional to the total weight in the *i*-class, which will be distributed according to a different power law depending on the difference in the number of deleterious mutations between the founding class *k* and the *i*-class (see Appendix G for details). As we show in *Contribution from the Peaks of Trajectories* in Appendix I, the most numerous of these mutations are those that arise in the *i*-class (). The index of this most-numerous class is a quantity that we return to at multiple points and we denote it with We can obtain an explicit form for how depends on the frequency *f* by solving the implicit condition for We show in *Contribution from the Peaks of Trajectories* in Appendix I that, to leading order, (30) Finally, at the very lowest frequencies, the site frequency spectrum is dominated by the trajectories of lineages that arise in a class that is within a standard deviation σ of the mean of the fitness distribution (*i.e.*, lineages with ). Unlike the trajectories of lineages that arise in classes of higher fitness that we discussed above, allele frequency trajectories of lineages arising within a standard deviation of the mean are typically dominated by drift throughout their lifetime [see *Lineages Arising on Typical Backgrounds* in Appendix E]. This is because the timescale on which these lineages remain above the mean of the fitness distribution [which is limited by ] is shorter than the timescale that it takes them to drift to a frequency large enough for the effect of selection to be felt []. Lineages arising in these classes do not reach frequencies substantially larger than and have largely neutral trajectories at frequencies that remain below this threshold.

### The mirrored fluctuations of the allele at intermediate frequencies,

We have seen that the effects of genetic drift in multiple fitness classes may be important when but that, at frequencies larger than genetic drift in all classes apart from the 0-class can be neglected. At these frequencies, the trajectory of the allele mirrors the fluctuations in the 0-class that occur on timescales longer than generations. We have also seen that overall allele frequencies larger than correspond to 0-class frequencies of

At more substantial allele frequencies (for which the condition that is not satisfied), the coupled branching process in Equation 16 cannot be used to adequately model the allele frequency trajectory. This is because, at these frequencies, the correlations between fluctuations in the frequencies of the mutant and of the wild type, which are imposed by the finite-size constraint of the population, become important. However, we can account for these correlations simply by making use of the fact that the effect of genetic drift in all classes but the 0-class will remain negligible as long as both the mutant and the wild type remain at sufficiently large frequencies. Thus, to model the overall allele frequency trajectory at these intermediate frequencies, we can use a simple, neutral model to describe the frequency of the mutant in the 0-class, and the frequency of the wild type in the 0-class, as (31)and treat the remainder of the population deterministically (which yields an expression for the relationship between and that is identical to Equation 22).

Furthermore, since we have assumed that an additional simplification arises. In this frequency range, the frequencies of both the mutant and the wild-type 0-class exceed Thus, large fluctuations in the frequency of the mutant and of the wild type occur on timescales that are longer than generations. Because this timescale is longer than the timescale on which selection in lower classes responds (), large fluctuations in the 0-class are mirrored by the overall frequency trajectory after a time delay. In other words, on timescales longer than generations, we can expand the exponent in the integrand in Equation 22 around its peak and approximate the total allele frequency of the mutant and the wild-type alleles as and which yields a model for the total allele frequency of the mutant: (32) where is an effective noise term with mean variance and auto-correlation that vanishes on timescales longer than Thus, on timescales longer than the allele frequency trajectory is just like that of a neutral mutation in a population of smaller size On shorter timescales, the allele frequency trajectory will be more correlated in time than the frequency trajectory of a neutral mutation in a population of that size and will appear smoother. However, since large frequency changes of alleles at these frequencies will only occur on a timescale of order which is much longer than this description will be sufficient for describing site frequency spectra.

We emphasize that Equation 32 relies on the overall fluctuations in the fitness distribution of the population being negligible on relevant timescales, so that the average number of deleterious mutations per individual, is approximately equal to λ (and, crucially, independent of *f*). We expect that this approximation is valid when because the overall fluctuations in are small compared to λ in this limit (Neher and Shraiman 2012). However, it is less clear whether this approximation continues to be appropriate as approaches more moderate values. A more detailed exploration of these effects would require a path-integral approach similar to that of Neher and Shraiman (2012) and is beyond the scope of this work.

### The trajectories of high frequency alleles,

The neutral model from the previous section breaks down when the allele frequency of the mutant exceeds These total allele frequencies are attained when the frequency of the founding genotype exceeds the frequency When this occurs, the frequency of the wild type in the founding class will fall below and fluctuations that occur on timescales shorter than generations will once again become important. Mutant lineages that reach such high frequencies are almost certain to fix in the 0-class. Once this happens, all individuals that carry the wild-type allele at the locus will also be linked to a deleterious variant. Thus, although the mutant carries no inherent fitness benefit, it will thereafter appear fitter than the wild type because it has fixed among the most-fit individuals in the population. The mutant will therefore proceed to perform a true selective sweep and will drive the wild-type allele to extinction.

At these high allele frequencies, we can once again use the coupled branching process in Equation 16 to describe the allele frequency trajectory of the wild type, Seen from the point of view of the wild type, the fixation phase of the mutant corresponds to the extinction phase of the wild type (see right inset in Figure 3). To obtain a description of the allele frequency trajectory of the wild type at these times, we can expand the generating function in Equation 21 at long times, which yields (33)for some choice of [see *Large Lineages Arising on Unusually Fit Backgrounds* in Appendix E for details]. Note that, as before, Equation 33 is valid only as long as (*i.e.*, as long as the size of the lineage in the 1-class exceeds ). Once the frequency of the wild type in the 1-class falls below we can no longer treat this class deterministically. Once this happens, the part of the wild type that is in the 1-class will drift to extinction within about generations, whereas its bulk will continue to decay at a rate proportional to its average fitness, This will go on for as long as the frequency in the 2-class is larger than corresponding to the total frequency of the lineage being larger than Once the frequency of the wild type in the 2-class also falls below the bulk of the lineage will continue to decay even more rapidly, at rate and so on. In general, once the frequency of the lineage in class but not in class falls below which corresponds to the total frequency of the wild type being in the range the average fitness of the bulk of the wild type will be equal to (see Figure 3).

Thus, the wild type goes extinct in a staggered fashion, dying out in classes of higher fitness first and declining in relative fitness in this process. As a result, the effective negative fitness of the wild type increases as its frequency declines, leading to an increasingly rapid exponential decay of the allele frequency (see right inset in Figure 6A). By solving the implicit condition for above as we did previously (see Equations E19–E21 in Appendix E), we find that average fitness of the bulk of the wild-type distribution is to leading order equal to (34)when This means that the frequency trajectory of the wild type in this phase obeys

(35)

## The Site Frequency Spectrum in the Presence of Background Selection

Having obtained a distribution of allele frequency trajectories, we are now in a position to evaluate the site frequency spectrum. Since the trajectory of any lineage depends on the fitness of the background on which it arose, we will find it convenient to divide the total site frequency spectrum, into the site frequency spectra of mutations with different ancestral background fitnesses, By definition, the total site frequency spectrum is the sum over these single-class frequency spectra: (36) We evaluate the site frequency spectrum in three overlapping regimes, and

### The site frequency spectrum of rare alleles,

The rare end of the frequency spectrum () consists of neutral alleles that (because they are rare) occurred on different genetic backgrounds. These alleles thus have independent allele frequency trajectories that can be described by the coupled branching process, Equation 16. As long as most lineage trajectories are dominated by genetic drift. Intuitively, this result is simple: provided that the lineage is rare enough, selection pressures in any fitness class (or more precisely, the bulk of the fitness classes where the vast majority of such alleles arise) can be neglected compared to drift. Thus, the resulting site frequency spectrum is (37)At these frequencies, the total site frequency spectrum is equal to (38)This agrees with our earlier intuition that, at the lowest frequencies, the entire population contributes to the site frequency spectrum and also agrees with the results of Wright–Fisher simulations (see Figure 3). Since the effects of selection are negligible, each fitness class contributes proportionally to its size, with the largest fitness classes contributing the most (see Figure 7). The deleterious mutation-free () class has a negligible effect on the site frequency spectrum, contributing only a small proportion (proportional to its total frequency, ) of all variants seen at these frequencies.
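The statement that each fitness class contributes in proportion to its size can be made concrete with a small sketch. It assumes that class sizes follow the standard mutation-selection balance, a Poisson distribution with mean λ, which matches the statements here that the mutation-free class is rare and that the largest classes lie near the mean. The normalization `theta` of the neutral spectrum and all function names are illustrative assumptions.

```python
import math

def class_weights(lam, k_max):
    """Assumed Poisson class weights h_k = exp(-lam) * lam**k / k!, k = 0..k_max."""
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_max + 1)]

def rare_allele_sfs_by_class(f, lam, theta, k_max):
    """Per-class spectrum at frequency f when drift dominates in every class.

    Each class contributes its weight h_k times a neutral spectrum theta / f;
    the prefactor convention for theta is an assumption of this sketch.
    """
    neutral = theta / f
    return [h * neutral for h in class_weights(lam, k_max)]

# Illustrative parameters (assumptions, not values taken from the text).
lam, theta, f = 5.5, 1.0, 1e-6
weights = class_weights(lam, k_max=60)
contributions = rare_allele_sfs_by_class(f, lam, theta, k_max=60)

largest_class = weights.index(max(weights))   # class with the most individuals
zero_class_share = contributions[0] / sum(contributions)
```

With these assumed weights, the share of the mutation-free class equals exp(−λ), and the largest contribution comes from the class nearest the mean number of deleterious mutations.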

At larger frequencies, selection plays an important role in shaping allele trajectories and the site frequency spectrum. The overall contribution of mutations originating in class *k* near some frequency *f* is determined not only by the overall rate at which such mutations arise, but also by the probability that these mutations reach *f*, which declines with the initial deleterious load *k*. As a result, as *f* increases, the site frequency spectrum will become increasingly enriched for alleles arising in unusually fit backgrounds (see Figure 7).

The contributions to the site frequency spectra are straightforwardly obtained by integrating in time the distributions of allele frequency trajectories that we have described in the *Analysis*: (39)This integral is dominated by the peak phase of allele frequency trajectories, during which we have seen that the allele frequency is simply proportional to the weight in class which corresponds to the class of lowest fitness in which the dynamics are not deterministic: (40)The overall site frequency spectrum is equal to the sum of these terms (Equation 36). In *Contribution from the Peaks of Trajectories* in Appendix I, we show that this sum is well approximated by the last term, corresponding to and obtain that the site frequency spectrum in the rare end is, to leading order, (41)where the form of the frequency spectrum for is valid up to a constant factor (see *Contribution from the Peaks of Trajectories* in Appendix I for details). A comparison between these predictions and site frequency spectra obtained in Wright–Fisher simulations of the model is shown in Figure 2.

These results reproduce much of what we may have anticipated from our analysis of allele frequency trajectories. At frequencies these peaks represent the mirrored and amplified trajectories in the mutation-free () founding class. To reach these frequencies, mutations need to arise in the mutation-free class (which happens at rate ) and drift to substantial frequencies (). Since fluctuations in the founding class of lineages that exceed this frequency are slow compared to the timescale on which their deleterious descendants remain at their peak (), the entire allele frequency trajectory reproduces the fluctuations of the neutral founding class. Thus, neutral site frequency spectra proportional to emerge.

At smaller frequencies, allele frequency trajectories reflect the smoothed-out fluctuations in the high-fitness classes. At these frequencies, the site frequency spectrum comprises a rapidly increasing number of polymorphisms as the frequency decreases for three reasons (see Figure 3 and Figure 7). First, lower frequencies correspond to smaller feeding-class weights, which are more likely simply due to the effects of drift. Note that because in this frequency range the overall allele frequency is proportional to the total weight in the founding class and not the frequency, this effect leads to the site frequency spectrum falling off at a faster rate than the baseline expectation of which would occur in the absence of smoothing of fluctuations in the founding class due to the finite timescale of selection. Second, the number of individuals with *k* deleterious mutations in the locus increases with *k* (for ), causing an increase in the overall rate at which alleles peaking at lower frequencies arise. This variation in the overall number of such alleles gives rise to the steeper power law, that can be compared to the distribution of peaks of individual lineages, which decays at most as Finally, peaks that occur at frequencies have duration of order which declines with the frequency *f*, giving rise to the root-logarithm factor.

### The site frequency spectrum at intermediate and high frequencies

At frequencies much larger than but still smaller than the allele frequency trajectory is described by an effective neutral model on coarse enough timescales (). At these frequencies, the site frequency spectrum is (42) for Note that this agrees with the result of the branching process calculation, which is valid in a part of this range, at frequencies corresponding to and

This breaks down at even higher frequencies These frequencies correspond to the extinction phase of the wild type, during which the allele frequency no longer mirrors the frequency in the 0-class, but instead declines exponentially at an accelerating rate (see Equation 35). Equation 35 can be straightforwardly integrated in time (see *Contribution from the Extinction Stage of Trajectories* in Appendix I for details), which yields the form of the site frequency spectrum at these high frequencies (43) for Finally, once the wild-type frequency falls below it will be in a situation analogous to that of the mutant at very low frequencies: independent of how it is distributed among the fitness classes, its trajectory will be dominated by drift, since most individuals in the population have fitness that does not differ from the mean fitness by more than σ. Thus, at these frequencies, the site frequency spectrum will once again agree with the site frequency spectrum of neutral loci isolated from any selected sites in the genome: (44) By comparing our predictions to the results of Wright–Fisher simulations, we can see that this argument correctly predicts the form of the site frequency spectrum in these regimes, as well as the frequencies at which these transitions happen (see Figure 3).

### The deleterious site frequency spectrum

So far, we have focused primarily on describing the trajectories and site frequency spectra of neutral mutations. However, because the trajectory of a neutral mutation that arises in an individual with *k* deleterious mutations is equivalent to the trajectory of a deleterious mutation that arises in an individual with *k* − 1 deleterious mutations, descriptions of trajectories of deleterious mutations follow without modification from our descriptions of trajectories of neutral mutations. The deleterious site frequency spectrum can thus be constructed from the single-class site frequency spectra of neutral mutations by a simple modification of the total rates at which new deleterious mutations arise [specifically, with the contribution to the deleterious site frequency spectrum of mutations arising in class *k* being equal to ]. By summing these contributions, we find that the deleterious site frequency spectrum is to leading order (45) where the form proportional to is once again valid up to a constant factor (for the same reason as described in *Contribution from the Peaks of Trajectories* in Appendix I).

## Distributions of Effect Sizes

The model we have thus far considered assumes that all deleterious mutations have the same effect on fitness, *s*. In reality, different deleterious mutations will have different fitness effects. In Appendix J, we show that as long as the variation in the distribution of fitness effects (DFE) is small enough that the effects of background selection are well captured by a single-*s* model. In practice, this means that when considering moderate values of fractional differences in selection coefficients up to will not substantively alter allele frequency trajectories. In this case, the combined effects of these mutations are well described by our single-*s* model.

However, when mutational effect sizes vary over multiple orders of magnitude, properties of the DFE will have an important impact on the quantitative details of the mutational trajectories that are not captured by our single-*s* model. The qualitative properties of allele frequency trajectories will remain the same (see Appendix J): alleles arising on unusually fit backgrounds will rapidly spread through the fitness distribution, peak for a finite amount of time about generations later, and then proceed to go extinct at a rate proportional to their average fitness cost. However, the quantitative aspects of these trajectories will be different. For instance, small differences in the fitness effects of mutations that do not affect the early stages of trajectories will be revealed on timescales of order and affect the size and the width of the peak of the allele frequency trajectory. We have seen that these two quantities play an important role in determining the properties of allele frequency trajectories and of the site frequency spectrum.

Because weaker effects and smaller differences in effect sizes play a more important role in later parts of the allele frequency trajectory, the DFE relevant during the early phases of the trajectory may differ from the DFE relevant in the later phases. Furthermore, since longer-lived trajectories are also those that reach higher frequencies (having originated in backgrounds of higher fitness), the DFE relevant at larger frequencies may differ from the DFE relevant at lower frequencies. As a result, it is possible that, for certain DFEs, no single “effective” effect size can be used to describe the trajectories at all frequencies. The full analysis of a model of background selection in which mutational effect sizes have a broad distribution remains an interesting avenue for future work.

## Discussion

In this work, we have analyzed how linked purifying selection changes patterns of neutral genetic diversity in a process known as background selection. We have found that whenever background selection reduces neutral genetic diversity, it also leads to significant distortions in the neutral site frequency spectrum that cannot be explained by a simple reduction in effective population size (see Figure 1). These distortions become increasingly important in larger samples and have more limited effects in smaller samples (Figure 1, B and C). In this sense, the sample size represents a crucial parameter in populations experiencing background selection.

By introducing a forward-time analysis of the trajectories of individual alleles in a fully linked genetic locus experiencing neutral and strongly selected deleterious mutations, we derived analytical formulas for the whole-population site frequency spectrum (see Equations 1 and 2). These results can be used to calculate any diversity statistic based on the site frequency spectrum in samples of arbitrary size. Our results also offer intuitive explanations of the dynamics that underlie these distortions and give simple analytical conditions that predict when such distortions occur (Figure 3). In addition to single time-point statistics such as the site frequency spectrum, our analysis also yields time-dependent trajectories of alleles. We suggest that these may be crucial for distinguishing between evolutionary models that may remain indistinguishable based on site frequency spectra alone. We explain how this intuition about the time-dependent behavior can, in principle, be used to make simple predictions about the history and future of alleles, and we explain that it suggests new statistics of time-resolved samples that can be used to distinguish between different evolutionary models. We discuss these implications in turn below.
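To illustrate how a whole-population spectrum translates into statistics for a finite sample, the sketch below applies standard binomial sampling: the expected number of sites at which *j* of *n* sampled individuals carry the derived allele is the integral of the binomial probability weighted by the population spectrum. The function name, the midpoint integration rule, and the neutral test spectrum `theta / f` are assumptions made for illustration; Equations 1 and 2 are not reproduced here.

```python
import math
import numpy as np

def sample_sfs(p, n, grid_size=200_000):
    """Expected sample spectrum for j = 1..n-1 derived alleles in a sample of n.

    p: callable returning the whole-population spectrum p(f) for 0 < f < 1.
    A midpoint rule is used so that p is never evaluated at f = 0 or f = 1.
    """
    f = (np.arange(grid_size) + 0.5) / grid_size   # midpoints of a uniform grid
    pf = p(f)
    df = 1.0 / grid_size
    out = np.empty(n - 1)
    for j in range(1, n):
        # Probability of drawing j derived alleles from a site at frequency f.
        binom = math.comb(n, j) * f**j * (1.0 - f) ** (n - j)
        out[j - 1] = np.sum(binom * pf) * df
    return out

# Check against the neutral spectrum theta / f, for which the expected
# sample spectrum is theta / j.
theta = 2.0
neutral = sample_sfs(lambda f: theta / f, n=10)
```

Any predicted population spectrum can be passed in place of the neutral lambda. For very large *n*, the binomial coefficient should be handled in log space to avoid overflow; this sketch omits that step.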

### The frequency of a mutation tells us about its history and future

In addition to describing the expected site frequency spectrum at a single time point, our analysis of allele frequency trajectories allows us to calculate time-dependent quantities such as the posterior distribution of the past frequency trajectory of polymorphisms seen at a particular frequency, their ages, and their future behavior. For example, since the maximal frequency a mutation can attain strongly depends on the fitness of the background in which it arose (with lower-fitness backgrounds constraining trajectories to lower frequencies), observing an allele at a given frequency places a lower bound on the fitness of the background on which it arose. This in turn is informative about its past frequency trajectory. For example, alleles observed at frequencies almost certainly arose in an individual that was among the most-fit individuals in the population and experienced a rapid initial exponential expansion at rate while alleles observed at frequencies very likely arose on backgrounds with fewer than *k* deleterious mutations compared to the most-fit individual at the time. We emphasize that these thresholds are substantially smaller than the naive thresholds obtained by assuming that a mutation arising on a background with *k* mutations can only reach the drift barrier corresponding to isolated deleterious mutations of fitness in a population of effective size

The fitness of the ancestral background on which a mutation arose is not only interesting in terms of characterizing the history of a mutation, but is also informative of its future behavior. In the strong-selection limit of background selection that we have considered here (), deleterious mutations can fix in the population only exponentially rarely (Neher and Shraiman 2012). Thus, mutations arising on backgrounds already carrying deleterious mutations must eventually go extinct. We have shown that the site frequency spectrum at frequencies is dominated by mutations arising on deleterious backgrounds. Furthermore, we have shown that most polymorphisms seen at these frequencies are at the peak of their frequency trajectory. This means that we expect the frequency of such polymorphisms to decline on average. Thus, if we were to observe the population at some later time point, we expect that the polymorphisms present at such low frequencies should on average be observed at a lower frequency. In Figure 8, we show how the average change in frequency after generations depends on the original frequency that a mutation was sampled at, *f*. Note that the expectation for a neutral population of any size is that the average allele frequency change is exactly equal to zero. In the presence of background selection, this is no longer true for neutral mutations previously observed at frequencies and (see Figure 8).
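The neutral baseline mentioned above, zero average frequency change, can be checked with a minimal Wright–Fisher sketch. This simulates only a neutral allele in a population of fixed size with no linked selection; it is not the simulation used to produce Figure 8, and the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def mean_frequency_change(f0, N, generations, replicates, seed=0):
    """Average change in frequency of a neutral allele under Wright-Fisher drift.

    Each replicate starts at frequency f0 and is resampled binomially every
    generation in a population of N haploid individuals.
    """
    rng = np.random.default_rng(seed)
    freqs = np.full(replicates, f0)
    for _ in range(generations):
        freqs = rng.binomial(N, freqs) / N   # one generation of drift
    return float(np.mean(freqs - f0))

# Illustrative parameters (assumptions, not values taken from the text).
delta = mean_frequency_change(f0=0.2, N=1000, generations=50, replicates=2000)
```

Individual replicates drift substantially, but the average change stays close to zero up to sampling noise. A systematic negative or positive average change in time-resolved data would therefore indicate a departure from this neutral baseline.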

In contrast, since polymorphisms observed in the range must have originated in a mutation-free background, and since their dynamics reflect neutral evolution in this 0-class, the overall dynamics of such alleles are neutral. Therefore, although drift will lead to variation in the outcomes of individual alleles in this range, the average expected frequency change is equal to zero. This expectation is confirmed by simulations (Figure 8).

Finally, we have seen that polymorphisms seen at frequencies will typically already have replaced the wild-type allele within the 0-class. Thus, the wild-type allele must eventually go extinct (except for exponentially rare ratchet events). In other words, polymorphisms seen at these frequencies are certain to fix, replacing the ancestral allele at some later point in time (see Figure 8). Together, these results show that the site frequency spectrum can be divided into three regimes in which the dynamics of individual neutral alleles are effectively negatively selected, effectively neutral, and effectively positively selected. These effective selection pressures arise indirectly as a result of the fitnesses of the variants to which a neutral mutation at a given frequency is likely to be linked. This effect is important to bear in mind when analyzing time-resolved samples, where these effective selection pressures could naively be misinterpreted as evidence of direct negative selection on low-frequency derived neutral alleles and direct positive selection on high-frequency derived neutral alleles.

### The distinguishability of models based on site frequency spectra

As has long been appreciated, background selection can lead to signatures in the site frequency spectrum that are qualitatively similar to population expansions and selective sweeps (Charlesworth *et al.* 1993, 1995; Hudson and Kaplan 1994; Tachida 2000; Gordo *et al.* 2002; Williamson and Orive 2002; O’Fallon *et al.* 2010; Nicolaisen and Desai 2012; Walczak *et al.* 2012; Good *et al.* 2014). Here we have shown that these similarities are not only qualitative, but (up to logarithmic corrections) also quantitatively agree with the site frequency spectra produced under these very different scenarios (Yule 1924; Lea and Coulson 1949; Mandelbrot 1974). This suggests that distinguishing between these models based on site frequency spectra alone may not be possible. We emphasize that these effects of background selection that mimic population expansions are seen in *neutral* site frequency spectra in a model in which the population size is fixed, so using synonymous site frequency spectra to “correct” for the effects of demography may not always be justified.

The quantitative agreement between the effects of background selection and positive selection that we have seen in the high-frequency end of the frequency spectrum is not purely incidental. In the presence of substantial variation in fitness, alleles that fix among the most-fit genotypes in the population are, in a sense, truly positively selected because they are linked to fewer deleterious mutations than average. As a result, sweep-like behaviors can occur in the absence of positive selection, as long as there is substantial fitness variation, independent of the source of this variation (*i.e.*, whether it arose as a result of beneficial or deleterious mutations). In this case, these models may be indistinguishable even using time-resolved statistics because the allele frequency trajectories themselves have similar features.

In other cases, time-resolved statistics may be able to differentiate between models that produce similar site frequency spectra. For example, under background selection the low-frequency end of the site frequency spectrum is dominated by mutations that are linked to a larger-than-average number of deleterious variants; alleles in this regime are therefore expected to decline in frequency on sufficiently long timescales [of order ]. In contrast, in an exponentially expanding population, mutations present at these frequencies are very unlikely to change in frequency during the expansion. Thus we may be able to distinguish between these models using samples from the same population spaced far enough apart in time.

### The effect of very strong deleterious mutations []

In our analysis of background selection, we have focused on mutations with small absolute effects on fitness (). We make this choice because although deleterious mutations with very strong effect (*e.g.*, lethal mutations) do exist, they are unlikely to lead to a substantial reduction in genetic diversity unless they also occur at a very high rate (). In other words, these mutations can only have a substantial effect on reducing diversity if a large fraction of individuals in the population acquire them every generation. Thus, such mutations appear less likely to lead to strong effects on diversity in natural populations than mutations with smaller absolute effects on fitness.

### Relationship to the structured coalescent

Throughout this work, we have assumed that selection against deleterious mutations is strong (*i.e.*, ), such that they are exponentially unlikely to fix (*i.e.*, Muller’s ratchet is rare). This is the same limit in which the structured coalescent of Hudson and Kaplan (1994) is valid. Since our forward-time analysis of mutational trajectories uses approximations similar to those implicit in that method, it has the same expected range of validity and accuracy. Although this limit has occasionally been referred to as “weak selection” in some prior literature, we emphasize that an assumption that is implicit in the structured coalescent is that selection against deleterious mutations is sufficiently strong that they do not routinely fix. In Supplemental Material, Figure S1, we show that our theoretical predictions indeed produce site frequency spectra that agree with the results of forward-time simulations roughly as well as numerical predictions generated using the structured coalescent. The advantage of our method is that it provides analytical predictions and scales to arbitrary sample sizes, in contrast to the structured coalescent, which is a numerical algorithm for conducting backward-time simulations.

### Relationship to results on weakly selected deleterious mutations

In the case where selection is weak enough that deleterious mutations have a substantial probability of fixation (which occurs when ), the population ratchets to lower fitness at the locus. In this limit, much like in the strong-selection case that we have studied here, the magnitude of the effects of background selection on diversity is controlled by whether or not deleterious mutations lead to substantial fitness variation at the locus. When deleterious mutation rates are weak enough that the scaled standard deviation in fitness satisfies site frequency spectra look largely neutral (see Figure S2 and Good *et al.* 2014). However, if the deleterious mutation rate is large enough that previous work has shown that substantial distortions can result (Neher and Shraiman 2011; Kosheleva and Desai 2013; Neher and Hallatschek 2013; Good *et al.* 2014). By analyzing evolutionary dynamics in this limit, Neher and Hallatschek (2013) have shown that the resulting site frequency spectrum scales as at low frequencies and as at high frequencies. These forms are similar to our limiting expressions in the high- and low-frequency ends of the spectrum, but do not contain the neutral region at intermediate frequencies. This neutral region shrinks as declines and disappears when Thus, the form of the site frequency spectrum in Equation 2 approaches the limiting forms for weak selection as (Figure S2), exactly as expected for the transition to the weakly selected regime (see also Good *et al.* 2014).

Earlier work has argued that genealogies in this weak-selection limit () approach the Bolthausen–Sznitman coalescent when fitness variance in the population is sufficiently large, (Neher and Shraiman 2011; Kosheleva and Desai 2013; Neher and Hallatschek 2013; Good *et al.* 2014). Recently, Hallatschek (2017) has studied allele frequency trajectories that arise in the forward-time dual of the Bolthausen–Sznitman coalescent. Our analysis of trajectories in the presence of strong background selection reveals many of the interesting features seen in that work. For instance, we have seen that, once an allele spreads through the fitness distribution and reaches mutation–selection balance, an effective frequency-dependent selection coefficient emerges: (46)This effective selection coefficient arises due to the deleterious mutations to which neutral mutations are linked and due to changes with the frequency *f* of the mutation as high-fitness individuals within the neutral lineage drift to extinction or fixation [see Figure 3 and *Large Lineages Arising on Unusually Fit Backgrounds* in Appendix E], and is equal to 0 in the quasi-neutral regime (Figure 3). This is analogous to the fictitious selection coefficient, that emerges in the model analyzed by Hallatschek (2017). The difference in the frequency dependence of the effective selection coefficient between our results and the Hallatschek (2017) model is large when but becomes negligible as it underlies the differences between the site frequency spectra of rapidly adapting or ratcheting populations and the strong background selection limit that we have considered here.

Note, however, that there still exists a clear discrepancy between the form of the site frequency spectrum at low frequencies that arises in the Bolthausen–Sznitman coalescent, and the functional form that is obtained by analyzing evolutionary models of weak selection (Figure S3). This suggests that the correspondence between these evolutionary models and the Bolthausen–Sznitman coalescent is only approximate, even in the limit that In particular, although the two seem to share dynamical properties which arise once the lineage spreads out through the fitness distribution (including the frequency-dependent selection coefficient), as well as similarities in some aspects of fluctuations in the numbers of high-fitness individuals as they accumulate further mutations (*i.e.*, due to genetic draft; see, *e.g.*, Kosheleva and Desai 2013), it is not immediately obvious that other aspects that we have described here, such as the smoothing of fluctuations due to drift, are identical in both models.

### Extensions and limitations of our analysis

We have studied a simple model of a perfectly linked locus at which all mutations are either neutral or deleterious with the same effect on fitness, *s*. Our primary goal has been to describe the qualitative and quantitative effects of background selection on frequency trajectories and the site frequency spectrum within this simplest possible context. However, it is important to note that the assumptions of our model are likely to be violated in natural populations. In many cases, these additional complications do not change the general conclusions of our analysis. For example, the qualitative properties of the trajectories and site frequency spectra described here apply when deleterious mutations have a broader distribution of effect sizes, and we have shown here that our results are quantitatively unchanged when the distribution of effect sizes (DFE) is sufficiently narrow. On the other hand, when the DFE is very broad, additional work will be required to determine the quantitative properties of site frequency spectra. We anticipate that different parts of the DFE may be important at different frequencies in sufficiently broad DFEs. If so, this would be an unusual feature of strong negative selection that does not arise in the case of strong positive selection, in which the effects of DFEs can usually be summarized by a single, predominant fitness effect (Good *et al.* 2012).

Finally, our assumption of perfect linkage in the genomic segment is likely to be violated in sexual populations, in which sites that are separated by shorter genomic distances are more tightly linked than distant sites. However, even in the presence of recombination, alleles will remain effectively asexual on short enough genomic distances, and are effectively freely recombining on long enough genomic distances (Franklin and Lewontin 1970; Slatkin 1972). In this case, a standard heuristic is to treat the genome as if it consists of freely recombining asexual blocks. In rapidly adapting or ratcheting populations, this heuristic has been shown to yield a rough approximation to diversity statistics when the “effective block length” is set by the condition that each block typically recombines once on the timescale of coalescence (Neher *et al.* 2013; Good *et al.* 2014; Weissman and Hallatschek 2014).

However, our analysis highlights that many of the interesting features of allele frequency trajectories in the presence of background selection occur on timescales much shorter than the timescale of coalescence. On these timescales, alleles will be fully linked on much longer genomic distances than this effective block length. This effect will be particularly important for young alleles, which are linked to long haplotypes because of the limited amount of time that recombination has had to break them up. On longer timescales, the length of the genomic segments to which these alleles are linked will become progressively shorter, but will typically not fall below the effective block length on any timescale. Given the strong dependence of allele frequency trajectories on the total mutation rate along this segment, it is less clear what effect such linkage to increasingly shorter genomic segments has on the statistics of allele frequency trajectories. A more detailed analysis of the effects of background selection in linear genomes remains an interesting direction for future work.

It is interesting to note that the effects of background selection on the site frequency spectrum in recombining genomes have been studied previously using forward-time (see, *e.g.*, McVean and Charlesworth 2000; Kaiser and Charlesworth 2008; Zeng and Charlesworth 2010) as well as using backward-time simulations based on extensions of the structured coalescent (see, *e.g.*, Hudson and Kaplan 1994; Zeng and Charlesworth 2011). However, much like in the asexual case, analytical predictions for the magnitude of the effects of background selection in recombining populations are usually limited to samples of two individuals (Hudson and Kaplan 1995; Nordborg *et al.* 1996). More recently, there has been some interest in exploring the combined effects of background selection and population subdivision (Zeng and Corcoran 2015) or partial asexuality and selfing (Agrawal and Hartfield 2016; Roze 2016). Analytical results in these cases are also often limited to very small samples and to the limit in which the effects of background selection are modest. We hope that our forward-time approach can be extended in future work to explore the effect of background selection in the presence of such factors more fully.

## Acknowledgments

We thank Oskar Hallatschek, Joachim Hermisson, Katherine Lawrence, Matthew Melissa, Richard Neher, Daniel Rice, Boris Shraiman, Shamil Sunyaev, John Wakeley, and the members of the Desai laboratory for useful discussions and helpful comments on the manuscript. Simulations in this article were run on the Odyssey cluster supported by the FAS Division of Science Research Computing Group at Harvard University. This work was supported in part by the Simons Foundation (grant 376196), grant DEB-1655960 from the National Science Foundation, and grant GM-104239 from the National Institutes of Health. The authors also acknowledge the Kavli Institute for Theoretical Physics at University of California, Santa Barbara, supported in part by the National Science Foundation grant PHY-1125915, National Institutes of Health grant R25 GM-067110, and the Gordon and Betty Moore Foundation grant 2919.01.

## Appendix A: The Propagation of Fluctuations in the Size of the Founding Class

In this appendix, we consider in more detail how fluctuations in the size of the lineage in the founding class propagate to affect the total allele frequency. For this purpose, it will be convenient to consider a neutral mutation that arose in the 0-class sufficiently long ago that it is in mutation–selection balance. Let the total frequency of the lineage be *f*. As in the main text, we denote the frequency of the part of the lineage that is in class *i* by $f_i$. In mutation–selection balance, the $f_i$ will satisfy

$$f_i = f\,e^{-\lambda}\,\frac{\lambda^i}{i!}. \qquad \text{(A1)}$$

Consider what happens if the frequency of the founding genotype changes suddenly to some value Based on the deterministic solution, after a time *t*, this will lead to a change in the frequency of the part of the lineage in class *i*, of (A2) In other words, the relative change in the frequency of the lineage in the *i*-class is (A3) This approaches at long times as the allele reestablishes mutation–selection balance. However, we can see from Equation A3 that this change is not felt at the same time in all classes. In the 1-class, the frequency changes gradually, at rate *s* (Equation A3), and results in a proportional change roughly generations later. In general, in the *i*-class, this change is felt after a total delay of roughly generations. Thus, the change propagates from class *i* to class over the course of (A4) generations.

Ultimately, generations later, this change will have been felt in a substantial fraction of the fitness distribution. Fitness classes near the mean of the distribution (which is λ classes below the 0-class) are those that exhibit the largest absolute change in frequency, since they contain the largest number of individuals when the lineage is in mutation–selection balance. Thus, changes in these classes account for a large proportion of the change in the total allele frequency, which explains the origin of the delay timescale, that we have introduced in the main text.
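The class-by-class delay described in this appendix can be illustrated numerically. The sketch below is not the authors' code. It assumes that the deterministic class dynamics take the form $df_i/dt = -isf_i + U_d f_{i-1}$ with $U_d = \lambda s$, for which a step change in the frequency of the founding (0-)class produces a relative response of $(1 - e^{-st})^i$ in class *i*. The function names and parameter values are hypothetical.

```python
import math

def relative_response(i, s, t):
    """Fraction of the eventual relative change felt in class i at time t.

    Assumes df_i/dt = -i*s*f_i + U_d*f_{i-1}, under which a step change in
    the founding (0-)class propagates to class i as (1 - exp(-s*t))**i.
    """
    return (1.0 - math.exp(-s * t)) ** i

def half_response_time(i, s):
    """Time at which class i has felt half of its eventual relative change."""
    return -math.log(1.0 - 0.5 ** (1.0 / i)) / s

if __name__ == "__main__":
    s, lam = 0.01, 20  # hypothetical selection coefficient and lambda = U_d / s
    for i in (1, 2, 5, 10, lam):
        # Compare the half-response time with the rough scale ln(i) / s.
        print(i, round(half_response_time(i, s), 1), round(math.log(i) / s, 1))
```

Under these assumptions the half-response time grows roughly logarithmically with *i*, so classes near the mean of the distribution respond well after the classes closest to the founding class.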

## Appendix B: The Large Deviations from Average Behavior Caused by Genetic Drift

In this appendix, we consider the importance of the effects of drift in each individual fitness class on the overall allele frequency. In the first subsection, we revisit a standard argument to explain why fluctuations due to genetic drift in the frequency of the founding genotype can never be neglected, framing it in terms that will be useful when considering the importance of drift in classes below the founding class. In the next subsection, we build on this argument to explain why the effects of drift become negligible in all classes *i* in which the frequency, of the component of the lineage in that class satisfies but cannot be neglected in all classes in which the frequency does not exceed that threshold.

### The Importance of Genetic Drift in the Founding Class

The essential reason why drift can never be neglected in the early phase of a trajectory is that drift-induced deviations from the average low-frequency behavior are not small perturbations, but are extremely broadly distributed. Consider for instance a mutation that arises in class *k*. As we explain in the main text, the founding genotype feels an effective selection coefficient equal to $-ks$. The “deterministic trajectory” of the founding genotype is therefore

$$f_k(t) = f_k(0)\,e^{-kst}. \qquad \text{(B1)}$$

In other words, the deterministic trajectory of a neutral founding genotype ($k = 0$) is a flat line, whereas the deterministic trajectory of a deleterious founding genotype ($k > 0$) decays exponentially at rate $ks$.

However, we know that drift leads to large deviations from the deterministic behavior in Equation B1. In fact, we have mentioned that when drift can lead to an *x*-fold increase above this expectation with probability (Fisher 2007). Thus, the deviations from the deterministic expectation due to drift are distributed according to an extremely broad power law. As a result, large deviations from Equation B1 are very likely. For lineages arising in the 0-class, these deviations can take the frequency of a lineage all the way to fixation. However, deleterious founding genotypes with are exponentially unlikely to exceed the drift barrier at Thus, the distribution of deviations from the mean, deterministic behavior of these founding genotypes also follows the same power law at low frequencies (), but is capped by selection at frequencies exceeding As a result, the effects of drift on trajectories of deleterious mutations become perturbative at sufficiently large frequencies and can therefore be neglected when
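The contrast between power-law deviations and selection-capped deviations can be made concrete with a textbook calculation. The sketch below is an illustration rather than the derivation used in this appendix. It assumes a toy birth–death process with per-capita birth rate 1 and death rate 1 + σ, where σ is a hypothetical per-generation disadvantage standing in for the effective selection against the founding genotype, and it uses the standard gambler's-ruin formula for the probability that a lineage founded by one individual reaches *n* individuals before going extinct.

```python
import math

def prob_reach(n, sigma):
    """Probability that a single individual's lineage reaches size n before extinction.

    Embedded random walk of a birth-death process with birth rate 1 and death
    rate 1 + sigma (an assumed toy model). Gambler's ruin gives
    sigma / ((1 + sigma)**n - 1), which tends to 1/n as sigma -> 0.
    """
    if sigma == 0:
        return 1.0 / n
    # expm1/log1p keep the expression accurate when sigma is small.
    return sigma / math.expm1(n * math.log1p(sigma))

if __name__ == "__main__":
    sigma = 0.01  # hypothetical disadvantage; the drift barrier sits near 1/sigma = 100
    for n in (10, 100, 1000):
        print(n, prob_reach(n, 0.0), prob_reach(n, sigma))
```

In this toy model the neutral case follows the 1/*n* power law at all sizes, whereas the deleterious case tracks it only below roughly 1/σ individuals and is exponentially suppressed above that size.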

Because fluctuations in always propagate to classes of lower fitness, drift in the founding class has an important impact on the overall allele frequency whenever it has an important impact on This means that the overall frequency trajectory of alleles founded in the 0-class will always be affected by drift in which will cause large, power law-distributed deviations from the deterministic expectation of the total allele frequency trajectory. Similarly, the overall frequency trajectory of alleles founded in a class with deleterious mutations will be affected by drift in the founding class when the overall allele frequency satisfies (which correspond to founding class frequencies ), but prevented by selection from exceeding frequencies larger than (see also Appendix E).

### The Importance of Genetic Drift in Classes Below the Founding Class

Given these arguments, one may wonder whether the effects of drift are also important in classes below the founding class, in which individuals carry deleterious mutations. Deviations from deterministic behavior in these classes (*i.e.*, in ) are also propagated to classes of lower fitness. Such deviations in if large, will also have a large impact on the overall frequency trajectory of the allele, However, since classes below the founding class receive substantial mutational input from higher classes, it is not immediately clear whether the effects of drift on will “average out” as a result of these mutations, or whether drift can still lead to large deviations from the deterministic expectation for In Appendix E, we show by formally analyzing the distribution of allele frequency trajectories that drift in class *i* is negligible when and in this appendix we give a heuristic argument explaining why this threshold arises. This heuristic argument does not reproduce factors that are obtained using formal methods (*i.e.*, the factor of in ), but it offers additional intuition on the existence of this threshold and its dependence on the parameters *N*, *s*, and *i*.

The threshold is reminiscent of the drift barrier relevant for single deleterious loci of fitness *is*. However its relevance in classes below the founding class is not immediately obvious. Although the individuals in class *i* also feel an effective selection pressure equal to new mutational events from class counter these effects of selection. Thus, it is not obvious that the combination of the opposing effects of mutation into the class and selection within the class will be stronger than the effects of drift whenever (as opposed to some other threshold that also depends on ).

To gain insight into this, we consider in more detail the effects of individual mutational events into class *i*. Each of these mutational events can be thought of as founding a new sublineage in class *i*. The frequency trajectory of each sublineage is the same as that of a single locus with fitness and the overall trajectory is equal to the sum of the trajectories of these sublineages. When a sublineage is small, drift will lead to large deviations from its average (deterministic) frequency trajectory, which is also given by Equation B1. However, as in the founding class, at frequencies larger than these deviations are capped by the effects of selection. Thus, the drift barrier represents the frequency above which fluctuations cannot lead to large deviations of individual sublineages from the average behavior.

To understand when drift has an important impact on the overall *i*-class trajectory we can consider how these deviations in the trajectories of the sublineages add. At sufficiently small frequencies, the overall trajectory will be equal to the sum of random trajectories that have an extremely broad distribution. In this case, the sum will be dominated by the trajectory of the largest sublineage, which will be very different than the average trajectory. Thus, even when the total number of mutational events into class *i* is large, the effects of genetic drift in class *i* may not be negligible if each of these mutational events results in a relatively small trajectory. In other words, fluctuations due to drift in the frequency trajectories do not average out, but are rather dominated by the largest deviation from the mean. Conversely, when the total number of sublineages is large enough that many of them reach the frequency (which is guaranteed to happen if the total number of mutational events into the *i*-class is much larger than ), the overall frequency of the lineage will be much larger than In this case, the largest event is no longer very different than the average event; the effects of genetic drift are therefore negligible compared to the effects of selection. The transition between these two behaviors happens when which roughly corresponds to exactly one sublineage exceeding We discuss these effects using a more formal approach in Appendix G. Note that, by extending this argument to classes and lower, we can verify that once the frequency trajectory in class *i* exceeds and becomes predominantly shaped by mutation and selection, the frequency of the allele in all lower-fitness classes is also guaranteed to exceed the corresponding frequency thresholds. This is why we can also neglect the effects of drift in all classes below a class in which the frequency exceeds
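The statement that a sum of broadly distributed sublineage sizes is dominated by its largest term, unless the sizes are capped, can be checked with a small simulation. This is a sketch under stated assumptions and not the formal analysis of Appendix G: sublineage sizes are drawn from a hypothetical distribution with tail P(X > x) = 1/x, and the optional `cap` argument stands in for the drift barrier. All names and parameter values are illustrative.

```python
import random

def mean_max_share(n_events, cap=None, trials=200, seed=1):
    """Average share of the total contributed by the single largest sublineage.

    Sizes are drawn with the power-law tail P(X > x) = 1/x for x >= 1
    (an assumed toy distribution) and optionally truncated at `cap`.
    """
    rng = random.Random(seed)
    share_sum = 0.0
    for _ in range(trials):
        sizes = []
        for _ in range(n_events):
            x = 1.0 / (1.0 - rng.random())  # inverse-transform sampling; argument lies in (0, 1]
            if cap is not None:
                x = min(x, cap)
            sizes.append(x)
        share_sum += max(sizes) / sum(sizes)
    return share_sum / trials

if __name__ == "__main__":
    print("uncapped:", mean_max_share(1000))
    print("capped at 10:", mean_max_share(1000, cap=10.0))
```

With 1000 events and a cap of 10, about a tenth of the sublineages reach the cap, so the largest one contributes only a small share of the total. Without the cap, the largest sublineage carries a much larger share, which is the sense in which the fluctuations fail to average out.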

## Appendix C: The Generating Function for the Total Size of the Labeled Lineage

In this appendix, we consider the generating function for the total frequency of the lineage,

and derive a partial differential equation describing how it changes in time. As described in the main text, when the size of the lineage is small [], its dynamics are described by the coupled system of Langevin equations for the components of the total frequency *f* that denote the frequency of the part of the lineage that carry *i* deleterious mutations, (C2)In Equation C2, the are independent, uncorrelated Gaussian noise terms. The total allele frequency is equal to the sum of these components,

Note that the total allele frequency is not a Markov random variable since its evolution depends on the details of the distribution of the individuals within the lineage among the fitness classes. However, the frequencies of the components are jointly Markov, with their joint distribution described by the joint generating function (C3)The generating function for can be obtained from the joint generating function by setting for all *i*. We can obtain a PDE for the joint generating function by Taylor expanding and substituting in the differentials from Equation C2, which yields (C4)We can solve this PDE for the joint generating function by using the method of characteristics. The characteristic curves are defined by (C5)and satisfy the boundary condition The linear terms in the characteristic equation arise from selection and mutation out of the *i*-class, and the nonlinear term arises from drift. Along these curves, the generating function is constant and so where the initial condition corresponds to a single individual present in class *k* at Thus, to obtain a solution for the joint generating function, we need to integrate along the characteristics in Equation C5 backwards in time from to In the next few appendices, we obtain these solutions in the limits of weak () and strong mutation ().
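As a reading aid, the steps from the Langevin equations to the characteristic equations can be sketched as follows. The sketch assumes that Equation C2 has the form shown in the first line (selection and outgoing mutation combined into the rate $-is$, incoming mutation at rate $U_d$, and drift variance $f_i/N$ per generation) and uses the symbol $H$ for the joint generating function. Both are assumptions made for illustration, and numerical factors may differ from those in Equations C3 to C5.

```latex
\begin{align*}
\frac{df_i}{dt} &= -is\,f_i + U_d f_{i-1} + \sqrt{\frac{f_i}{N}}\,\eta_i(t), \\
H(\{z_i\},t) &= \Big\langle \exp\Big(-\sum_i z_i f_i(t)\Big) \Big\rangle, \\
\frac{\partial H}{\partial t} &= \sum_i \Big[-is\,z_i + U_d z_{i+1} - \frac{z_i^2}{2N}\Big]
  \frac{\partial H}{\partial z_i}, \\
\frac{dz_i}{d\tau} &= -is\,z_i + U_d z_{i+1} - \frac{z_i^2}{2N}
  \qquad (\tau \text{ measured backward from } t).
\end{align*}
```

In this form, the two linear terms in the last line come from selection and mutation, and the quadratic term comes from drift. For a lineage founded by a single individual in class *k*, the generating function under these assumptions would be $\exp[-z_k(\tau = t)/N]$.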

## Appendix D: Trajectories in the Presence of Weak Mutation ()

When deleterious mutations arise more slowly than selection removes them (), deleterious descendants of a lineage are much less numerous than the founding genotype. To see this, we can expand the characteristics in powers of the small parameter λ. At leading order, the characteristics are uncoupled and can be straightforwardly integrated to obtain

By substituting this zeroth-order solution into Equation C5, we find that corrections due to deleterious descendants are of order λ and are therefore small uniformly in *z*. Thus, the generating function for the total frequency *f* of the labeled lineage *t* generations after arising in class *k* is (D2) which agrees with classic results by Kendall (1948) for the generating function of independently segregating loci of fitness $-ks$.

Equation D2 can be inverted to obtain the probability distribution, by an inverse Laplace transform: (D3)This distribution is well known, and can be obtained by standard methods. Noting that has a single essential singularity at we can perform the integral above either exactly by contour integration (by closing the contour using a large semicircle in the left half-plane and a straightforward application of the residue theorem, which gives a solution in terms of Bessel functions) or approximately by the method of steepest descents (taking care to deform the contour to pass through the saddle point on the right of the essential singularity). By carrying out this inverse Laplace transform, we obtain that the extinction probability by time *t* is (D4)which becomes of order one when in agreement with our intuition that a lineage of fitness can only survive for order generations. For nonextinct lineages, the probability distribution of the frequency is (D5)The site frequency spectrum can be obtained from this distribution of frequencies by integrating Equation D5 in time, or by an alternative method that we present in Appendix F.
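The qualitative behavior of the extinction probability can be reproduced with a short calculation. The sketch below is not Equation D4 itself. It assumes that the founding genotype follows a single-class diffusion with selection coefficient $-ks$ and drift variance $f/N$ per generation, for which the characteristic obeys $dz/dt = -ks\,z - z^2/(2N)$. The function names are hypothetical, and numerical factors may differ from those in Equations D2 to D5.

```python
import math

def characteristic(z0, t, ks, N):
    """Solution z(t) of dz/dt = -ks*z - z**2/(2N) with z(0) = z0 (assumed form)."""
    if ks == 0:
        return z0 / (1.0 + z0 * t / (2.0 * N))
    decay = math.exp(-ks * t)
    return z0 * decay / (1.0 + z0 * (1.0 - decay) / (2.0 * N * ks))

def generating_function(z0, t, ks, N):
    """<exp(-z0 * f(t))> for a lineage founded by one individual, f(0) = 1/N."""
    return math.exp(-characteristic(z0, t, ks, N) / N)

def extinction_probability(t, ks):
    """Limit z0 -> infinity of the generating function: P(f(t) = 0)."""
    if ks == 0:
        return math.exp(-2.0 / t)
    return math.exp(-2.0 * ks / math.expm1(ks * t))

if __name__ == "__main__":
    ks = 0.01  # hypothetical value of k*s
    for t in (10, 100, 1000):
        print(t, extinction_probability(t, 0.0), extinction_probability(t, ks))
```

In this sketch the neutral survival probability falls off as roughly 2/*t*, while for a deleterious founding genotype extinction becomes nearly certain once *t* exceeds a few multiples of 1/(*ks*).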

## Appendix E: Trajectories in the Presence of Strong Mutation ()

When deleterious mutations arise faster than selection can remove them, mutation will play an important role in shaping the trajectory. The relative strength of mutation and selection compared to drift will depend on the frequency of the lineage. Drift will remain the dominant force at frequencies However, at larger frequencies, the mutation and selection terms will become important and we will see that the effects of drift in classes of low enough fitness become negligible.

### Small Lineages ()

The dominant term in the characteristic equation in this regime (which corresponds to in the generating function) is the drift term (E1)which has the solution (E2)We can verify that mutation and selection are negligible compared to drift on timescales of order as long as Note that this condition () is satisfied for essentially all of the individuals in the population since By summing the terms, we find that on these timescales the generating function for the frequency of the mutation is (E3)which is just the generating function for the frequency of a neutral lineage (*cf*. Equation D2). On longer timescales (), this approximation breaks down and mutation and selection cannot be neglected for lineages arising in fitness classes far above the mean of the fitness distribution (with ). This is because the probability that a portion of the lineage in a class with fewer than mutations has drifted to a high enough frequency to feel the effects of mutation and selection becomes substantial on longer timescales, which can also be seen from the probability distribution of nonextinct lineages (Equation D5). We consider the generating function of these unusually fit mutations at these higher frequencies in the next subsection. In contrast, mutations that arise on more typical backgrounds with mutations can drift to higher frequencies, of order before feeling the effects of selection, but cannot substantially exceed a total frequency We analyze their trajectories in the following subsection.

### Large Lineages () Arising on Unusually Fit Backgrounds ()

In lineages that reach higher frequencies, a large number of deleterious descendants arise every generation. This leads to strong couplings between the sizes of the components of the lineage in different fitness classes, and diminishes the importance of genetic drift in classes of lower fitness, which receive large numbers of deleterious descendants from classes of higher fitness. We find that, in classes of low enough fitness, the effects of genetic drift are negligible and the dominant balance is between the linear mutation and selection terms.

The solution to the linear (deterministic) problem has been obtained by Etheridge *et al.* (2009), but we reproduce the derivation briefly for completeness. In the absence of drift, the characteristics evolve according to (E4)which defines the linear operator ℒ has right eigenvectors with eigenvalues given by (E5)and corresponding left eigenvectors (E6)We can verify that the left and right eigenvectors are orthonormal []. By eigenvalue decomposing and integrating backward in time from to we obtain where the amplitudes are set by the boundary condition at Finally, a summation yields (E7)Setting the boundary condition at to and evaluating we reproduce the result by Etheridge *et al.* (2009): in the absence of genetic drift, the descendants of the labeled lineage follow a Poisson distribution that starts in class *k* and has mean and amplitude
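The Poisson form of the deterministic solution can be checked against a direct numerical integration. The sketch below is a minimal illustration and assumes that the drift-free dynamics are $df_i/dt = -isf_i + U_d f_{i-1}$ with $U_d = \lambda s$. Under that assumption the descendants of a unit lineage founded in class *k* are $f_{k+j}(t) = e^{-kst}\,[\lambda(1 - e^{-st})]^j / j!$. The function names, the truncation at `n_classes`, and the parameter values are hypothetical.

```python
import math

def integrate_classes(k, lam, s, t_end, n_classes=40, dt=0.05):
    """RK4 integration of df_i/dt = -i*s*f_i + U_d*f_{i-1} from a unit lineage in class k."""
    u_d = lam * s
    f = [0.0] * n_classes
    f[k] = 1.0

    def rhs(state):
        out = []
        for i, fi in enumerate(state):
            inflow = u_d * state[i - 1] if i > 0 else 0.0
            out.append(-i * s * fi + inflow)
        return out

    for _ in range(int(round(t_end / dt))):
        k1 = rhs(f)
        k2 = rhs([x + 0.5 * dt * d for x, d in zip(f, k1)])
        k3 = rhs([x + 0.5 * dt * d for x, d in zip(f, k2)])
        k4 = rhs([x + dt * d for x, d in zip(f, k3)])
        f = [x + dt * (a + 2 * b + 2 * c + d) / 6.0
             for x, a, b, c, d in zip(f, k1, k2, k3, k4)]
    return f

def closed_form(k, lam, s, t, n_classes=40):
    """Poisson-shaped profile with mean lam*(1 - exp(-s*t)) and prefactor exp(-k*s*t)."""
    mean = lam * (1.0 - math.exp(-s * t))
    amp = math.exp(-k * s * t)
    return [0.0 if i < k else amp * mean ** (i - k) / math.factorial(i - k)
            for i in range(n_classes)]

if __name__ == "__main__":
    numeric = integrate_classes(2, 4.0, 0.05, 30.0)
    exact = closed_form(2, 4.0, 0.05, 30.0)
    print(max(abs(a - b) for a, b in zip(numeric, exact)))
```

Because mutations only move a lineage toward classes of lower fitness, truncating the system at a finite number of classes does not affect the classes that are retained.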

To evaluate the effect of genetic drift on the total size of the lineage at some later time point we set A sufficient (but not necessary) condition for genetic drift in class *i* being negligible in determining the total size of the lineage at some later time point *t* is that the nonlinear term uniformly in In the vicinity of some frequency corresponding to we find that the nonlinear term is negligible uniformly in as long as (E8)Note that the condition in Equation E8 is obtained by plugging in the relationship between and (from Equation E7) into the condition that Since the left-hand side in Equation E8 is bounded by the inequality is guaranteed to be satisfied uniformly in *t* as long as (E9)Defining to be the smallest integer for which we can verify that genetic drift is negligible in all classes with but not in class

Note that self-consistency of the deterministic solution for implies that when the frequency of the part of the allele in class *i* satisfies for all but not for Also note that this inequality can only be satisfied for some if the founding class is sufficiently far above the fitness distribution ( where ). We return to lineages founded in classes with mutations in the next subsection.

Thus, since genetic drift has a negligible effect in classes containing more than deleterious mutations, the characteristics are given by the deterministic solution above, which we have already integrated. The frequency of the part of the lineage in classes with is therefore a deterministic function of the frequency trajectory in class We can solve for this deterministic function straightforwardly by explicitly including as a variable mutational source term for classes of lower fitness. This yields an expression for the generating function of the entire lineage (E10)when (E11)where we have used the notation from the main text:

#### The relationship between the feeding class trajectory and the allele frequency trajectory f(t)

Equivalently, this result can be rewritten in terms of the relationship between the allele frequency trajectory and the trajectory of the portions of the alleles in classes with (E12)which is valid as long as Because the expression on the right-hand side of Equation E12 is dominated by the last term, the full allele frequency trajectory reduces to a single stochastic term Therefore, we can calculate the distribution of near any given frequency *f* by: (1) determining the feeding class which corresponds to the class of lowest fitness in which genetic drift is not negligible; and (2) calculating the distribution of this time integral of the trajectory in that class, subject to the boundary condition that

In principle, this is still challenging if because the trajectory in class still depends on the trajectories in higher-fitness classes, all of which are stochastic. In addition, calculating the distribution of the convolution of and is still difficult, even when Fortunately, a simplification arises from the highly peaked nature of Because the exponent in is peaked in time, the integral in Equation E12 is, up to exponentially small terms, dominated by the region in which is largest. Since the variation in the magnitude of is much larger than the variation in the magnitude of the integral will be dominated by the window during which is at its peak, as long as in that window. In that case, we can make a Laplace-like approximation in Equation E12, in which we expand around its peak, and neglect contributions that are far away from this peak, since these are exponentially small. Near (E13)which yields (E14)As a result of this simplification, the allele frequency does not depend on the full frequency trajectory in the feeding class but only on its time integral (weight) in a window of width around which we denote by Note that Equation E14 implies a simple condition in terms of the allele frequency trajectory in this feeding class that specifies when drift is negligible in downstream classes. We have shown above that as long as the total allele frequency drift is negligible in classes with more than deleterious mutations per individual. From Equation E14, we can see that this condition can be restated in terms of the weight in the feeding class as (E15)Thus, can also be thought of as corresponding to the class of highest fitness in which the weight exceeds
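The quality of a Laplace-like approximation for a sharply peaked kernel can be checked numerically. The sketch below is not Equation E14. It assumes that the deterministic response of the total lineage size to a unit pulse in class *k*, a time τ later, is $\exp[-ks\tau + \lambda(1 - e^{-s\tau})]$, and it integrates this kernel on its own, which corresponds to holding the feeding-class frequency constant. Function names and parameter values are hypothetical.

```python
import math

def kernel(tau, k, lam, s):
    """Assumed deterministic response exp(-k*s*tau + lam*(1 - exp(-s*tau)))."""
    return math.exp(-k * s * tau + lam * (1.0 - math.exp(-s * tau)))

def numeric_integral(k, lam, s, t_max_scaled=10.0, steps=20000):
    """Trapezoidal estimate of the integral of the kernel over [0, t_max_scaled / s]."""
    t_max = t_max_scaled / s
    h = t_max / steps
    total = 0.5 * (kernel(0.0, k, lam, s) + kernel(t_max, k, lam, s))
    for j in range(1, steps):
        total += kernel(j * h, k, lam, s)
    return total * h

def laplace_estimate(k, lam, s):
    """Gaussian approximation around the peak at tau* = ln(lam / k) / s."""
    tau_star = math.log(lam / k) / s
    curvature = k * s * s  # minus the second derivative of the exponent at tau*
    return kernel(tau_star, k, lam, s) * math.sqrt(2.0 * math.pi / curvature)

if __name__ == "__main__":
    k, lam, s = 5, 20.0, 0.01
    exact = numeric_integral(k, lam, s)
    approx = laplace_estimate(k, lam, s)
    print(exact, approx, abs(approx / exact - 1.0))
```

Under these assumptions the peak sits at τ* = ln(λ/*k*)/*s* with a width of order 1/(*s*√*k*), and the Gaussian estimate agrees with the numerical integral to within a few percent for the values shown.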

The approximation we have used in Equation E14 breaks down at very early times and very late times, during which in the relevant window. These correspond to the spreading and extinction phases of the trajectory. We show in Appendix I that the former has a negligible impact on the site frequency spectrum. The latter phase, however, has an important effect at very high frequencies of the mutant, *i.e.*, when the wild type is rare and in its own extinction phase. During this extinction phase, (E16) uniformly in *t* and the frequency trajectory is well approximated as (E17) Applying the Laplace approximation once again, we conclude that the integral in Equation E17 is dominated by the window of width prior to extinction in the $k_c$-class and therefore only weakly depends on time. Thus, during this extinction phase, the allele frequency decays exponentially at rate and can be written as (E18) for some choice of where reflects the maximal frequency the trajectory reached before the onset of the extinction phase.

Thus, we can see that in the extinction phase of the trajectory, the effective fitness of the lineage changes with the frequency according to (E19)To obtain an explicit expression for how the feeding class and therefore depend on the frequency *f*, we can solve the condition that for by setting for some that satisfies We find that, to leading order, (E20)By plugging this back into the expression for we find that, in the extinction phase of the trajectory, the effective selection coefficient changes with the frequency of the lineage according to (E21)In summary, we have shown in this appendix that the allele frequency trajectory in the peak phase of the allele only depends on the time integral of the frequency in class over a window of specified width and that, outside this peak phase, the trajectory has an even simpler time-dependent form that we described above.

As we will see in Appendix F, the generating function for this relevant weight in class is straightforward to calculate when is the founding class (*i.e.*, for ). This case is relevant for trajectories that arise in class *k* and exceed frequencies which means that the feeding-class weight will exceed for a certain period of time.

However, not all trajectories that arise in class *k* will reach such large frequencies. We have seen in an earlier section that trajectories that never exceed frequencies much larger than are dominated by drift throughout their lifetimes. However, even those that do exceed and therefore leave behind a large number of deleterious descendants will often not reach the much larger frequency In this case, we will have to treat multiple fitness classes stochastically and the weight relevant for the peak of the trajectory will be that in class For a further simplification results from the fact that the width of the window is longer than the lifetime of the mutation in class (see Appendix F and Appendix G for details). We use this simplification to calculate the resulting weight distribution in Appendix G. Finally, in Appendix I we use these results to obtain expressions for the average site frequency spectrum in the cases of both strong and weak mutation.

### Lineages Arising on Typical Backgrounds ()

Lineages founded in classes with mutations will not enter the semideterministic regime described above. This is because selection in each individual class *i* in which they can be present prevents from exceeding where the latter () is the necessary threshold for a large enough number of deleterious descendants to be generated that their dynamics become dominated by selection in some class below the *i*-class. This threshold equal to emerges from our analysis of the coupled branching process in the previous subsection and is further clarified and discussed in Appendix G.

In contrast to lineages arising far above the mean of the fitness distribution, the frequency trajectories of lineages that arise near the mean of the fitness distribution are dominated by drift and are eventually capped by negative selection at large enough frequencies. Selection becomes an important force about generations after the lineage was founded. At this time, the accumulated deleterious load since arising becomes large enough to affect the trajectory of the mutation. This deleterious load will affect the trajectory substantially when the frequency of the lineage becomes comparable to the drift barrier set by its current relative fitness The expected fitness of a lineage founded near the mean of the distribution [with ] is Provided that the lineage has not drifted to extinction by *t*, its expected frequency at *t* is Thus, when the effect of selection will dominate over drift. This occurs when Thus, lineages that arise near the mean of the fitness distribution have a trajectory that has neutral statistics for the majority of its lifetime, but does not exceed Finally, lineages arising in classes far below the mean of the fitness distribution (), will also be dominated by drift, but limited to even lower frequencies. However, these lineages are also comparatively rare and only have a small relative impact on the lowest-frequency part of the site frequency spectrum ().

## Appendix F: The Distribution of Allele Frequencies and of the Weight in the Founding Class

In this appendix, we calculate the distribution of frequencies and weights for the stochastic process defined by

with and for This process describes the trajectory of the component of the lineage that remains in the founding class (the founding genotype). To calculate these distributions, we begin by defining the joint generating function for the frequency and the total time-integrated weight up to time *t*: (F2)The joint generating function for these two quantities is defined as (F3)and satisfies the PDE (F4)Once again, we solve this PDE using the method of characteristics. The characteristics are defined by (F5)and are subject to the boundary condition The generating function is constant along the characteristics (), and therefore satisfies (F6)After integrating the ordinary differential equations in Equation F5, we find that the characteristics follow (F7)with

We can verify that the correct marginal generating function for the frequency of the lineage emerges from this result by setting and imposing the boundary condition which corresponds to the initial frequency at being

To obtain the marginal generating function for the weight in the window between and we set and choose a boundary condition that reflects the distribution of frequencies generations after the lineage was founded (see Equation D2): (F8)where (F9)The generating function in Equation F8 captures the full time-dependent behavior of the weight in the founding class in a window of width and can be inverted by standard methods. However, it is in practice unnecessary to invert Equation F8 to calculate the site frequency spectrum. For our purposes here, we will be mostly concerned with two special cases: the total weight in the founding class from founding to extinction, and the time integral of the distributions of frequencies and weights in a window of specified width The former case has been calculated previously by Weissman *et al.* (2009). We quote and discuss this result for completeness in the section below. We then analyze the latter case in the following section.
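For intuition about the quantities discussed above, the frequency and weight of the founding-class component can also be explored by direct simulation. The sketch below is not the calculation used in this appendix. It assumes that the founding genotype behaves as a discrete-generation branching process in which each individual leaves a Poisson number of offspring with mean `1 - cost`, where `cost` is a hypothetical stand-in for the effective selection coefficient in Equation F1. The weight is accumulated as the sum of the frequency over generations.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_founding_class(rng, pop_size, cost, max_generations=100000):
    """Follow one lineage from a single founder until extinction.

    Returns (weight, peak_frequency, lifetime), where weight is the sum of
    the frequency over generations, a discrete analogue of the
    time-integrated weight.
    """
    individuals = 1
    weight, peak, lifetime = 0.0, 0.0, 0
    while individuals > 0 and lifetime < max_generations:
        frequency = individuals / pop_size
        weight += frequency
        peak = max(peak, frequency)
        # Each individual reproduces independently with mean 1 - cost.
        individuals = sum(poisson(rng, 1.0 - cost) for _ in range(individuals))
        lifetime += 1
    return weight, peak, lifetime
```

Repeating `simulate_founding_class` over many seeded runs and histogramming the returned weights gives an empirical counterpart to the lifetime weight distribution, subject to the assumptions stated above.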

### The Distribution of the Total Lifetime Weight in the Founding Class,

The first special case that will be relevant to our analysis of trajectories and allele frequency spectra is the total integrated weight in the founding class from founding () to extinction. By setting in Equations F8 and F9, we find that the generating function for the total weight from founding to some later time *t* is (F10)Note that Equation F10 becomes independent of time when (uniformly in ζ), which agrees with our heuristic intuition that the lifetime of a mutation in class *k* is not longer than generations. Since we have shown in *Large Lineages Arising on Unusually Fit Backgrounds* in Appendix E that the allele frequency trajectory depends on the weight in a window of width (where the ≥ sign follows because ) that is longer than for (with being the marginal case), the distribution of will be either equal to the total lifetime weight of the allele [for ] or negligible for

By taking the limit in Equation F10, we obtain that the generating function for the distribution of the lifetime weight in the founding class is (F11)The inverse Laplace transform of Equation F11 can be evaluated by standard methods, which yields the distribution of the lifetime weight in the founding class:

(F12)

### The Time Integrals of and

To calculate the average site frequency spectrum, we need to calculate the time integral of the distributions of frequencies and weights over time. In principle, this can be done by inverting Equation F8 and then integrating the distribution of over time. However, since this is a somewhat laborious calculation, we will use a convenient mathematical shortcut in which we first solve for the distribution of weights in a different stochastic process and then relate this back to the original process in Equation F1.

Specifically, we consider the stationary limit of the stochastic process defined by the Langevin equation: (F13)This describes the time evolution of the frequency of a lineage with fitness in which individuals are continuously generated by mutation at some rate (and have frequency at the time when they are generated). This process is relevant because the distribution of frequencies and weights in the stationary process are related to the time integrals of the distributions of and More precisely, in the limit that (keeping *N* constant), the distributions of *f* (and its time integrals) in the stationary process are the same as the time-integrated distributions of the nonstationary process, provided that we also divide by the total rate at which new individuals are generated, to ensure proper normalization. That is, (F14)We denote the joint generating function for the frequency, *f*, and weight in this process, by (F15) satisfies the PDE (F16)Note that the generating functions for the two processes are related and that, by setting in Equation F16, we obtain the generating function for the nonstationary process (see Equation F4). In particular, the characteristics for Equation F16 are the same as the characteristics for Equation F4 and they follow the form we calculated previously and quoted in Equation F7. Along these characteristics, the generating function satisfies (F17)or equivalently, after integrating, (F18)However, the boundary conditions for the two processes are different. 
The nonstationary process is subject to the boundary condition that there is a single individual present in the lineage at whereas the stationary process is subject to the boundary condition that the process is stationary at the initial time point, The stationary property of the frequency distribution is guaranteed by the boundary condition This can be obtained either by inspection or by substituting an arbitrary boundary condition and finding the limiting form for the generating function for the frequency as and noting that becomes independent of *z* as so the initial condition has no impact on the frequency distribution.

Substituting the expression for from Equation F7 into Equation F18 and performing the integral over we arrive at the solution to the joint generating function for and (F19) To obtain the marginal generating function for we set giving and (F20) Conversely, to get the generating function for we set which after some rearranging yields (F21) and (F22) We invert Equations F20 and F22 below.

#### Inversion of the generating functions in Equations F20 and F22

Since only the nonextinct portion of the process contributes to the site frequency spectrum, when inverting the generating functions for the weight and frequency, we will use the following relationship between the probability distribution and the moment-generating function of a random variable *g*: (F23)From the definition of the moment-generating function and the sine limit definition of the Dirac δ function [], it follows that the boundary terms amount to the probability mass at and that the distribution of the nonzero portion of the process is (F24)After plugging this expression and the generating function for the frequency Equation F20 into Equation F14 and taking the limit, we find that the time-integrated distribution of frequencies in the founding class is (F25)The time-integrated distribution of weights in the feeding class can be obtained in an entirely analogous fashion. In this case, it will be convenient to treat the cases and separately. When a lengthy but straightforward substitution of Equation F22 into Equation F24 gives (F26)The simplest way to carry out this integral is by contour integration. To do this, we close the contour using a large semicircle in the left half-plane. The contribution from this circle vanishes as the radius of the semicircle approaches infinity, and so the integral considered above is equal to the sum of the residues within the left half-plane. The integrand has simple poles at for with residues which yields (F27)where is the elliptic theta function. 
Asymptotic expansions for small and large arguments give (F28)The case is slightly more straightforward to evaluate since the length of the intervals we are interested in is longer than the typical timescale of selection, As a result, the arguments in the hyperbolic functions in Equation F22 satisfy (for with being the marginal case), which yields a simple form for the distribution of nonzero weights: (F29)Note that the expression in Equation F29 reduces to a standard Gaussian integral. By carrying out this integral, we obtain for the time integral of the distribution of weights in the founding class:

## Appendix G: The Distribution of Weights in Classes Below the Founding Class

We have seen in Appendix E that, when the allele frequency trajectory in the founding class is small enough, the effects of genetic drift cannot be ignored in multiple fitness classes. In this section, we consider how the trajectories (and their weights) in these stochastic classes are coupled and derive the distribution of lifetime weights in class in which individuals carry Δ more mutations compared to individuals in the founding class.

### The Relationship Between the Trajectory in the Founding Class *k*, and the Weight in Class

We begin by considering the total lifetime weight in the class right below the founding class (), which we will denote clearly depends on the weight in the founding class, since the total number of mutational events from the *k*-class into the ()-class is equal to As we describe in *The Importance of Genetic Drift in Classes Below the Founding Class* in Appendix B, each one of these mutational events founds a sublineage, and the stochastic trajectory of each sublineage is described by Equation F1. The total weight of the lineage in class is simply the sum of the weights of each of these sublineages. The generating function of the lifetime weight in the class, is related to the lifetime weight in the founding class according to (G1)where denotes the weight of the sublineage founded by the mutational event. Since the are independent and identically distributed, the generating function of their sum is equal to the product of their generating functions and (G2)where the final average is taken over the distribution of the weight, in the *k*-class. The generating functions of and are both given by Equation F11.
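The step from a sum of independent sublineage weights to a product of generating functions can be written out generically. The symbols below ($n$ for the number of mutational events, $w_i$ for the sublineage weights, $H_w$ for their common generating function, and $\mu$ for the mean number of events) are placeholders chosen for this illustration and are not taken from Equations G1 and G2. The second line additionally assumes that $n$ is Poisson distributed, which is our assumption.

```latex
\begin{align}
\left\langle e^{-z \sum_{i=1}^{n} w_i} \right\rangle
  &= \prod_{i=1}^{n} \left\langle e^{-z w_i} \right\rangle
   = \left[ H_w(z) \right]^{n}, \\
\sum_{n=0}^{\infty} \frac{\mu^{n} e^{-\mu}}{n!} \left[ H_w(z) \right]^{n}
  &= \exp\!\left\{ -\mu \left[ 1 - H_w(z) \right] \right\}.
\end{align}
```

If $\mu$ is proportional to the weight in the founding class, averaging the second line over that weight gives a final average of the kind described in the text.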

Using the same methods that we used to invert Equation F11, we obtain that the distribution of the total weights in class conditioned on the weight in class *k* being equal to is (G3)We can see from this equation that the neutral decay of the distribution of weights in class which results from drift and is proportional to is exponentially cut off for and for . The latter, high-weight cutoff is familiar from before and results from selection within the class. The low-weight cutoff results from the pressure of incoming mutational events.

A simple heuristic can explain the dependence of the low-weight cutoff on the weight in the founding class, The weight is at least as large as the weight of the largest sublineage. Because each of the mutational events generates a sublineage that survives for *T* generations with probability and leaves a weight of order at least one of these sublineages will survive for *T* generations with probability equal to This probability is of order 1 for which means that with probability order 1 at least one of the sublineages will have weight Note that this also means that when (consistent with the lineage exceeding frequency in the founding class), the weight in the next class is guaranteed to be larger than the weight in the founding class. This means that lineages that exceed the frequency in the founding class are almost guaranteed to generate an even larger number of individuals in the next class, which generates an even larger number of individuals in the following class, and so on.
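The counting argument in this heuristic can be checked with a few lines of arithmetic. The sketch below assumes, for illustration only, that each sublineage independently survives for `horizon` generations with probability exactly `1 / horizon` (the text fixes this probability only up to an order-one factor). The function name and parameters are hypothetical.

```python
import math


def prob_any_survives(num_events, horizon):
    """Probability that at least one of num_events sublineages survives.

    Each sublineage is assumed to survive `horizon` generations
    independently with probability 1 / horizon.
    """
    if horizon <= 1:
        raise ValueError("horizon must exceed one generation")
    # 1 - (1 - 1/horizon) ** num_events, evaluated stably for large inputs.
    return -math.expm1(num_events * math.log1p(-1.0 / horizon))
```

Under this assumption the probability is small when `num_events` is much less than `horizon`, of order one when the two are comparable, and close to one when `num_events` is much larger.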

We have implicitly assumed that the trajectory of each of the sublineages is dominated by drift. This will be true as long as [*i.e.*, as long as ]. In contrast, when a large number of the lineages will exceed the frequency in the next class, and the trajectory in that class will become dominated by selection. We have shown in Appendix E that once this happens, drift in class and all classes below it will become negligible. Note that this heuristic argument also explains the self-consistency condition that emerged in Appendix E (see Equation E15) and explains why genetic drift becomes negligible in the ()-class whenever the weight in the -class is larger than .

In the section below, we will use the insights above to evaluate the weight distribution in class conditioned on the lineage arising in class *k* and selection being negligible in all classes beneath it, for Because the lifetime of the longest-lived sublineage in each of these classes is at most in this limit and because the sublineages are seeded into the *i*-class over a time that is, by assumption, shorter than the total lifetime of the lineage in all of these classes is strictly shorter than which is why we do not need to be concerned with the full, time-dependent properties of the distribution of weights in this class. Instead, the calculation of the distribution of lifetime weights will suffice for calculating the site frequency spectrum.

### The Distribution of the Weight in Class

Having obtained the distribution of the weight in class conditioned on the weight in class *k* being equal to (see Equation G3), we can calculate the marginal distribution of weights by averaging over In the limit that and that we are interested in here, this distribution is (G4)Note that the distribution in the ()-class decays less rapidly than in the *k*-class. In particular, the probability that the weight in the ()-class exceeds (and leads to the deterministic propagation of individuals in classes with or more deleterious mutations) is (G5)which is larger than the probability that the weight in the *k*-class exceeds the corresponding value by a large factor consistent with our intuition that the weight in the class below the founding class is guaranteed to exceed the weight in the founding class if

In general, we can calculate the distribution of the weight in class by iterating this procedure. Specifically, the distribution of the weight in class conditioned on the weight in class being equal to also follows Equation G3 (but with *k* changed to ). By repeating the above procedure Δ times, we find that the distribution of lifetime weights in class is

## Appendix H: The Site Frequency Spectrum in the Presence of Weak Mutation ()

In the following two appendices, we use the results obtained in previous sections to calculate the site frequency spectrum of the labeled lineage in the limits that and by evaluating and inverting the generating function for the total frequency of the labeled lineage.

We have seen in Appendix D that trajectories of mutations in the presence of weak background selection () are, to leading order in the small parameter λ, the same as those of isolated loci with fitness In Appendix F, we have shown that the time-integrated distribution of allele frequencies of a single, isolated locus of fitness is

which agrees with classical results by Ewens (1963) and Sawyer and Hartl (1992). Thus, the contribution to the site frequency spectrum of neutral mutations arising in class *k* is (H2) Summing the contributions of all the classes, we find that the full neutral site frequency spectrum is (H3) The site frequency spectrum of deleterious mutations follows from the same argument, since the trajectory of a deleterious mutation arising on the background of an individual with *k* deleterious mutations is the same as the frequency trajectory of a neutral mutation arising in an individual with deleterious mutations. Thus, the site frequency spectrum of deleterious mutations is (H4) which once again agrees to leading order with the site frequency spectrum that we would have obtained assuming that all selected sites at the locus were isolated.
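For reference, the classical single-site result cited above can be evaluated directly. The sketch below uses the textbook Sawyer and Hartl form of the expected frequency density, with `gamma` the scaled selection coefficient and `theta` the scaled mutation rate. Its normalization and scaling convention are assumptions and may differ from those of Equation H1.

```python
import math


def sawyer_hartl_density(x, gamma, theta=1.0):
    """Expected density of sites at derived-allele frequency x.

    Textbook form: theta * (1 - exp(-2*gamma*(1 - x)))
                   / ((1 - exp(-2*gamma)) * x * (1 - x)).
    Intended for moderate |gamma|; very large negative gamma overflows.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    neutral = theta / x
    if abs(gamma) < 1e-12:
        return neutral  # neutral limit, theta / x
    ratio = math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)
    return neutral * ratio / (1.0 - x)
```

For strongly deleterious mutations (`gamma` large and negative) the density approaches `theta * exp(-2 * abs(gamma) * x) / (x * (1 - x))`, that is, a neutral spectrum with an exponential cutoff at high frequencies.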

## Appendix I: The Site Frequency Spectrum in the Presence of Strong Mutation ()

In this appendix, we calculate the site frequency spectrum of the labeled lineage in the limits that and that by evaluating and inverting the generating function for the total frequency of the labeled lineage.

In the presence of strong mutation, we have seen that trajectories of mutations are dominated by drift at the lowest frequencies, where the generating function reduces to the generating function of a neutral mutation and is simply equal to the limit of the single-locus generating function in Equation D2. We have already calculated the site frequency spectrum that results from these trajectories in the previous section. Plugging in these results, we find that (I1) The site frequency spectrum at these frequencies is dominated by the contributions of lineages arising in average backgrounds, with By the same argument, the frequency spectrum of deleterious mutations at the same frequencies is also (I2) At larger frequencies, the site frequency spectrum becomes dominated by lineages arising in unusually fit backgrounds, with Their trajectories are instead described by Equation E10. We have seen that the integral in the exponent of Equation E10 has a different dependence on *t* for and which we have labeled the “spreading,” “peak,” and “extinction” phases of the trajectory. In evaluating the site frequency spectrum it will be convenient to calculate the contributions from each of these phases separately. We denote these contributions as and and the full site frequency spectrum is obtained by summing: (I3) We evaluate and in the next two subsections of this appendix. We then show in the last subsection of this appendix that the contribution from is subdominant to that of

### Contribution from the Peaks of Trajectories

In *Large Lineages Arising on Unusually Fit Backgrounds* in Appendix E, we have shown that, in the peak phase of the trajectory, the total allele frequency is (I4)where is the class with the smallest number of mutations for which or equivalently, the class with the smallest number of mutations in which the weight exceeds

We have seen above in Appendix G that, to achieve such a large weight in class a mutation could have arisen in class and traced an unusually large trajectory, or arisen in class and traced a smaller trajectory in that class, which led to the creation of a large number of deleterious descendants in class at least one of which had weight exceeding Alternatively, it could have also arisen in class and traced an even smaller trajectory in that class that led to a larger weight in class and a sufficiently large weight in class for genetic drift to be negligible in classes In other words, in the range of frequencies (I5) we see the peaks of trajectories originating in classes as long as their weight in class is large enough that genetic drift in classes of lower fitness can be ignored. All of these peaks contribute to the site frequency spectrum and, by integrating Equation I4 in time, we find that (I6) where the last term represents the time-integrated distribution of weights in a window of width in class of a lineage that arose in class *k*. This distribution is given by Equation F28 for Otherwise, when the time-integrated distribution in Equation I6 is equal to the product of the window width, and the distribution of lifetime weights in the founding class, given in Equation G6.

Since we have previously calculated all of these quantities, we can now turn to evaluating the sum in Equation I6. When then and the sum in Equation I6 has only one term (). By substituting in the expression for the time-integrated distribution of weights in Equation F28, we find that (I7) At lower frequencies lineages originating in multiple different fitness classes will be able to contribute to the site frequency spectrum. At these frequencies, (I8) Plugging in the expression for from Equation G6, we find (I9) Because this sum is dominated by the term, since decays much more rapidly with decreasing *k* than any of the other terms increase. To evaluate the *f*-dependence of this term for and we repeat the same procedure as in *Large Lineages Arising on Unusually Fit Backgrounds* in Appendix E to obtain an explicit form for Briefly, to solve the self-consistency condition for we set for some that satisfies and find that to leading order (I10) Plugging in, we obtain that the leading-order term in the distribution of peak sizes is (I11) The term depends on *f* more weakly than logarithmically, and on frequency scales on which changes substantially it will be approximately constant,

Because the crossover between the scaling of which occurs at high frequencies [where ], and the behavior, which is valid at substantially lower frequencies [where ], is in principle broad, this constant factor *C* is difficult to determine: asymptotic matching does not typically work well in the presence of such broad transitions and crude “patching” methods do not, in general, offer satisfactory results (Hinch 1991). Thus, Equation I11 is undetermined up to the constant factor which is between 1 and For our purposes here, this level of precision is sufficient— precision in the form of the spectrum was, after all, expected in the Laplace-like approximation that we used in *Large Lineages Arising on Unusually Fit Backgrounds* in Appendix E to calculate the stochastic integral over the trajectory of the feeding class. Thus, by absorbing the term into this constant factor and relabeling as *C*, we find that the peak contribution to the site frequency spectrum is (I12)with *C* in the range

### Contribution from the Extinction Stage of Trajectories

Once the trajectory is beyond its peak, the total allele frequency decays as (I13)where denotes the maximal frequency that the trajectory reaches and Equation I13 is valid for Note that this stage only exists for frequencies At higher frequencies, the total allele frequency simply mirrors smoothed fluctuations in the founding class. Equation I13 can be straightforwardly integrated in time to obtain the contribution of this trajectory to the site frequency spectrum (I14)Averaging Equation I14 over all possible trajectories, we find that (I15)where is a constant that we have introduced to correctly account for the fact that the peak phase occurs at frequencies that are at least higher than the frequencies in the extinction stage.

For peak frequencies we have already calculated the overall time-integrated distribution of peak sizes of lineages arising in classes of all fitness and we can use this result to calculate the total probability that a trajectory passes through *f* in its extinction stage, (I16)This means that the contribution to the site frequency spectrum from the extinction phase of trajectories is equal to (I17)Therefore, is strictly smaller than by a factor which is large when Thus, this phase of the trajectory has a small effect on the low-frequency end of the spectrum.

However, in the high-frequency end of the spectrum, when the only contribution comes from this extinction phase of the wild type, which starts once the mutant approaches the frequency in the 0-class. These events happen at rate equal to and each contributes to the site frequency spectrum. Multiplying these two terms, we find that the site frequency spectrum is proportional to

(I18)

### Contribution from the Spreading Stage of Trajectories

At frequencies the site frequency spectrum also receives contributions from the spreading stage of trajectories, in which the allele frequency rapidly increases as the allele spreads through the fitness distribution. In this stage, the rate at which the frequency increases is strictly larger than what it would be if we ignored any contributions from the founding class after the mutation exceeds frequency [*i.e.*, assuming ]: (I19) Far below the peak of the trajectory, where the contribution from this stage of a single trajectory to the frequency spectrum that passes through *f* is thus simply bounded by (I20) Since the number of trajectories that pass through frequency *f* in the spreading phase is the same as the number that pass through *f* in the extinction phase, the contribution from the spreading phase to the site frequency spectrum is strictly smaller than that of the extinction phase throughout the region where both contributions exist,

### Constructing a Single Curve from Piecewise Asymptotic Functions

In the previous sections of this appendix, we have shown that the site frequency spectrum is given by (I21)As we have explained, line 2 of Equation I21 is valid up to a constant factor that is bounded by These piecewise functions represent the leading-order behaviors far away from the transitions between the different regimes, which occur at and For practical purposes, it is often convenient to construct a single theoretical curve that joins these curves at these transition points, while maintaining the correct form far away from the transition points. This procedure is not intended to extend the validity of the results outside of the regimes where asymptotic forms are available and is certainly not guaranteed to produce the correct functional forms at the transitions. However, it often yields satisfactory results, especially when the transitions are narrow in practice and when the two asymptotic forms are expected to lie on opposite sides of the behavior at the transition (*i.e.*, one is expected to overestimate, and the other to underestimate). In the present case, the latter condition is true at the transitions at and

Here, we have used a sigmoid function, (I22)to join the functional forms at the transitions, which has the convenient property (I23)In addition to this, because the forms are valid when and have logarithmic divergences near the transitions (*i.e.*, for ), we also add small additive factors to these logarithms to avoid nonsensical results. Specifically, to compare our theoretical predictions with simulations, we plot (I24) and were chosen to ensure visual smoothness of the curve. Note that the constant is only necessary to ensure visual smoothness of the curve at limited λ (adding to the denominator to control the logarithmic divergence causes the curve to be shifted downward, and helps to correct for this). We tabulate the values used in Table I1.
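The joining procedure described above can be illustrated with a generic logistic blend. The sketch below is not the specific form of Equations I22 to I24: the choice of the logistic function, the use of the logarithm of frequency as its argument, and the names `join_asymptotes`, `f_transition`, and `sharpness` are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def join_asymptotes(f, low_form, high_form, f_transition, sharpness=10.0):
    """Blend two asymptotic forms across a transition frequency.

    low_form applies for f well below f_transition and high_form for f well
    above it; the blending weight switches over about 1 / sharpness units
    of log-frequency around the transition.
    """
    weight = sigmoid(sharpness * math.log(f / f_transition))
    return (1.0 - weight) * low_form(f) + weight * high_form(f)
```

For example, joining `lambda f: 1 / f` below `f_transition = 0.01` to `lambda f: 0.01 / f**2` above it reproduces each form far from the transition and their average at the transition itself. The logistic function satisfies `sigmoid(x) + sigmoid(-x) == 1`, so the two blending weights always sum to one.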

In principle, we could also use a similar procedure to join the asymptotic forms at the transitions at and However, since both asymptotic forms overestimate the site frequency spectrum near these transitions, this works no better than simply setting (I25)This is the choice we have made when calculating theoretical predictions for site frequency spectra of smaller samples, which were necessary for comparisons with the structured coalescent.

## Appendix J: Distributions of Effect Sizes

When the effects of deleterious mutations are not all identical but instead have a distribution with finite width, the deterministic dynamics that arise through the combined action of mutation and selection will be modified. In this appendix, we consider these deterministic dynamics. For concreteness, we assume that the fitness effects of new mutations come from a gamma distribution with mean and shape parameter α, (J1)and that these deleterious mutations occur at an overall rate

Under the assumption that all mutations have strong enough effects on fitness that the fitness of the population at the locus does not experience Muller’s ratchet on timescales of coalescence, the mean fitness of an allele at the locus will be equal to with the most-fit individuals being those with no deleterious mutations and an absolute fitness equal to zero. Consider now the deterministic dynamics of a lineage founded in an individual at absolute fitness The fitness of the lineage founded by this individual will change as it accumulates new deleterious mutations according to (J2) Evaluating this integral, we find (J3) When α is sufficiently large, corresponding to a sufficiently narrow fitness distribution, the resulting trajectory is well approximated by assuming that all fitness effects are the same and equal to the average fitness [or, more precisely, the harmonic mean of ]. To calculate how large α needs to be for this approximation to be valid, we can calculate the deterministic expectation for the average number of individuals in the lineage at time *t* after founding. This quantity is equal to (J4) We see that this differs from the single-*s* expression only in the last term, proportional to At sufficiently short times, this is well approximated by On sufficiently long timescales, this will not be the case. However, because the overall magnitude of this term becomes negligible at times long after the peak of we only need it to remain well approximated by an exponential on timescales which requires that When this is the case, is, up to perturbative corrections, given by (J5) and the effects of selection are well described by a single-*s* model on all timescales.
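The approach to the single-*s* limit can be checked numerically. The sketch below assumes that the distribution of effects enters the lineage dynamics through its Laplace transform, the average of `exp(-s * t)` over the gamma distribution; this link to Equations J2 to J5 is our assumption. The closed forms themselves are standard properties of a gamma distribution with a given mean and shape. All function names are hypothetical.

```python
import math


def gamma_pdf(s, mean, shape):
    """Gamma density with the given mean and shape parameter."""
    rate = shape / mean
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(s)
               - rate * s - math.lgamma(shape))
    return math.exp(log_pdf)


def laplace_closed_form(t, mean, shape):
    """Average of exp(-s * t) over the gamma density: (1 + mean*t/shape)**(-shape)."""
    return (1.0 + mean * t / shape) ** (-shape)


def laplace_numeric(t, mean, shape, steps=20000, upper_multiple=40.0):
    """Midpoint-rule check of the closed form (intended for shape >= 1)."""
    width = upper_multiple * mean / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * width
        total += gamma_pdf(s, mean, shape) * math.exp(-s * t)
    return total * width


def harmonic_mean_effect(mean, shape):
    """Harmonic mean of the gamma density, mean * (shape - 1) / shape, for shape > 1."""
    if shape <= 1.0:
        raise ValueError("harmonic mean requires shape > 1")
    return mean * (shape - 1.0) / shape
```

Expanding the logarithm of the closed form gives `-mean * t + (mean * t)**2 / (2 * shape)` to second order, so the transform stays close to the single-*s* exponential `exp(-mean * t)` as long as `(mean * t)**2` is small compared with `shape`.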

## Footnotes

Supplemental material available at Figshare: https://doi.org/10.25386/genetics.6167591.

*Communicating editor: N. Barton*

- Received April 20, 2018.
- Accepted May 25, 2018.

- Copyright © 2018 by the Genetics Society of America