In large populations, multiple beneficial mutations may be simultaneously spreading. In asexual populations, these mutations must either arise on the same background or compete against each other. In sexual populations, recombination can bring together beneficial alleles from different backgrounds, but tightly linked alleles may still greatly interfere with each other. We show for well-mixed populations that when this interference is strong, the genome can be seen as consisting of many effectively asexual stretches linked together. The rate at which beneficial alleles fix is thus roughly proportional to the rate of recombination and depends only logarithmically on the mutation supply and the strength of selection. Our scaling arguments also allow us to predict, with reasonable accuracy, the fitness distribution of fixed mutations when the mutational effect sizes are broad. We focus on the regime in which crossovers occur more frequently than beneficial mutations, as is likely to be the case for many natural populations.
In a large, adapting population, beneficial alleles may be simultaneously spreading at multiple loci. These alleles will tend to arise in different lineages and compete with each other, slowing adaptation, an effect known as “(clonal) interference.” This phenomenon has been repeatedly observed in many different microbial and viral evolution experiments (Lenski et al. 1991; De Visser et al. 1999; Miralles et al. 1999; Colegrave 2002; Goddard et al. 2005; Hegreness et al. 2006; Desai et al. 2007; Bollback and Huelsenbeck 2007; Kao and Sherlock 2008; Pepin and Wichman 2008; Barrick and Lenski 2009; Betancourt 2009; Lang et al. 2011; Miller et al. 2011); recently, it has also been demonstrated to be occurring in natural viral populations (Batorsky et al. 2011; Strelkowa and Lässig 2012; Ganusov et al. 2013). Recombination alleviates interference by breaking down negative associations among the beneficial alleles; in fact, it has long been thought that this effect may be a reason for the evolution of sexual reproduction (Weismann 1889; Fisher 1930; Muller 1932). Although the effect of interference on the rate of adaptation in sexual populations has recently been the subject of a substantial amount of theoretical analysis (Cohen et al. 2005b, 2006; Rouzine and Coffin 2005, 2007, 2010; Neher et al. 2010; Batorsky et al. 2011; Neher and Shraiman 2011; Weissman and Barton 2012), this has mostly been restricted to considering interference among unlinked loci (i.e., with all loci reassorting independently at a uniform rate) or else limited to weak-to-moderate interference in which only rare alleles are affected.
However, for many real populations, particularly viral ones, interference may be both strong and primarily occurring among tightly linked loci, so that at any given time each polymorphic beneficial allele is simultaneously interacting with multiple other alleles at varying recombination fractions. In organisms such as viruses and eukaryotes in which recombination within chromosomes or genome segments occurs primarily via crossovers, these recombination fractions vary hugely among different pairs of loci. In humans, for example, loci at opposite ends of a chromosome are unlinked, with recombination rate r = 0.5, while pairs of loci within a gene may have recombination rates of r ≲ 10⁻⁶ (Myers et al. 2005). Less is known about viral recombination rates, but Neher and Leitner (2010) estimate that in human immunodeficiency virus (HIV) recombination rates among loci vary by a factor of ∼10³ over the genome. In this article, we analyze adaptation under strong interference in such populations with large ranges of recombination fractions among loci, focusing on populations in which crossovers occur more frequently than beneficial mutations. Recently, Neher et al. (2013a) conducted a very similar analysis for a complementary region of parameter space, with crossovers rare compared to selected mutations (see Model below).
We consider a well-mixed population of N haploid individuals. Individuals reproduce sexually, producing each offspring with a different mate. The genome consists of a single chromosome of map length R; i.e., there are an average of R crossovers per reproduction. (All of our results apply equally well to a population of facultative sexuals that outcross with frequency f < 1, with R as the map length multiplied by f; issues begin to arise only when the rate of fitness increase is ≥ f²; see Appendix B.) There is a constant genomic beneficial mutation rate Ub, regardless of genetic background, so that beneficial mutations are never exhausted. We ignore deleterious mutations and epistasis among polymorphic beneficial mutations. With these assumptions, the population will approach an expected steady long-term rate of adaptive substitution per unit genetic map length, λ, and the rate of increase in mean log fitness, v; we focus on populations close to this steady state. It is useful to consider the rate of beneficial mutation per unit map length, μb ≡ Ub/R. (See Table 1 for definitions of symbols.) We focus on the case in which beneficial mutations are infrequent relative to recombination (μb < 1); Neher et al. (2013a) consider the opposite case.
Even in sexual populations, short stretches of genome will be effectively asexual: all loci within a sufficiently short stretch are likely to be described by the same genealogy tracing back to a single common ancestor. In other words, short stretches will typically coalesce without undergoing any recombination. Since each selective sweep increases the rate of coalescence via genetic draft, in rapidly adapting populations these stretches may be long enough so that multiple beneficial alleles will be almost completely linked for the entire time that they are polymorphic and will therefore strongly interfere with each other. On the other hand, once outside this stretch, the strength of interference decays rapidly, with a dependence on recombination rate approaching ∝ r⁻² (Weissman and Barton 2012).
This combination of strong interference among very tightly linked alleles and weak interference among the rest of the genome suggests approximating the genome as consisting of a series of effectively asexual stretches that have relatively little effect on each other. This allows us to avoid the difficulties that come from dealing with an explicit model of crossovers between linear genomes. To understand the evolution of the short stretches, we can draw on the extensive theoretical work on the evolution of asexual populations in the strong-interference regime, particularly Desai and Fisher (2007), Rouzine et al. (2008), and Good et al. (2012).
Length of effectively asexual stretches, r×, and the rate of adaptation
We want r× to be the largest genetic scale over which the population evolves approximately asexually. In other words, mutations that arise at nearby loci separated by recombination fraction r < r× should be strongly associated with each other (either positively or negatively) until they fix or go extinct, while those at loci separated by r > r× should spread independently from each other, i.e., should be in linkage equilibrium by the time they become common and might start to affect each other. Figure 1 shows this pattern in a sample from a simulated population. Since new mutations start in strong linkage disequilibrium, which then decays at an average rate r, this is equivalent to requiring that beneficial alleles take a time ∼1/r× to become common, starting from a single copy.
The time for a new mutation to reach high frequency itself depends on the amount of interference. To find it, we make the approximation that loci farther than r× from each other are essentially unlinked. Then we can treat the genome as consisting of ∼R/r× independently evolving asexual “chunks,” each with beneficial mutation rate ≈μbr×. The evolution of each chunk can be described as a traveling wave in fitness space. For the most part, only the beneficial mutations that arise near the nose of the wave have a chance of reaching high frequency, and the time it takes for them to do so is roughly the time for the wave to travel the distance between its mean and its nose. Thus r× should be approximately equal to the speed of the fitness wave divided by its mean-to-nose width. Equivalently, looking backward in time, r× is set by the condition that the individual in the distant past carrying the ancestor of the present-day allele should be very likely to have very high fitness at loci within r ∼ r× of the focal allele but should have roughly average fitness elsewhere. In a traveling fitness wave, the time for a typical individual’s ancestry to trace back to the nose is the mean-to-nose width divided by the speed of the wave (Desai et al. 2013), giving the same value for r×.
To make this more precise, we focus on the case in which the selective coefficients of beneficial mutations cluster tightly around a typical value s; we consider the case of exponentially distributed effects below. We further focus on the case in which selection is strong relative to mutation, s > μbr×; we consider the biological plausibility of this assumption in the Discussion. With these assumptions, we can apply the traveling-wave analysis of Cohen et al. (2005a), Desai and Fisher (2007), and Rouzine et al. (2008) to the evolution of each chunk to find r× and λ self-consistently. We do this in Appendix A and find approximate expressions for r× (Equation 1) and λ (Equation 2). To cover a range of different reproduction models at once, we have written Equations 1 and 2 in terms of the scaled parameter γ ≡ 2Ns/Var, where Var is the variance in offspring number (Var = 2 and 1 in the Moran and Wright–Fisher models, respectively). Simulations confirm that Equation 2 accurately describes the rate of adaptation when interference is strong, λ0 ≡ μbγ ≫ 1 (Figure 2). Note, however, that some of the close match between the approximations and the simulations is due to lucky cancellations of error terms; see Appendix A and Appendix B, where we estimate the corrections due to interference among chunks.
For moderate interference, λ0 ∼ 1, we conduct a similar analysis in Appendix C and find the expression given in Equation 3. This is similar to the approximation of Weissman and Barton (2012), λ ≈ λ0/(1 + 2λ0), but Figure 2 shows that Equation 3 is less accurate for λ0 ≲ 3. This is because interference among chunks has a significant effect in this regime; see Appendix B and Appendix C.
To derive Equations 1 and 2, we assumed that selection was stronger than mutation (s > μbr×). Equation 1 shows that this is consistent, given our earlier assumption that μb < 1. In the opposite case, μb > 1, in which individual mutations are only weakly selected, Neher et al. (2013a) follow a very similar approach to find a result analogous to Equation 1 in our notation. (In this regime, the chunk asexual traveling waves are described by Cohen et al. 2005a and Hallatschek 2011 rather than by Desai and Fisher 2007 and Rouzine et al. 2008.) Good et al. (2013) also consider the genetic diversity in this case, using a similar approach. It is surprising that the relative strength of mutation to selection depends more on the frequency of recombination than on the strength of selection.
Exponentially distributed mutational effects
We now consider the case in which the effect of a beneficial mutation, s, rather than being a fixed value, is drawn from an exponential distribution with mean s̄. We can take a similar approach as above, but now the evolution of the approximately asexual chunks is described by the analysis of Good et al. (2012), rather than Desai and Fisher (2007). The density of adaptive substitutions λ is not a very useful quantity in this case, since the substitutions will have a range of effects, so we instead focus on v. Given r×, the width and speed of the fitness wave of each chunk can be found using equations 13 and 14 from Good et al. (2012), which we reproduce in Appendix D, using our notation. These values can then be plugged back into Equation A1 (with s replaced by s̄) to solve for r× self-consistently.
In the strong-interference regime, solving the system gives a simple approximate expression for the rate of adaptation (Equation 4; see Appendix D), which matches well with simulation results (Figure 3). It is interesting that Equation 4 is much simpler than the corresponding ones both for a population with fixed selective coefficients (Equation 2) and for an asexual population with exponentially distributed coefficients (Good et al. 2012).
Besides the rate of adaptation, we can also find the distribution of fixed effects of mutations, using equation 11 of Good et al. (2012). Figure 4 shows that this gives a rough match to simulation results, although the probability of fixation of small-effect mutations is underestimated. This may be due to inaccuracies in our approximations or in the original asexual equation (Fisher 2013). Both the analytical and the simulation results indicate that over most of the simulated parameter range successful mutations generally have large selective coefficients but arise on average genetic backgrounds; only at the highest simulated μb values (those shown in Figure 4) do the backgrounds contribute significantly to the fitness of successful mutants (see Appendix D).
We want to check what neutral genetic diversity should look like in a population evolving under the dynamics described above. The expected pairwise coalescence time at a neutral locus in a traveling fitness wave is approximately twice the time for the wave to go the distance from its mean to its nose (Desai et al. 2013), i.e., ≈2/r×. The neutral nucleotide diversity (the expected number of neutral differences between a random pair of genomes) should therefore be π ≈ 4Un/r×, where Un is the neutral mutation rate. Plugging in the value of r× from Equation 1 gives Equation 5. Given that π is proportional to N in a neutrally evolving population, it may seem surprising that Equation 5 depends on the population size only very weakly, through a double logarithm. This is a consequence of the fact that the speed and length of the asexual traveling wave have nearly the same dependence on N, so the coalescence time, which is given by their ratio, is nearly independent of N (Desai et al. 2013).
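As a small worked illustration of the relation π ≈ 4Un/r× stated above, the following sketch computes the pairwise coalescence time, the nucleotide diversity, and the matching neutral effective size Ne = π/(2Un). The function names and the example values of Un and r× are hypothetical and chosen only for illustration; they are not taken from the simulations.

```python
def pairwise_coalescence_time(r_x):
    """Expected pairwise coalescence time, approximately 2 / r_x generations."""
    return 2.0 / r_x


def nucleotide_diversity(u_n, r_x):
    """pi = 2 * U_n * T_2, which reduces to 4 * U_n / r_x."""
    return 2.0 * u_n * pairwise_coalescence_time(r_x)


def neutral_effective_size(pi, u_n):
    """Size of a neutral haploid population with the same diversity."""
    return pi / (2.0 * u_n)


# Illustrative (assumed) values: U_n = 0.01 per genome, r_x = 2e-3.
pi_example = nucleotide_diversity(0.01, 2e-3)
ne_example = neutral_effective_size(pi_example, 0.01)
```

Under these assumed values the implied effective size equals the coalescence time 2/r×, which does not involve N directly; N enters only through r×.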
From simulations, Equation 5 appears to have the correct scaling with the parameters, but consistently overestimates π by ∼50% (see Figure 5). This difference is not so surprising, given that our derivation of r× was only approximate. The inset in Figure 5 shows that Equation 5’s accuracy does appear to improve for low values of Ub/s, although not by much. Part of the inaccuracy may be because the pairwise coalescence time of a traveling wave is equal to twice the nose-to-mean time only in the limit of very wide waves (Desai et al. 2013); in our case, the width is ≈λ + 1 (see Appendix A), which is never very large.
Going beyond just the nucleotide diversity, Figure 6 shows the full one-locus site frequency spectrum. Very wide traveling waves are expected to approach a Bolthausen–Sznitman coalescent (Desai et al. 2013), giving a site-frequency spectrum with the characteristic scaling ∝ ν⁻² as the derived allele frequency ν approaches 0 and ∝ [(ν − 1) log(1 − ν)]⁻¹ as ν approaches 1 (Neher and Hallatschek 2013). Again, the chunk waves are never very wide, so we would not expect this to be a very good approximation in our case; however, the scaling appears to be accurate, particularly for ν → 1. Surprisingly, for ν → 0, the simulations with the lowest values of λ (i.e., the narrowest chunk waves) are the closest to the Bolthausen–Sznitman scaling.
We also want to consider linkage disequilibria among loci. Specifically, we look at the squared “standard linkage deviation,” defined for a pair of loci with mutant allele frequencies ν1, ν2 and double-mutant haplotype frequency ν12 as

$$\sigma_d^2 \equiv \frac{E\left[(\nu_{12} - \nu_1\nu_2)^2\right]}{E\left[\nu_1(1-\nu_1)\,\nu_2(1-\nu_2)\right]} \tag{6}$$

(Ohta and Kimura 1969). (Compared to other measures of linkage disequilibrium, σd² has the advantage of being relatively insensitive to associations among rare alleles and easy to calculate analytically.) Figure 7 shows that σd² between loci in simulations decays with the recombination fraction r between them, but does so much more quickly for r > r× than for r < r×, suggesting that r× is indeed an appropriate scale. Figure 7 also shows that the pattern of σd² is very different from that of a neutral Wright–Fisher population with size Ne = π/(2Un) (Ohta and Kimura 1969, equation 18). In particular, σd² appears to decay like 1/r² for r > r×, while for a neutral population it falls off only as 1/r. This is to be expected, since the underlying coalescent process is also very different. Note that while this contrasts with Zeng and Charlesworth’s (2011) finding that the effect on linkage disequilibrium (LD) of background selection on deleterious mutations could be accounted for by adjusting Ne in this way, they considered only the regime in which the deleterious alleles are in linkage equilibrium with each other, and it is not clear whether this result extends to the regime in which the deleterious alleles interfere with each other [“weak selection Hill–Robertson interference” (McVean and Charlesworth 2000; Kaiser and Charlesworth 2009)].
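The squared standard linkage deviation of Ohta and Kimura (1969) is a ratio of averages: the mean squared disequilibrium (ν12 − ν1ν2)² divided by the mean of ν1(1 − ν1)ν2(1 − ν2). The sketch below estimates it from a collection of locus pairs or replicate populations. It is a minimal illustration and not the analysis code behind Figure 7; the function names and the input layout (tuples of frequencies, lists of 0/1 haplotypes) are assumptions made here.

```python
def frequencies_from_haplotypes(haplotypes):
    """Return (nu1, nu2, nu12) from a list of (a, b) pairs of 0/1 alleles."""
    n = len(haplotypes)
    nu1 = sum(a for a, _ in haplotypes) / n
    nu2 = sum(b for _, b in haplotypes) / n
    nu12 = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    return nu1, nu2, nu12


def squared_standard_linkage_deviation(samples):
    """Ratio of averages over samples, each a (nu1, nu2, nu12) tuple.

    The numerator accumulates D^2 with D = nu12 - nu1 * nu2; the
    denominator accumulates nu1 (1 - nu1) nu2 (1 - nu2).
    """
    numerator = 0.0
    denominator = 0.0
    for nu1, nu2, nu12 in samples:
        d = nu12 - nu1 * nu2
        numerator += d * d
        denominator += nu1 * (1.0 - nu1) * nu2 * (1.0 - nu2)
    if denominator == 0.0:
        raise ValueError("no sample is polymorphic at both loci")
    return numerator / denominator


# Two toy samples: complete association and linkage equilibrium.
linked = frequencies_from_haplotypes([(1, 1), (0, 0)] * 5)
unlinked = frequencies_from_haplotypes([(1, 1), (1, 0), (0, 1), (0, 0)])
```

Because the averages are taken separately in the numerator and denominator, pairs involving rare alleles contribute little to either sum, which is the insensitivity to rare alleles noted in the text.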
To check the accuracy of our approximations, we conducted individual-based simulations. Simulated populations reproduced according to the Wright–Fisher model. Individuals were obligately sexual (with selfing allowed), and each genome consisted of a single linear chromosome with uniform crossover. The average rate of crossover was R = 1 per genome per generation for all simulation data shown here. There was a constant supply of beneficial mutations and no back mutations. In the simulations used to study the site frequency spectrum and the dependence of linkage disequilibrium on recombination fraction, there was also a constant supply of neutral mutations. For population sizes N ≤ 10⁶, the genome was modeled as continuous, with an (effectively) infinite number of loci. For larger N, this required too much memory, and the genome was instead modeled as having 500 evenly spaced loci, each with an infinite number of possible alleles. The recombination fraction between adjacent loci in this model was small compared to the predicted chunk length r× for all simulated parameter values, with the ratio between the two reaching a maximum of ≈0.2 for λ0 = 10⁵ in Figure 2. Even this discrete-locus model became very memory and computation intensive for large populations; N = 10⁷ was already pushing the limits of our hardware.
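The recombination step of a discrete-locus model like the one described above can be sketched as follows. This is not the simulation code used for the figures. It assumes that crossovers fall along the chromosome as a Poisson process with one crossover per unit map length (no crossover interference), which is one simple way to realize "uniform crossover"; the function names are hypothetical.

```python
import random


def crossover_points(map_length, rng):
    """Crossover positions on [0, map_length), as a rate-1 Poisson process.

    Successive gaps are exponential with mean 1, so the number of
    crossovers is Poisson with mean map_length (assumed: no interference).
    """
    points = []
    position = rng.expovariate(1.0)
    while position < map_length:
        points.append(position)
        position += rng.expovariate(1.0)
    return points


def recombine(parent_a, parent_b, map_length, rng):
    """Build one haploid offspring from two parental haplotypes.

    Loci are evenly spaced along the map; the copied parent switches at
    every crossover point passed while walking along the chromosome.
    """
    n_loci = len(parent_a)
    points = crossover_points(map_length, rng)
    parents = (parent_a, parent_b)
    current = rng.randrange(2)  # parent copied at the left end
    child = []
    next_point = 0
    for i in range(n_loci):
        locus_position = map_length * i / (n_loci - 1) if n_loci > 1 else 0.0
        while next_point < len(points) and points[next_point] < locus_position:
            current = 1 - current
            next_point += 1
        child.append(parents[current][i])
    return child


rng = random.Random(0)
offspring = recombine([0] * 500, [1] * 500, 1.0, rng)
```

With 500 loci and R = 1, adjacent loci are separated by a map distance of 1/499, so most crossovers fall in distinct intervals and are resolved by the discrete model.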
Simulating populations with exponentially distributed mutational effects was particularly difficult. The mean effect s̄ had to be kept small to avoid having a substantial fraction of fixed mutations with very large selective coefficients, s ∼ O(1), when interference was strong. (Our approximations assume s ≪ 1 throughout.) Figure 4 shows that, with the value of s̄ used, our simulations were already beginning to approach this regime for λ0 ≳ 100. But for very small values of s̄, mutations took a long time to sweep through the population, increasing the memory usage of the simulations. Even with this value of s̄, population sizes of N ≥ 10⁶ undergoing strong interference were computationally impractical. Simulations were therefore limited to a fairly narrow range of parameters.
When biological adaptation is controlled by a combination of several evolutionary forces with widely varying strengths, it is important to have simple order-of-magnitude estimates of which combinations of forces are important and how they are important. In this article, we have examined well-mixed populations in which adaptation is driven by the interaction of beneficial mutation, recombination (via crossover), selection, and genetic drift. We have hypothesized and checked by explicit simulations that, restricted to a suitably chosen local genomic scale, the dynamics of this sexual case reduce to known results for asexual adaptation. The whole genome can be thought of as being subdivided into uncorrelated chunks, each evolving effectively asexually. We determined the characteristic chunk length self-consistently by requiring that the local coalescence times are just about long enough for neighboring chunks to become decorrelated through recombination. Despite the simplicity of our approximation, the resulting predictions for the speed of adaptation, linkage disequilibrium, and the distribution of fixed mutational effects compare well with the simulations.
Fisher (1930, p. 123) noted that the potential increase in the rate of adaptation of sexual populations over asexual ones is given by “the number of different loci in the sexual species, the genes in which are freely interchangeable in the course of descent.” To understand when this is relevant to evolution, it is necessary to understand under what circumstances this maximum potential increase is approached, given a fixed number of recombining loci (Maynard Smith 1971; Kim and Orr 2005). Here we have focused on another aspect of the problem, considering populations adapting at their maximum possible rate and investigating how “the course of descent” and the number of “freely interchangeable” genes interact to determine each other. Park and Krug (2013) take a hybrid approach, considering two asexual loci, each experiencing strong clonal interference, and investigating how frequently recombination between them needs to occur for them not to interfere with each other. They find that the rate of adaptation slowly increases over a broad range of recombination rates (more than three orders of magnitude for some parameter combinations). This suggests that in their model successful mutations at each locus have to occur in individuals that are also highly fit at the other locus, with the required fitness slowly decreasing with increasing recombination rate. The dependence of the rate of adaptation on the rate of recombination is much weaker than the nearly linear relation found in our model.
We have already discussed the connections between our analysis and the closely related work by Weissman and Barton (2012) and Neher et al. (2013a), who examine the same question in the parameter regimes μb ≲ 1/(Ns) and μb > 1, respectively. Our analysis bridges the gap between these two analyses, focusing on the case 1/(Ns) < μb < 1 in which the density of beneficial mutations μb = Ub/R is large enough so that interference among them is strong, but not so large that it overwhelms the effect of selection on individual mutations. For large populations, this is a broad region of parameter space. (Note that N in the condition above is the short-term effective population size, which may be many orders of magnitude larger than the long-term effective size Ne measured from heterozygosity—see Figure 5.) Which natural populations might plausibly fall inside it? Almost all obligatorily outcrossing organisms certainly satisfy the condition μb < 1, since they typically have total mutation rates on the same order as rates of crossover, and only a small (albeit often unknown) fraction of those mutations are beneficial. However, many of them are likely to have sufficient recombination that interference among beneficial mutations is negligible, 1/(Ns) > μb (Weissman and Barton 2012).
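The three parameter regimes named above can be summarized in a small helper. The boundaries are order-of-magnitude conditions (≲ and <), so the sharp thresholds below are a simplification, and the function name, return strings, and example values are hypothetical.

```python
def interference_regime(population_size, s, u_b, map_length):
    """Classify a population by mu_b = U_b / R relative to 1/(N s) and 1.

    population_size is the short-term effective size N; thresholds are
    treated as sharp here only for illustration.
    """
    mu_b = u_b / map_length
    if mu_b >= 1.0:
        return "mutation-dominated (Neher et al. 2013a)"
    if mu_b <= 1.0 / (population_size * s):
        return "weak interference (Weissman and Barton 2012)"
    return "strong interference (this article)"


# Assumed example values: N = 1e6, s = 0.01, U_b = 1e-5, R = 1e-2.
example_regime = interference_regime(1e6, 0.01, 1e-5, 1e-2)
```

In this example μb = 10⁻³ lies between 1/(Ns) = 10⁻⁴ and 1, so the population falls in the regime analyzed here.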
Organisms with lower rates of outcrossing, such as viruses and selfing and facultatively sexual eukaryotes, are more likely candidates. However, very little is known about natural rates of recombination for most of these species. It is difficult to directly measure short-term recombination rates in natural conditions, and rates inferred from diverged genomes measure some convolution of recombination and selection on recombinants, rather than recombination itself.
Bacteria, for which recombination occurs primarily via the exchange of short stretches of DNA rather than crossovers, are not described by our model (unless the rate of exchange at each site is large compared to the rate of coalescence, which seems unlikely). Instead, these populations are described by the analyses of Neher et al. (2010) and Neher and Shraiman (2011), which assume that all loci have approximately the same recombination probability with each other. Interference is thus similar to that among unlinked loci in our model (see Appendix B), with the same parameter 4v/f² controlling the strength. For organisms that primarily recombine via crossovers and also have 4v/f² ≫ 1, both forms of interference are likely to be important.
Of the organisms with limited outcrossing, HIV evolving within a host has perhaps the best-characterized natural mutation and recombination rates. Neher and Leitner (2010) estimate that in chronic infections the recombination rate is ∼10⁻⁵ per base. Interestingly, just as in obligate sexuals, this is on the same order as the per base mutation rate (Abram et al. 2010), implying that μb ≪ 1. The rate of substitutions is also on the same order, ranging up to ∼5 × 10⁻⁵ per base, depending on the gene and stage of infection (Shankarappa et al. 1999). If a substantial fraction of the substitutions are adaptive (as must be the case when the substitution rate exceeds the mutation rate), the density of adaptive substitutions is high enough that HIV is in the strong-interference regime described by our model, with 1/(Ns) < μb. If the selective advantages of the adaptive substitutions are on the order of 1% (Neher and Leitner 2010), our model predicts that the genome (with length ≈10 kb) consists of tens of effectively asexual chunks with lengths of hundreds of base pairs.
The above back-of-the-envelope calculation should not be taken too seriously. Our model leaves out many features that are likely to be important to adaptive evolution, both of HIV and more generally. Most obviously, we ignore deleterious mutations, which are likely to make up the vast majority of all selected mutations. It is unclear whether our results still apply when the majority of fitness variance is due to deleterious mutations rather than sweeps. We also ignore weakly selected beneficial mutations. If nearly all mutations are selected (with most being deleterious or weakly beneficial), then the total selected mutation rate might be on the same order as R or even larger. If sweeps are rare, then this situation is covered by Neher et al.’s (2013a) approach, but if sweeps are also common in addition to background selection, a combination of their method and that of this article may be necessary.
In addition to deleterious and weakly beneficial mutations, we also ignore several other factors that are likely to be important. In large populations, adaptation may be driven by selection on standing variation due to environmental change rather than by new mutations (Hermisson and Pennings 2005), in which case the amount of interference among sweeps depends on how long the alleles were present in the population as neutral or deleterious variation. Population structure can also affect the amount of interference, as it tends to slow down selective sweeps, lowering the threshold rate at which they begin to overlap and interfere (Martens and Hallatschek 2011). We also ignore epistasis, which can overwhelm recombination and preserve linkage disequilibrium if it is strong enough (Neher and Shraiman 2009).
In the light of our results, we may revisit Weismann’s hypothesis that sex is selected for because recombination reduces clonal interference and thus speeds up adaptation. Our model generally supports this hypothesis, as the speed of adaptation is predicted to be roughly proportional to R, as can be seen from Equation 2. However, we have not investigated whether this can effectively select for a modifier allele increasing recombination. Note also that the speedup due to sex arises only if the total map length R is larger than the characteristic chunk size r×; otherwise the whole genome is effectively asexual and recombination is too rare to have a significant effect. If we consider a facultative sexual such as yeast, is it plausible that recombination is frequent enough to substantially speed up adaptation? Assuming s ∼ 1% and Ub ∼ 10⁻⁵, as Desai et al. (2007) measured for Saccharomyces cerevisiae in a laboratory setting, Equation 1 gives a minimum value of R ∼ 2 × 10⁻³, roughly independent of N. Given that S. cerevisiae undergoes ≈43 crossovers per mating, this corresponds to a minimum frequency of sexual reproduction of ∼4 × 10⁻⁵. Thus, even small, difficult-to-measure rates of sex may be effective in alleviating Hill–Robertson interference.
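The last step of the yeast estimate is a unit conversion: for a facultative sexual, the effective map length R is the map length per mating multiplied by the outcrossing frequency f, so the minimum f is the minimum R divided by the number of crossovers per mating. The sketch below does only this arithmetic, taking R ∼ 2 × 10⁻³ and ≈43 crossovers per mating from the text; the function names are hypothetical.

```python
def effective_map_length(crossovers_per_mating, outcrossing_frequency):
    """R for a facultative sexual: map length per mating times f."""
    return crossovers_per_mating * outcrossing_frequency


def min_outcrossing_frequency(min_map_length, crossovers_per_mating):
    """Smallest f for which the effective map length reaches min_map_length."""
    return min_map_length / crossovers_per_mating


# Values quoted in the text for S. cerevisiae.
f_min = min_outcrossing_frequency(2e-3, 43)
```

The result is a little under 5 × 10⁻⁵, the same order of magnitude as the ∼4 × 10⁻⁵ quoted above; the inputs are themselves only order-of-magnitude estimates.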
It is somewhat surprising that our mean field approximation based on a typical block length worked, as it does not take into account fluctuations in the chunk lengths. This may be in part due to a negative feedback: if an anomalously strong clone arises at one location, it leads to a larger linkage block. This in turn will increase the interference among local sweeps, thus reducing the local density of beneficial sweeps. Overall, these effects tend to push block sizes toward a mean block size, as assumed in our argument. Note, however, that for broad distributions of fitness effects we observe significant deviations from our simple predictions.
We thank Nick Barton and Richard Neher for helpful discussions. D.B.W. received financial support from European Research Council grant 250152. O.H. received financial support from the German Research Foundation (Deutsche Forschungsgemeinschaft), within the Priority Programme 1590 “Probabilistic Structures in Evolution,” grant HA 5163/2-1.
Solving for r× and λ
In this Appendix, we show how to use asexual traveling-wave theory to determine the density of adaptive substitutions λ and the length r× of the effectively asexual chunks. We follow Desai and Fisher’s (2007) intuitive argument for simplicity; the same results can be derived from their formal calculations or from those of Rouzine et al. (2008). For each chunk, define q× as the difference between the number of beneficial mutations in the fittest genotype in the population and the number in the average genotype. Typically, this fittest genotype will be present in no more than a few copies and will most likely be lost to stochastic drift. However, there will usually be a genotype with q× − 1 mutations that has established and is starting to sweep through the population. From the time when the genotype first establishes to when it reaches high frequency (i.e., the nose-to-mean time of the fitness wave) takes tnm ∼ 2 log(q×γ)/(q×s) generations (Desai and Fisher 2007). (The factor of 2 comes from the fact that its mean selective advantage drops from ≈(q× − 1)s to ≈s over the course of the sweep.) The length of the chunk of genome that stays linked over the course of the sweep is ∼1/tnm; this sets r×:

$$r_\times \approx \frac{1}{t_{nm}} \approx \frac{q_\times s}{2\log(q_\times \gamma)}. \tag{A1}$$
Since q× − 1 mutations fix every 1/r× generations in every chunk of genome of length r×, the density of adaptive substitutions is simply

$$\lambda = q_\times - 1, \tag{A2}$$

and we can write everything in terms of λ, the quantity we want to find. Doing this, our expression for r× is

$$r_\times \approx \frac{(\lambda + 1)\,s}{2\log\left[(\lambda + 1)\gamma\right]}. \tag{A3}$$
To determine λ and r×, we can use the additional fact that to maintain a steady wave in fitness space, in each chunk new mutations must be establishing at the same rate that they are fixing, λr×. In other words, the chunk with λ mutations should produce an established chunk with an additional mutation in t1 ∼ 1/(λr×) generations. The number of copies of the chunk with λ mutations t < t1 generations after it establishes is nλ(t) ≈ exp(λst)/((λ + 1)s). The total number of mutant genotypes it produces is

$$\mu_b r_\times \int_0^{t_1} n_\lambda(t)\,dt.$$

Each of these mutants has a probability ∼(λ + 1)s of establishing, so to have one successful mutant we must have

$$(\lambda + 1)\,s\,\mu_b r_\times \int_0^{t_1} n_\lambda(t)\,dt \sim 1.$$

Evaluating the integral, we find the following condition:

$$\frac{\mu_b r_\times}{\lambda s}\left[\exp\left(\frac{s}{r_\times}\right) - 1\right] \sim 1. \tag{A4}$$

Equations A3 and A4 can be rearranged and expanded about λ ≈ 1; dropping O(1) terms in large logarithms gives Equations 1 and 2 in the main text. The difference between the numerical solution to Equations A3 and A4 and the analytical approximation is negligible for strong interference; see Figure A1.
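The self-consistent solution described in this Appendix can be sketched numerically. The code assumes the chunk-length relation r× = (λ + 1)s/(2 log[(λ + 1)γ]), which follows from tnm with q× = λ + 1, and an establishment condition obtained by integrating nλ(t) up to t1 = 1/(λr×), namely μb r× [exp(s/r×) − 1]/(λs) = 1. Treating the "∼" in that condition as an equality is an assumption made here. The function `leading_order_lambda` is a rough approximation derived for this sketch by dropping O(1) terms inside the logarithms; it is not guaranteed to coincide with Equation 2. All function names are hypothetical.

```python
import math


def chunk_length(lam, s, gamma):
    """r_x = (lam + 1) s / (2 log((lam + 1) gamma))."""
    return (lam + 1.0) * s / (2.0 * math.log((lam + 1.0) * gamma))


def establishment_residual(lam, mu_b, gamma):
    """Log of the assumed establishment condition; zero at the solution.

    With x = s / r_x, the condition mu_b r_x (exp(x) - 1) / (lam s) = 1
    becomes mu_b (exp(x) - 1) / (lam x) = 1, so s cancels out.
    """
    x = 2.0 * math.log((lam + 1.0) * gamma) / (lam + 1.0)
    return math.log(mu_b) + math.log(math.expm1(x)) - math.log(lam * x)


def solve_lambda(mu_b, gamma, lo=1e-6, hi=1e6, iterations=200):
    """Bisection on a log scale; the residual decreases with lam."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if establishment_residual(mid, mu_b, gamma) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def leading_order_lambda(mu_b, gamma):
    """Rough closed form (an assumption of this sketch, see lead-in)."""
    return 2.0 * math.log(gamma) / math.log(math.log(gamma) / mu_b) - 1.0


# Assumed example values: mu_b = 0.1, gamma = 1e6, s = 0.01.
lam_numeric = solve_lambda(0.1, 1e6)
r_x_numeric = chunk_length(lam_numeric, 0.01, 1e6)
lam_rough = leading_order_lambda(0.1, 1e6)
```

Bisection is used because the residual is monotonic in λ for (λ + 1)γ > e, so no derivative or starting guess is needed. For these example values the rough closed form and the numerical root agree closely, in line with the statement that the difference is negligible for strong interference.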
Effect of Background Genetic Variation at r > r×
We have assumed that each chunk of genome evolves roughly independently. To confirm this, we need to understand how selection on variation at r > r× changes Equations A3 and A4, i.e., how the rest of the genome affects the dynamics of a chunk that starts in the nose of the fitness distribution and sweeps to fixation. We do just a rough, approximate analysis, but even this gets quite involved.
As a first step, we rewrite Equations A3 and A4 in a form (Equations B1 and B2) that makes it clearer how they might be affected by variation in backgrounds. Here, ssweep is the mean selective advantage of genomes carrying the focal mutation over the course of the sweep, s1 is their mean selective advantage up to the point where they produce an additional successful mutation, and the establishment probability is the probability that a given additional mutation successfully establishes. nest is the establishment size, which is roughly the number of copies of a successful chunk after ∼test = 1/((λ + 1)s) generations. (See Desai and Fisher 2007 for a detailed discussion of the establishment dynamics.) All of these quantities are potentially affected by background genetic variation, but we will see that the establishment probability and nest, the ones that depend on the shortest timescale test, are the most affected.
It is helpful to divide the genome into the region immediately around the chunk and the rest and to consider these two sets of loci separately. Tightly linked loci close to the chunk can stay linked for an extended period of time and can be seen as perturbing the chunk’s mean fitness. Variation at more distant loci is rapidly shuffled by recombination and effectively increases the variance in offspring number in a way that is uncorrelated over the timescales relevant for selection and mutation. To find the strength of these effects, we first note that the density of variance in log fitness over the chromosome is v/R = λs, so the standard deviation in log fitness due to loci at a recombination fraction ≲ r from the focal locus (a region of map length 2r, counting both sides) is
$$\sigma(r) \approx \sqrt{2\lambda s r}$$
(until r saturates at f/2 for unlinked loci).
Tightly linked loci
First, consider the effect of the fitness of the initial chunk genotype at loci not too far away from the focal chunk. For these to have a significant effect on average, their effect on fitness needs to be at least comparable to the chunk’s selective coefficient. In other words, if σ(r) < λs, then the region within r typically does not contain enough variation to affect the dynamics. Thus, we need to average over a region of width at least ∼λs. This is also approximately the maximum scale over which loci are effectively tightly linked, as it is roughly both the region that stays linked over the shortest relevant timescale test and the region over which selection is strong enough relative to recombination to maintain unusually fit combinations; i.e., σ(r) ≫ r for r ≪ λs and σ(r) ≪ r for r ≫ λs (Neher et al. 2013b).
Thus, the tightly linked background will usually have a combined selective coefficient on the same order as the chunk’s own. More precisely, during the time test over which a new chunk with λ + 1 mutations will either establish or go extinct, it will be strongly associated with a particular genetic background with log relative fitness drawn from a normal distribution with standard deviation σ((λ + 1)s). This distribution has initial mean ≈−λs² ≈ 0 (Weissman and Barton 2012, supplementary text 2), dropping to ≈−2λs over the time test due to the increase in the population mean fitness. Assuming that the probability of establishment is proportional to the average mean fitness of the chunk, averaging over this distribution of backgrounds reduces the mean probability of establishment by 30–40% for λ ≳ 1 (Figure B1).
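The size of this reduction can be checked with a short calculation. The sketch below is one possible reading of the argument, not the authors’ calculation. It assumes that the background log fitness is normal with standard deviation s√(2λ(λ + 1)) (variance density λs over a map length 2(λ + 1)s), takes its mean as −λs (the midpoint between 0 and −2λs), and sets the establishment probability proportional to the chunk’s total advantage (λ + 1)s plus the background, truncated at zero. The function name is hypothetical.

```python
import math

def mean_establishment_ratio(lam):
    """E[max(0, (lam + 1) + X)] / (lam + 1), in units of s, where
    X ~ Normal(mean=-lam, variance=2 * lam * (lam + 1)) is the ASSUMED background."""
    m = (lam + 1.0) - lam                      # mean total advantage (equals 1)
    sd = math.sqrt(2.0 * lam * (lam + 1.0))
    z = m / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    # Closed form for the mean of a normal variable truncated below at zero.
    return (m * cdf + sd * pdf) / (lam + 1.0)

for lam in (1.0, 2.0, 5.0):
    reduction = 1.0 - mean_establishment_ratio(lam)
    print(f"lambda = {lam:.0f}: reduction ~ {reduction:.2f}")
```

Under these assumptions the reduction is about 30% at λ = 1 and about 40% at λ = 5, in line with the range quoted above.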
The dynamics of establishment also enter into Equations B1 and B2 through the mean log establishment size nest. Averaging over possible backgrounds weighted by their probability of establishing, we find that E[log(nest)] is increased by ≈log(1.3–1.4) for λ ≳ 1 (Figure 1). Both this correction and the one to the establishment probability have very small effects on the results. In fact, in Equation B2, the two corrections approximately cancel each other: each mutant’s probability of establishment is lower, so the lineage must produce more mutants before one is successful, but because the lineage is larger, it naturally does so. [Note that the background variation at r ∼ (λ + 1)s that determines nest has largely been lost by the time the lineage starts producing new mutants, so it does not affect these new mutants’ establishment probability.] In Equation B1, the increase in E[log(nest)] can be included by adjusting γ.
Note that these tightly linked loci do not have a significant effect on timescales that are long compared to test: associations with initially poor or average backgrounds are reduced by recombination to a small enough region of the genome that σ(r) is small compared to the chunk’s selective advantage, and associations with unusually good backgrounds are washed out as the population mean fitness catches up. Thus ssweep in Equation B1 and s1 in Equation B2 are only slightly affected.
Loosely linked loci
We now consider the effect of loosely linked loci at r > (λ + 1)s. Their dominant effect is also on nest, because they recombine away too rapidly to have much effect on the quantities ssweep and s1 that depend on longer timescales. In this case, we can use a generalization of a heuristic argument originally due to Robertson (1961) to obtain the amount of interference to a focal locus from a locus separated by a recombination fraction r with log fitness variance σ², with r ≫ σ. An allele at the focal locus will typically have its log fitness perturbed by an amount ∼σ due to variation at the interfering locus. If each allele moved to a random background every generation, the introduced variance in log fitness would just be σ² [and the additional variance in offspring number would be exp(σ²)]. But the LD between the loci decays at a finite rate r, so a perturbation of size σ will still have an average residual effect ≈ σ exp(−rt) after t generations. The accumulated perturbation after t generations is therefore
$$\int_0^t \sigma e^{-rt'}\,dt' = \frac{\sigma\left(1 - e^{-rt}\right)}{r}.$$
The accumulated variance in log fitness is the square, σ²(1 − exp(−rt))²/r². (Note that the variance in log fitness is highly correlated on timescales small compared to 1/r, so the integration over time comes before squaring.) As t becomes large, this approaches σ²/r², Robertson’s result. Applying this to unlinked loci in our model, we see that they increase the variance in offspring number by a factor exp(4v/f²), which is negligible for most of the simulated parameters.
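The approach to Robertson’s limit can be illustrated numerically. The sketch below evaluates the accumulated variance σ²(1 − exp(−rt))²/r² at a few times; the function name and the values of σ and r are illustrative assumptions.

```python
import math

def accumulated_variance(sigma, r, t):
    """Variance in log fitness accumulated over t generations from a locus
    at recombination fraction r: [sigma * (1 - exp(-r t)) / r] ** 2."""
    return (sigma * (1.0 - math.exp(-r * t)) / r) ** 2

sigma, r = 0.01, 0.1   # illustrative (assumed) values with r >> sigma
for t in (1, 10, 100):
    print(t, accumulated_variance(sigma, r, t))
print("long-time limit sigma^2 / r^2 =", (sigma / r) ** 2)
```

For t ≪ 1/r the perturbation grows as σt, so the variance grows as (σt)², and it saturates at σ²/r² once t ≫ 1/r.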
Loosely linked loci have a somewhat larger effect. Summing over all loci at r > (λ + 1)s on both sides of the chunk and using that the density of log-fitness variance is λs, the total effect is
$$\int_{(\lambda+1)s}^{\infty} \frac{2\lambda s\left(1 - e^{-r t_{\mathrm{est}}}\right)^2}{r^2}\,dr \approx \frac{1.5\lambda}{\lambda+1}.$$
Note that for these loci, the limited time test over which they can affect nest is taken into account. (This cutoff has only a modest effect; without it, the coefficient of λ in the numerator would just be 2.)
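The numerical coefficient can be checked by direct quadrature. The sketch below assumes that loci on both sides of the chunk contribute (variance density 2λs per unit r), that the sum runs over r > (λ + 1)s, and that test = 1/((λ + 1)s). Rescaling x = r·test then reduces the total effect to λ/(λ + 1) times 2∫₁^∞ (1 − exp(−x))²/x² dx. The function name and the choice of a midpoint rule are illustrative.

```python
import math

def loose_linkage_coefficient(n=100_000):
    """Midpoint-rule estimate of 2 * integral_1^inf (1 - exp(-x))**2 / x**2 dx,
    using the substitution u = 1/x to map the range onto (0, 1)."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        total += (1.0 - math.exp(-1.0 / u)) ** 2
    return 2.0 * total / n

coef = loose_linkage_coefficient()
print(f"total effect ~ {coef:.2f} * lambda / (lambda + 1)")
```

Dropping the factor (1 − exp(−x))² leaves 2∫₁^∞ dx/x² = 2, which corresponds to the coefficient quoted for the case without the time cutoff.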
Combined effect and limitations of results
Combining the effects of tightly linked, loosely linked, and unlinked loci found above, the total effect of the background variation can be accounted for by adjusting the definition of γ,
$$\gamma = 0.7\,\gamma_0 \exp\!\left[-\frac{1.5\lambda}{\lambda+1} - \frac{4v}{f^2}\right], \tag{B3}$$
where γ0 is the baseline value, γ0 = 2Ns/Var. The term −4v/f² is the only way in which our results depend directly on f, rather than R; it is negligible over almost all of the simulated parameter space. (When it becomes large, the dynamics become sensitive to the details of the organism’s life cycle; see Weissman and Barton 2012.) The initial factor of 0.7 in Equation B3, coming from the effect of tightly linked loci, is included assuming that λ ≳ 1; for lower λ, it approaches 1. It also has only a small effect for the regime λ ≳ 1 where it applies.
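To give a sense of the magnitudes involved, the sketch below evaluates the adjusted γ as the product of the three factors discussed above: 0.7 from tightly linked loci, exp[−1.5λ/(λ + 1)] from loosely linked loci, and exp(−4v/f²) from unlinked loci. The function name, the `tight_factor` argument, and the parameter values are illustrative assumptions.

```python
import math

def effective_gamma(gamma0, lam, v, f, tight_factor=0.7):
    """gamma = tight_factor * gamma0 * exp(-1.5 * lam / (lam + 1) - 4 * v / f**2).
    tight_factor = 0.7 is meant for lam >~ 1; it approaches 1 for smaller lam."""
    return tight_factor * gamma0 * math.exp(-1.5 * lam / (lam + 1.0) - 4.0 * v / f**2)

gamma0 = 1e6   # illustrative (assumed) baseline value 2Ns/Var
ratio = effective_gamma(gamma0, lam=3.0, v=1e-4, f=1.0) / gamma0
print(f"gamma / gamma0 ~ {ratio:.2f}")
```

Since γ enters the results only through logarithms, a reduction of this size shifts log γ by an amount of order 1.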
In fact, over most of the parameter regime of interest, even the factor exp[−1.5λ/(λ + 1)] has only a small effect on the results; see Figure B2. This is because generally either λ is small (so γ ≈ γ0) or there is only a weak dependence on γ. We include the correction in all numerical calculations, but omit it in all analytic approximations. (It is of the same order as other small, omitted terms.) The only region of parameter space where the effect is notable is for γμb ≈ 1, where λ ∼ 1 but is still fairly sensitive to γ. We consider this regime of moderate interference, λ0 ∼ 1, in Appendix C below. Even in this case, leaving out the interference from loosely linked loci would increase the predicted λ only by about a factor of 2.
We calculated Equation B3 by splitting the genome into the regions r < (λ + 1)s and r > (λ + 1)s and analyzing these, assuming tight and loose linkage with the focal chunk, respectively. However, for both regions the dominant effect comes from loci with r ∼ (λ + 1)s, where neither approximation is very accurate. The exact form is therefore likely to be wrong, but in any case our analysis shows that the errors in Equations 1 and 2 from ignoring interference among chunks are small, especially compared to the errors introduced by applying our idealized model to any real population.
Equations 1 and 2 for r× and λ are based on the analysis of a stable fitness wave, which is valid for q× ≫ 1. For q× ≈ 2, the wave is always fluctuating, and this analysis is invalid. Desai and Fisher (2007) also investigate this case and find that the time for a new mutation to arise and fix is dominated by the waiting time for a mutation to establish on a good genetic background, which is given by their equation 46 (Equation C1). The value 1/t2 is the rate at which mutations fix in each chunk, 1/t2 = λr×. The chunk length r× is still set by the inverse of the sweep time, which is now just the standard tsweep = log(γ)/s. (Desai and Fisher 2007 find that the initial boost from arising on a good background makes little difference, as that background is itself rapidly approaching fixation.)
Plugging r× = s/log(γ) into Equation C1 gives a simplified expression valid for γ ≫ 1 and γμb ≈ 1, the regime we are considering, and hence the density of substitutions (Equation C2).
As mentioned in Appendix B, the effect of loosely linked loci outside the chunk must be taken into account in this case via Equation B3. (The effect of truly unlinked loci, with recombination fraction f/2, is still negligible, as is the initial factor of 0.7 due to tightly linked loci.) Substituting this into Equation C2, we find an implicit equation for λ, the solution of which is well approximated by Equation C3.
Exponentially Distributed Effects
Given the length r× of an effectively asexual chunk of genome, we can characterize its evolution by q×, defined in this case as the scaled typical relative fitness of the fittest chunk genotype, and v× = vr×/R, the rate at which each chunk’s mean fitness is increasing. q× and v× can be found as functions of r× using equations 13 and 14 of Good et al. (2012), rewritten in our notation as Equations D1 and D2. The solution to these equations can then be used with Equation A1 to determine r×. For q× ≫ 1, the rate of advance takes a simple form, giving a very rough but simple solution from which r× and v× follow directly. This form can be guessed just from Equation D2: the large factor of γμb out front must be roughly canceled by the exponential factor exp(−q×).
Comparing the lead q× (shown in Figure 3) to the fixed fitness distributions in Figure 4, we see that only at very high mutation rates is q× much larger than the advantage conferred by a single typical successful mutation.
Communicating editor: J. Wakeley
- Received December 16, 2013.
- Accepted January 11, 2014.
- Copyright © 2014 by the Genetics Society of America