The Rate of Fitness-Valley Crossing in Sexual Populations
Daniel B. Weissman, Marcus W. Feldman, Daniel S. Fisher


Biological traits result in part from interactions between different genetic loci. This can lead to sign epistasis, in which a beneficial adaptation involves a combination of individually deleterious or neutral mutations; in this case, a population must cross a “fitness valley” to adapt. Recombination can assist this process by combining mutations from different individuals or retard it by breaking up the adaptive combination. Here, we analyze the simplest fitness valley, in which an adaptation requires one mutation at each of two loci to provide a fitness benefit. We present a theoretical analysis of the effect of recombination on the valley-crossing process across the full spectrum of possible parameter regimes. We find that low recombination rates can speed up valley crossing relative to the asexual case, while higher recombination rates slow down valley crossing, with the transition between the two regimes occurring when the recombination rate between the loci is approximately equal to the selective advantage provided by the adaptation. In large populations, if the recombination rate is high and selection against single mutants is substantial, the time to cross the valley grows exponentially with population size, effectively meaning that the population cannot acquire the adaptation. Recombination at the optimal (low) rate can reduce the valley-crossing time by up to several orders of magnitude relative to that in an asexual population.

MOST phenotypes depend on interactions between many different genes. For any given trait, there are likely to be many possible adaptations that involve combinations of mutations at different loci. Examples include signaling pathways that require coevolution of receptors and ligands (Goh et al. 2000), pathogens that may require several mutations to escape their hosts' immune response or to become resistant to drugs (Levin et al. 2000), and bacteria that require multiple mutations to evolve the ability to metabolize a new nutrient (Blount et al. 2008). While such complex adaptations may be acquired through successive individually beneficial mutations (Bridgham et al. 2006), it is likely that some adaptations require a combination of mutations that are each individually neutral or deleterious in the absence of the other mutations. Using the fitness landscape metaphor, populations must cross a “fitness valley” to acquire such adaptations. Valley crossing provides a means for populations to escape local “fitness peaks” or “plateaus” where no single mutation is individually beneficial. It may also be a common mode of adaptation even when individually beneficial mutations are available, especially in large populations (Weissman et al. 2009), as is typical of microbes or Drosophila.

Understanding the dynamics of crossing fitness valleys is particularly important for understanding pathogen evolution. An adaptation that requires crossing a fitness valley should generally be more difficult for a population to acquire than an adaptation that can be reached by going straight “uphill.” For a method of combating a pathogen to be effective, it must be difficult for the pathogen to adapt to it; this suggests that treatments should typically confront the pathogen with a fitness valley. Similarly, for an attenuated strain of a virus to be useful for a vaccine, it must be difficult for it to revert to virulence, so one would like the attenuated genotype to be separated from pathogenic genotypes by fitness valleys. But as we will see, in many parameter regimes fitness valleys can be crossed rapidly, so it is important to understand how wide and deep a valley must be for a particular large population to be highly unlikely to cross it.

Intuitively, recombination affects the valley-crossing process in two opposing ways. First, recombination can speed up valley crossing by bringing together separate neutral or deleterious mutations present in different individuals, perhaps producing a beneficial combination. Second, recombination can slow down valley crossing by breaking up beneficial combinations of mutations once they have been formed. In this article, we provide a quantitative description of the dynamics by which a sexual population crosses a fitness valley, focusing on the effect of recombination on the expected time this takes. We find that over a broad region of parameter space, a small amount of recombination speeds up valley crossing, while a large amount slows it down, often drastically, with the scale for the recombination rate at which the behavior changes set by the selective advantage obtained by crossing the valley.

In general, the dynamics of complex adaptation will depend on the population size, the mutation rates, the recombination rates between the different loci, the selective advantage provided by the adaptation, and the selective disadvantages of each of the intermediate genotypes. The population size will generally be very large and the mutation rates very small compared to unity, but both vary over many orders of magnitude between different populations and species. Recombination rates and selection coefficients typically range over many orders of magnitude among different sets of loci within one organism. Given this wide parameter space, the potentially subtle interactions between very large and very small parameters, and the difficulty of accurately measuring these underlying parameters in real populations, arriving at a detailed theoretical picture of all possible dynamics by simulations is neither practical nor particularly useful. Here we focus on the simplest possible model for understanding valley crossing with recombination: a population with a single possible adaptation requiring one mutation at each of two different loci, with both mutations individually neutral or deleterious. Even in this simple case, a complete description is quite complicated, and exact mathematical expressions for dynamics cannot, generally, be obtained. However, given the inaccuracies involved in applying our simple mathematical model to any biological system, the increased precision given by exact expressions is of limited usefulness. On the other hand, given the large range in possible parameter values mentioned above, approximate expressions giving the dependence of quantities of interest on the parameters can be very informative. We focus on developing an intuitive understanding of the major features of the dynamics, emphasizing aspects that are insensitive to the details of the model and that can serve as a basis for understanding more complicated situations. Detailed calculations are provided in appendix b.

Our analysis of crossing simple fitness valleys provides a basis for understanding more complicated fitness valleys, as Weissman et al. (2009) did for asexual populations. But it also serves as one step toward understanding how recombination and other population parameters act in general fitness landscapes to determine the rate and mode of adaptation. It is not known (even in simple models) under what circumstances adaptation proceeds primarily via the fixation of alleles that provide a selective advantage in a wide range of genetic backgrounds or when, instead, adaptation primarily involves formation and selection of particularly advantageous combinations of alleles. Neher and Shraiman (2009) recently analyzed the interplay between selection on alleles and selection on complicated combinations of alleles in populations with high diversity initially, but no new mutations. Their results share with ours the feature that which type of adaptation dominates is controlled by the relative magnitudes of the selective advantage of a combination of alleles and the recombination rate among the loci involved.

The problem of the acquisition of an adaptation requiring individually deleterious mutations at two loci was analyzed in the limit of infinite population size by Haldane (1931), Crow and Kimura (1965), Eshel and Feldman (1970), and Karlin and McGregor (1971), who found that a sufficiently high recombination rate made adaptation impossible. Michalakis and Slatkin (1996) considered several combinations of population parameters for finite populations and found that recombination generally slows valley crossing, while noting that recombination can have a positive effect if selection against single mutants is much weaker than selection for double mutants. Christiansen et al. (1998) quantified the increase in the production of double mutants caused by recombination in the case of neutral single mutants, while for deleterious single mutants and weak recombination Hadany (2003) calculated the expected time for the population to cross the valley. In the limit of extremely weak recombination, the population becomes effectively asexual, a case that has been analyzed by Komarova et al. (2003), Iwasa et al. (2004), Weinreich and Chao (2005), Durrett and Schmidt (2008), Weissman et al. (2009), and Lynch and Abegg (2010). Simulations have usually found that recombination reduces the rate of valley crossing (e.g., Takahata 1982; Kim 2007), although some have found a slight increase for small amounts of recombination (Takahasi and Tajima 2005; Weinreich and Chao 2005; De Visser et al. 2009). In this article, we provide a quantitative framework tying together these previous results and describe the dynamics in new regions of parameter space. We also describe the parameter range for which the general analysis of the rate of transitions between local fitness peaks in Barton and Rouhani (1987) is valid and apply this analysis to our case. While we are primarily concerned with the rate of adaptive evolution, the problem of purely compensatory evolution (where the single mutants are deleterious but the double mutant has the same fitness as wild type) is closely related and has been analyzed by, e.g., Kimura (1985), Stephan (1996), and Carter and Wagner (2002). We do not consider the effect of environmental change on valley crossing; this has been investigated by, e.g., Masel (2006) and Kim (2007). While this article was being revised, another article (Lynch 2010) examining similar questions was published; we compare those results with ours in the discussion.


We consider a haploid population of N individuals with two diallelic loci with recombination rate r between them. Genotype ab is initially fixed in the population, with allele a mutating to A at rate μA and b mutating to B at rate μB. (We neglect back mutation, as it will have only a minor effect on the dynamics in the interesting regions of parameter space.) The four possible genotypes have fitnessesMathwith δA, δB ≥ 0 so that the mutant alleles are individually neutral or deleterious, and 1/Ns ≪ 1, so that the double mutant has a substantial but not enormous selective advantage over the wild type (Figure 1). For finite N, the fittest genotype AB will eventually dominate the population. We wish to find the distribution of the time 𝒯 that it takes for this to occur, in particular the expected time τ ≡ E[𝒯]. Except in very large populations, the time that it takes for AB to sweep to fixation in the population will be a negligible component of τ, which will be dominated by the waiting time before the selective sweep begins (Weissman et al. 2009) We therefore ignore the sweep time and refer to the beginning of the sweep (when the double mutants establish and begin to increase in number exponentially) as the point at which the population has crossed the valley.

Figure 1.—

The fitnesses of the four genotypes. The wild-type ab has fitness 1, and the double-mutant AB has fitness 1 + s > 1. The single mutants Ab and aB are no more fit than wild type, with fitnesses 1 − δA ≤ 1 and 1 − δB ≤ 1, respectively. Allele a mutates to A at rate μA, and b mutates to B at rate μB.


We find that the time τ for the population to acquire the adaptation is usually minimized at an optimum level of recombination rs, which balances the two effects of recombination on valley crossing: bringing together single mutants and breaking up double mutants. For higher recombination rates, rs, recombination effectively breaks up the double-mutant combination; thus double mutants can increase in frequency only along with single mutants, slowing down the former's spread (Figure 2). In this case, if single mutants have a significant fitness disadvantage, then these mutants must drift against selection to the frequency at which they can begin to take over the population. The probability of this happening decreases exponentially as N increases, so τ increases exponentially: this means that large populations effectively cannot acquire adaptations that require individually deleterious mutations at unlinked genes. For less deleterious single mutants, mutation and drift are strong enough to allow the mutants to reach the necessary frequency relatively easily. In this regime, the two effects of recombination approximately cancel, and τ is close to its asexual value τr=0, which decreases with increasing N—the opposite dependence.

Figure 2.—

Typical slice of parameter space, showing the expected time to cross the fitness valley, τ, as a function of the recombination rate r and the selective disadvantage of the single mutants δ. The other parameters are held constant at N = 105, μ = 5 × 10−7, and s = 0.05. Color indicates the effect of recombination on the expected time: recombination significantly reduces τ in the green region [i.e., log(τ(r)/τr=0) < 0], significantly increases τ in the red region [log(τ(r)/τr=0) > 0], and has only a minor effect on τ elsewhere. We see that small recombination rates r < s can reduce τ, with this effect strongest at intermediate δ, while frequent recombination rs can drastically increase τ for large δ. The rapid transition between these regimes occurs at rs. Note that increasing δ always increases τ, but can increase or decrease the ratio τ/τr=0. Also note that recombination tends to reduce τ for small values of δ and increase τ for large values of δ, increasing the dependence of τ on δ.

For rs, recombination is rare enough that it has only a small effect on the increase in double mutants once they are present in the population. If r is not too small, however, it can effectively bring together A and B alleles from single mutants and reduce the valley-crossing time. In the regime where this is true, τ decreases roughly as a power of r and is minimized at an intermediate recombination rate rs, just below the value at which recombination starts effectively breaking up double mutants (Figures 3 and 4). Interestingly, the factor by which recombination reduces τ, τr=0/τ(r), is also maximized at intermediate values of the other parameters, N, μA, μB, δA, and δB. In this region of parameter space, recombination can reduce τ by up to several orders of magnitude—a large effect, although small compared to the delay in valley crossing that can be caused by too frequent recombination rs: see Figure 5. While frequent recombination can potentially slow down valley crossing far more than occasional recombination can speed it up, recombination may still generally increase the frequency of valley crossing in populations, because the deep valleys for which recombination has a negative effect are already far less likely to be crossed, even by asexual populations, than the shallow valleys where it has a positive effect (Figure 2).

Figure 3.—

Analytical approximation and simulation results for τ, the expected time in generations to cross the valley, as a function of the recombination rate r, for N = 105, μ = 5 × 10−7, δ = 10−5, and s = 0.05, so that single mutants are effectively neutral. τ decreases with increasing r until it reaches a minimum at rs/2, after which it quickly increases back to roughly its asexual value. The overall effect of recombination is weak in this regime, with τ varying only by a factor of ≈2 = 𝒪((N2μs)1/6) over seven orders of magnitude of recombination rates.

Figure 4.—

Analytical approximation and simulation results for τ, the expected time in generations to cross the valley, as a function of the recombination rate r, for N = 108, μ = 2 × 10−9, δ = 5 × 10−5, and s = 0.05, so that selection against single mutants is strong enough to affect the dynamics, but not so strong to prevent the two single-mutant genotypes from occasionally arising together and recombining to produce double mutants. τ decreases with increasing r until it reaches a minimum at rs/2 ∼Formula times lower than its asexual value, after which it quickly increases to a maximum ∼Formula times higher than its asexual value.

Figure 5.—

Analytical approximations and simulation results for τ, the expected time in generations to cross the valley, as a function of the population size N, for r = 0.5 (frequent recombination), r = 0.025 (occasional recombination), and r = 0 (asexual), with the other parameters held constant at μ = 5 × 10−7, δ = 10−3, and s = 0.05. Lines show analytical approximations (dashed for r = 0.5, dotted for r = 0.025, and solid for r = 0). Diamonds, crosses, and circles show simulation results for r = 0.5, r = 0.025, and r = 0, respectively. Note that when τ depends on the recombination rate, it is always lowest for occasional recombination and highest for frequent recombination, with the value for asexuality lying in between.

While most of our analysis focuses on typical valley-crossing dynamics, we also briefly consider the probability that a population crosses a fitness valley unusually rapidly—a problem that is likely to be relevant to subdivided populations, in which the first subpopulation to cross the valley may then colonize the rest of the population. We find that the most likely dynamics by which populations cross fitness valleys change significantly as a function of time, with several different dynamics dominating at successively later times before the long-term dynamics dominate. The naive inference from the long-term rate of valley crossing can dramatically overestimate the early-time probability. However, even the early-time probability is typically maximized at an intermediate rate of recombination rs.

Quantitative summary:

Here we provide a list of some of the expressions for τ derived below and in appendix b for readers who wish to skip the calculations but want a more quantitative summary than the one given above. We omit all numerical factors and leave most of the interpretation of the results for other sections of this article. For simplicity, we assume here and in the heuristic arguments below that the two loci are equivalent; i.e., δA = δB ≡ δ, μA = μB ≡ μ (see appendix c for the effect of relaxing this assumption).

First, in the asexual case r = 0 it has been shown that the rate of valley crossing, τ−1, is given approximately byMath(1)Equation 1 is invalid for very small and very large N; see Weissman et al. (2009) for a detailed discussion.

For frequent recombination, rs, and single mutants no more than mildly deleterious, Math, τ is given approximately byMath(2)where τr=0 is given by Equation 1. (See appendix b for the derivation of Equation 2.) For substantially deleterious single mutants, Math, τ is given approximately byMath(3)(See the heuristic analysis below and Barton and Rouhani 1987.)

For infrequent recombination, rs, and Nμ ≪ 1 (single-mutant lineages arising infrequently), the rate of valley crossing is given approximately byMath(4)(See the heuristic analysis below and appendix b.) Equation 4 remains valid for intermediate recombination rates rs if we replace s with Math. Note that τ generally decreases with increasing Math, meaning that a recombination rate rs/2 maximizes the rate of valley crossing.

For rs and Nμ ≫ 1 (single-mutant lineages arising frequently), τ is given by Equation B27 in appendix b.

The boundaries of the parameter regimes in Equations 14 are somewhat complicated; see Figure B1 in appendix b for an illustration of a typical slice of parameter space. Note that when r ≫ s and single mutants are substantially deleterious, τ grows exponentially with increasing N—in this parameter regime, the population relies on drift to cross the fitness valley. However, in all other parameter regimes, τ decreases with increasing N.

We now turn to the derivation and explanation of these results.


For much of parameter space, the valley-crossing dynamics can be analyzed at least semideterministically, with the numbers of single mutants nAb and naB approximated by their expected values and only the double mutants treated stochastically (see, e.g., Christiansen et al. 1998). However, in other parameter regimes, the stochastic fluctuations in the numbers of single mutants are crucial to the dynamics; these include the parameter regimes where recombination has the largest effect. Here we give a heuristic analysis of the effect of recombination on the time for the population to cross the fitness valley in two of these regimes where the semideterministic approximation fails. First, we consider the regime where recombination slows down valley crossing the most: rs and δ2/s ≫ max{μ, 1/N}, where the mutants must drift far above mutation–selection balance before they begin to be favored by selection. Second, we consider the regime rs with Nμ ≪ 1, where recombination can speed up valley crossing. Together, these two regimes display all the interesting qualitative behavior of the system.

Our goal in this section is to provide an intuitive understanding of the dynamics underlying the results summarized above, focusing only on the most important features and the regimes in which stochastic fluctuations in the number of single mutants are important. More careful derivations are unfortunately quite long and technical and are reserved for appendix b. While we explicitly analyze only a limited subsection of parameter space in this section, this should provide the tools for similar heuristic analyses of the rest of parameter space. Indeed, we hope that these kinds of heuristic arguments can be used broadly for understanding evolutionary processes that depend on rare mutant lineages.

Frequent recombination, rs:

First we consider a high rate of recombination, rs, δ. In this case, recombination is frequent enough to keep the population close to linkage equilibrium, meaning that the double mutants will not begin to spread deterministically in the population until the mean fitness of all mutants is significantly greater than that of the wild type. When mutants are rare, this condition on the mean fitness of the mutants can be expressed in terms of the allele frequencies xA, xB asMath(5)The left-hand side of this inequality is the mean selective advantage of a mutant individual over wild type assuming linkage equilibrium, while the right is the strength of drift; when the inequality is satisfied, selection is likely to overcome drift and drive the mutant alleles to fixation.

For the double mutants to spread deterministically while the single mutants are still at low frequencies, the condition (5) must have a solution with xA, xB ≪ 1. This is possible only if δ ≫ s; we focus on this case here. When selection against single mutants is sufficiently strong compared to mutation and drift (specifically, when δ2/s ≫ max{μ, 1/N}—see appendix b for the derivation of this condition), the rate at which the single mutants reach frequencies satisfying the condition (5) is strongly reduced. In the limit of infinite population size, this rate goes to zero: there are two deterministic equilibria, one with ab as the dominant genotype and the other with AB as the dominant genotype, and the population will remain at the first equilibrium (Karlin and McGregor 1971). The boundary between the two basins of attraction is given by the curve for which the left-hand side of this inequality (5) is zero, with an unstable fixed point at xA = xB ≈ δ/s.

If the population is large (N ≫ δ2/s) but finite, the analysis of Barton and Rouhani (1987) can be applied to find the rate of valley crossing. Because this analysis is quite general and involved, we provide a brief heuristic summary before quoting the result. For finite N, there will be an equilibrium probability distribution for the allele frequencies, Ψ(xA, xB). Assuming linkage equilibrium and using a diffusion approximation, Ψ(xA, xB) is given byMath(6)for 1/NxA, xB ≪ 1, with a prefactor that is changing slowly compared to the exponent near the saddle point (Kimura 1964); see Figure 6. The infinite-population deterministic equilibria correspond to peaks in Ψ, and the unstable fixed point at xA = xB ≈ δ/s corresponds to a saddle point. The lower peak at xA, xB ≈ 0, the saddle point, and the higher peak at xA, xB ≈ 1 are joined by a ridge in Ψ running along xA = xB. For δ ≪ s, the most likely way for the population to cross from the lower peak to the higher peak is to drift along the ridge to the saddle, after which selection will drive the mutant alleles to fixation. Since Ψ falls off rapidly away from the ridge, we can treat the population as being effectively one-dimensional, with a single two-allele locus subject to disruptive frequency-dependent selection. The largest factor in τ comes from the tiny probability of reaching the saddle point from the initial peak. Since the probability of being near the saddle point is lower than the probability of being near the wild-type-dominated peak by a factor of ∼exp(−Nδ2/s), the time to cross the valley is exponentially large,Math(7)with additional smaller terms that depend on the mutation rate, the rate of drift, higher-order terms in the location of the saddle point, and linkage disequilibrium. Using the calculations leading to Equation 17 in Barton and Rouhani (1987), we can find the more accurate expressionMath(8)where Γ is the gamma function. For large populations, Equation 7 is an enormously long time; they essentially cannot cross the valley.

Figure 6.—

Schematic contour plot of equilibrium allele frequency probability distribution for a large population with deleterious single mutants and frequent recombination (Equation 6). The red curve shows the most likely path through frequency space from the initial wild-type-dominated state to the final mutant-dominated state. This path passes through the saddle point, shown in blue, separating the two deterministic equilibria. The probability of being near the saddle point at xA ≈ δB/s, xB ≈ δA/s is lower than the probability of being near the initial peak by a factor ∼exp(−NδAδB/s), producing a factor in the expected time τ to cross the valley of ∼exp(NδAδB/s).

Equations 7 and 8 are valid for Math. The case of infrequent recombination, in which the first inequality is violated, is discussed below; the case of shallow valleys, in which the third inequality is violated, is discussed in appendix b. As mentioned above, for deeper valleys, δ ≫ s (but r ≫ δ), the condition (5) is satisfied only for xA, xB close to 1—before the single mutants can begin to increase deterministically in frequency, at least one must drift close to fixation. The expected time for this to happen is very long, with log τ ≈ Nδ (Kimura 1962). Note that this matches smoothly with Equation 7 at the boundary of the two approximations, δ ≈ s. Finally, for very deep valleys, δ ≫ r, selection against single mutants is strong enough to keep the population in linkage disequilibrium. In this case, we expect that the most likely way for the population to cross the valley is for an AB lineage to drift close to fixation while undergoing unusually little recombination. The expected time for this to happen is extremely long, with log τ ≈ Nr.

Infrequent recombination, rs:

Now we consider infrequent recombination, rs. We focus on calculating the rate of valley crossing in populations large enough that they are likely to cross via “stochastic tunneling” (Komarova et al. 2003), in which double mutants take over the population without single mutants ever reaching high frequencies. For a double-mutant lineage to take over the population, it must avoid going extinct due to drift or being broken up by recombination. This occurs with probability Math (Hadany 2003). We refer to a mutant individual as successful if its descendants include double mutants that eventually take over the population; we wish to find the waiting time for the first such successful single mutant.

If the population is large enough that new mutants are continually being produced, Nμ ≫ 1, the dynamics of the single mutants can be treated approximately deterministically; this case is discussed in appendix b. Here we focus on low population mutation rates, Nμ ≪ 1. In this case, the total waiting time 𝒯 is generally dominated by the waiting time for the production of the first successful single mutant, with the first successful double mutant being produced shortly afterward. Since single-mutant lineages are produced at rate 2Nμ, 𝒯 will have an approximately exponential distribution with expected time τ ≈ (2Nμp)−1, where p is the probability that a single-mutant lineage will be successful (Komarova 2006; Weissman et al. 2009). To find τ, it thus suffices to find p.

Lineage dynamics:

A single-mutant lineage must be lucky to be successful, but this luck can come in several different places: for instance, the lineage could luckily drift to such an unusually large size that it becomes likely to produce so many double mutants that one will be successful, or it could drift to only a small size and produce one double mutant that luckily is successful. To distinguish between these cases, we can decompose p into the probability prob(T) that a lineage drifts for T generations before going extinct and the probability P | T that it will be successful given that it does so, integrated over all possible drift times T:Math(9)

To evaluate Equation 9, we need to understand the typical dynamics of a single-mutant lineage. Here we summarize the aspects of the dynamics that are important for calculating p; slightly more detailed expressions are given in appendix a. Let the number of individuals in the lineage at time t be given by n(t), with n(0) = 1. For t ≪ min{N, 1/δ}, the lineage is effectively neutral and unlikely to reach a high frequency. For the rest of this section, we assume that N is sufficiently large that all single mutants remain at low frequency. In this case, for t ≪ 1/δ (and ≫1), the probability that the lineage has avoided extinction by time t is approximately proportional to t−1, with the constant of proportionality depending on the strength of drift. This can be seen by modeling the dynamics of the lineage by a nearly critical branching process (as in appendix a for continuous time or in Athreya and Ney 1972, pp. 19ff, for discrete time) or by using a diffusion approximation (Ewens 2004, p. 162). We use the symbol “∼” to denote approximate proportionality up to a model-dependent constant. (In appendix b, we use a continuous-time Moran model chosen such that this constant is 1 for most expressions of interest.)

Since the expected number of individuals in the lineage E[n(t)] = exp(–δt) remains ≈1 until t = 𝒪(1/δ), the typical size of the lucky lineages that avoid extinction for a time t ≫ 1 is Math. At longer times, t ≳ 1/δ, the probability that the lineage has avoided extinction decreases like exp(–δt), as does E[n(t)]: selection makes it very unlikely that lineages will persist for much longer than ∼1/δ generations, and the few that do last for a long time typically do not grow past a characteristic size of n ∼ 1/δ. Thus, the distribution of lineage drift times T (and sizes n) has a long tail until it is cut off by selection at T, n ∼ 1/δ. This suggests for small δ, rare large lineages may play an important role in the population dynamics. Note that these basic features of the lineage dynamics are the most important for determining the rate of valley crossing; we expect our results to be insensitive to changes in the model details.

While T is broadly distributed over t < 1/δ, the distributions of the other random variables that play a role in valley-crossing dynamics are for the most part relatively narrow. In particular, besides the distribution of T for t > 1/δ, the distributions of n(t), the number of mutants produced by a lineage with a given trajectory, the number of mutants produced in a given time interval, and the number of recombinations between two lineages with given trajectories all have exponential tails. Thus, as a good first approximation, we can generally treat the drift times of the mutant lineages as the only random variables, with all other variables having fixed values given the drift times, and treat even the drift times as being limited to values <1/δ. The exponentially unlikely valley-crossing dynamics that violate this approximation produce only small rates of valley crossing, as seen in the case of frequent recombination and deep valleys discussed above.

Given these lineage dynamics, we can now find an approximate expression for P | T, the probability that a lineage that lasts for T generations will be successful. We can write P | T as the sum of two terms: Pas | T, the probability of success via the asexual path, in which a single-mutant individual acquires an additional mutation, and Pr | T, the probability of success via the sexual path, in which an Ab individual recombines with an aB individual. In the next section, we derive approximate expressions for Pas | T and Pr | T, assuming for simplicity that single mutants are neutral, δ = 0. (We discuss the effect of larger δ later.) Recall that in this case we expect rare large lineages to be important.

Neutral single mutants—asexual path across the valley:

First consider the probability Pas|T of a single-mutant lineage being successful via the asexual path given that it drifts for at least T generations. For a detailed discussion of this case, see Weissman et al. (2009); here we provide just a brief summary to introduce the heuristic arguments that we use in the more complicated sexual path below. To be successful, the single-mutant lineage must produce a successful double mutant. Each double mutant has a probability Math of being successful, so if the single-mutant lineage gives rise to n2 double mutants, it will be successful with probability Math, until this saturates at 1 for Math—lineages that give rise to Math double mutants are likely to be successful.

Thus, to find Pas|T, we need to find the typical number n2 of double mutants that the lineage will produce. Each individual in the lineage has a probability μ per generation of producing a double mutant, so the total number produced over T generations will typically be Math. To evaluate this integral, we must use the fact, mentioned above, that the size of the lineage will typically grow like n(t) ∼ t; doing so, we find that the typical number of double mutants produced in T generations is n2T2μ. Multiplying by Math to get the total probability of success gives an approximate expression for Pas|T:Math(10)

Since Pas|T grows like T2 and the probability of lasting for T generations only falls off like T−1, the asexual valley-crossing dynamics are dominated by the rare lineages that grow to a large size over a long time; the vast majority of lineages that reach only much smaller sizes make little contribution to the total rate of valley crossing. In other words, the luck needed to cross the valley is likely to be concentrated in a single very large lineage rather than spread out over multiple moderately sized lineages. (When single mutants are strongly deleterious, the opposite is true; see Weissman et al. 2009.)

Neutral single-mutants—sexual path across the valley:

We now turn to finding Pr | T, the probability that a lineage that drifts for at least T generations is successful via the sexual path. To do so, we must consider the lineages with complementary genotypes that will coexist with the focal lineage. During the T generations for which the focal lineage (of genotype, say, Ab) drifts, ≈NμT aB lineages typically arise. The probability of being successful therefore grows like Math, where Math denotes the probability that a single aB lineage will be successful by recombining with the focal lineage.

As in Equation 9, we can decompose the probability Math that an aB lineage will be successful into the probability that it avoids extinction for T′ generations and the probability Pr | T,T that it is successful given that it does so: Math. The total probability that the focal Ab lineage will be successful therefore grows likeMath(11)until saturating at 1 for sufficiently large T.

We already know the distribution of the lineage sizes T′, so to evaluate Equation 11 it remains to find Pr|T,T. The number of AB individuals produced by the pair of the lineages via recombination is typically Math. We know that nAbT and naBT′; assuming T′ ≤ T, the two lineages coexist for T′ generations. (It is easy to check that T′ > T makes only a small contribution to Equation 11.) Evaluating the integral, we find that the two lineages typically recombine to produce Math AB individuals. Thus Pr | T,T grows like Math until it approaches 1 at Math. Using this to evaluate Equation 11, we find that the probability of success isMath(12)

For small Ab lineages (Math), the most likely path to being successful requires an aB lineage to arise at roughly the same time and grow to roughly the same size. Larger Ab lineages, however, are most likely to be successful by recombining with an aB lineage much smaller than themselves, because Pr|T,T saturates at 1 for T′ ≪ T. Because Pr|T grows faster than linearly with T before saturating, the sexual valley-crossing dynamics, like the asexual dynamics, are dominated by the rare single-mutant lineages that grow so large that they are then likely to be successful [Math]. Note that this means that in the most likely sexual valley-crossing path, the two single-mutant lineages are likely to be of very different sizes, even when the two loci are symmetric. (See Figure 7 for a typical valley-crossing trajectory.)

Figure 7.—

Typical simulated population dynamics in the “stochastic tunneling” regime, with neutral single mutants, Nμ ≪ 1, and Formula. Light shading shows Formula, dark shading shows Formula, and solid shading shows Formula. [Square roots are shown to make low-frequency lineages more visible and because this is the natural scale for measuring the variance of the frequency of rare mutants (Cavalli-Sforza and Edwards, 1967).] The inset is a magnified view of the last few thousand generations before AB takes over the population, starting from the birth of the first successful single mutant. We see that the time to cross the valley is dominated by the waiting time for an unusually large single-mutant lineage that drifts to size nAb over ∼nAb generations. While this lineage drifts, many small aB and AB lineages arise and go extinct. The largest aB lineage drifts to a size naBNμnAb over ∼naB generations, recombining many times with the Ab lineage and producing the successful AB lineage.

Neutral single mutants—advantage of recombination:

We can now compare Equations 10 and 12 to find whether the population is more likely to cross the valley via the asexual or the sexual path. Since both the asexual and the sexual valley-crossing dynamics are dominated by rare large single-mutant lineages, this is equivalent to asking whether the size at which a lineage is likely to be successful is smaller for the asexual or the sexual path. Success via an additional mutation becomes likely at Math, while success via recombination becomes likely at Math. Therefore, the asexual path dominates for Math (i.e., Math), while for Math, the sexual path dominates. In these two extreme cases, Equation 9 for the total probability p that a lineage is successful simplifies toMath(13)The characteristic time τ for valley crossing is therefore given byMath(14)corresponding to the first two lines of Equation 4.

Comparing the two expressions for τ in Equation 14, we see that an intermediate level of recombination, Math, reduces the time to cross the valley by a factor of ∼(N2μr2/s)1/6. However, because of the small exponent, and the fact that Nμ, r/s ≪ 1 by assumption, this is generally not a big effect (see Figure 3). Moderate recombination thus typically provides a small increase in the rate of crossing very shallow valleys. (This is also true for Nμ ≥ 1; see Figure 5 and appendix b.)

Deleterious single mutants:

Intuitively, small amounts of recombination reduce the time for valley crossing by reducing the size to which a single-mutant lineage must drift to be likely to be successful. Because the lineage size distribution has a long tail for neutral single mutants, this is only a small effect. However, if single mutants are deleterious, δ > 0, then the long tail is cut off at T ∼1/δ, and recombination can have a larger effect. In particular, if Math, then selection against single mutants has little effect on the rate of valley crossing via recombination, but slows down the asexual route significantly.

For even larger values of δ, both the asexual and the sexual paths will be affected. This case can be approximately analyzed by simply recalculating the expressions for Pas | T and Pr | T from the previous section using the approximation prob(T > 1/δ) ≈ 0. The results are somewhat complicated and are given in Equation 4 and in Equations B21 and B23 in appendix b, where they are derived by a more careful calculation. The most important point to note is that for Math, selection against single mutants does not significantly affect the rate of valley crossing (i.e., τ is given by Equation 14). Thus, in large populations, many mutations that are strongly selected against (Nδ ≫ 1) may still be effectively neutral for the purposes of valley crossing.

It is also interesting to note that while small values of δ affect the asexual path across the valley more strongly than the sexual path, the opposite is true for large values of δ. This is because the sexual path requires two single-mutant lineages; if δ is large enough to affect both lineages, then selection will impede the sexual path more than the asexual path (which requires only one single-mutant lineage). In fact, for δ > s/2, Pr | T becomes negligible compared to Pas|T, and any amount of recombination slows down valley crossing by reducing Math. At the intermediate values of δ where recombination provides the greatest benefit, it reduces τ by a factor of Math: for rs/2, this factor is Math, meaning that recombination can increase the rate of valley crossing by as much as several orders of magnitude (see Figures 2 and 4).

Distribution of valley-crossing times:

We have focused on the expectation τ of the time for the population to cross the valley. The full distribution of the valley-crossing time 𝒯 is close to exponential in the regimes where it is typically dominated by the waiting time for the first successful single mutant (because mutations are rare, Nμ ≫ 1, or because each mutant is very unlikely to be successful, sr, Nδ2), so it is given byMath(15)This is valid, however, only for t large compared to the typical time that a successful single-mutant lineage drifts before the double mutants take over. This drift time makes only a small contribution to τ, but it reduces the probability that 𝒯 will be very small: even if the first successful single mutant is produced extremely early, it will still usually take time to generate the first successful double mutant and for the double mutants to spread. It is worth understanding this effect, because the left tail of the τ-distribution can be important for some processes. For example, in a metapopulation made up of many demes (say, for a pathogen or attenuated vaccine strain colonizing a host population), it may be that the lucky deme that crosses the valley first will colonize the other demes; the typical time for this to occur depends on the left tail of P(𝒯) rather than τ. The asexual case r = 0 is also relevant for understanding the incidence of cancers that typically occur only after a series of mutations (see popular accounts by Wodarz and Komarova 2005, Nowak 2006, and Frank 2007): the rare unlucky individuals that develop any given form of cancer are likely to have had cells that acquired multiple mutations anomalously fast.

As an example, we now consider the case of infrequent recombination (rs) and rare mutation (Nμ ≪ 1) described above. We saw that the successful single-mutant lineage typically drifted for a time T ∼ (Nμ2rs)−1/3, so Equation 15 is invalid for t < (Nμ2rs)−1/3. We can, however, perform a similar heuristic calculation to derive P(𝒯) in this regime. We focus on calculating the probability that the successful double-mutant lineage has been produced by time t. We expect that this probability will be dominated by the probability that a lineage will be produced very early (with probability ∼Nμ) and will avoid extinction for Tt generations and grow to a size nt, which occurs with probability ∼t−1. The lineage will then be successful with probability P|tPas|t + Pr|t (where Pas|t and Pr|t are given by Equations 10 and 12 with T = t), so the rate of valley crossing at time t will be ∼Nμt−1P|t, and the cumulative probability of having produced a successful double mutant by time t will grow like Math. (Here we are ignoring numerical factors of order 1, as we will throughout the rest of this section.)

Plugging in the values from Equations 10 and 12 to get P|t, we can find the complete 𝒯 distribution. In the regime Math where the sexual path dominates at long times, we haveMath(16)(see Figure 8). We see that for very short times, the population is more likely to cross the valley by the asexual route, with the sexual route becoming more likely at a time t = 𝒪(1/r) when it becomes likely that individuals will have undergone recombination.

Figure 8.—

Analytical approximations and simulation results for the full distribution of 𝒯, the time in generations to cross the valley, with parameters N = 105, μ = 5 × 10−7, r = 0.01, δ = 10−5, and s = 0.05. The dotted curve shows the exponential approximation Equation 15, the solid curve shows the more accurate approximation Equation 16 (with numerical factors included), and the circles show the results of simulations. For these parameters, the expected time to cross the valley is τ ≈ 4 × 104 generations. We see that for times short compared to τ, Equation 15 overestimates the probability of having crossed the valley, while Equation 16 is accurate for all times.

Note that it is straightforward to use this approach in combination with the analysis of Weissman et al. (2009) to find the complete 𝒯 distribution for an asexual population crossing a fitness valley of arbitrary width K. Doing so, one finds that for shallow valleys P(𝒯 < t) generally grows like tK for small t and then like tK−1, tK−2, …, t before exponentially approaching 1. This suggests that we should expect cancer incidence curves to generically show multiple different power-law behaviors as a function of age. Note the contrast between this result and the prediction of, e.g., Frank (2007) or Schweinsberg (2008) that the incidence should grow like a single power of t at early times. (This result also differs from those of Schweinsberg 2008 in that we find the same early time dependence for populations that have different long-time exponential behavior.) Our results show that the use of time-dependent incidence data to infer the number of mutations responsible for a cancer is seriously problematic. This is especially so in the absence of much information about relevant mutation rates, numbers of potentially oncogenic mutations, and fitness effects of intermediaries.


In addition to the analytical work described above, we also performed stochastic simulations, similar to those used in Weissman et al. (2009) (see appendix d for simulation methods). The results of these simulations are generally in good agreement with our analytical results predictions, although there are significant differences in the intermediate parameter regimes, particularly for r just slightly larger than s, where the approximations made in the analysis are invalid; see Figures 3, 4, and 5.

In Figures 3 and 4, we show our analytical predictions and simulation results for the expected valley-crossing time τ as a function of the recombination rate r, with the lowest recombination rate r = 10−6 corresponding to, e.g., a distance of ∼100 bp in Drosophila (Hilliker et al. 1994; Andolfatto and Wall 2003) and the highest recombination rate r = 0.5 corresponding to unlinked genes. In Figure 3, the other parameters are chosen such that Math, so that single mutants are effectively neutral (even though they would normally be described as being weakly deleterious) and we are in the stochastic tunneling regime, described above for rs. We see that a small amount of recombination indeed reduces τ compared to its asexual value, but only slightly: at the optimal recombination rate, rs/2, τ is reduced only by a factor of ≈(N2μs/2)1/6 ≈ 2. We also see that at large values of the recombination, τ returns to approximately its asexual value.

In Figure 4, the fixed parameters are chosen such that Math and Nμ < 1, so that for r < s we are in the stochastic tunneling regime with deleterious single mutants, and for rs we are in the exponentially suppressed regime discussed above. We see that recombination has a much stronger effect on the valley-crossing time for deleterious single mutants than it does when single mutants are neutral, with τ reduced by a factor of Math from its asexual value at the optimum recombination rate and increased by a factor of Math at the maximum recombination rate. Because the population parameters are chosen to be closer to the boundaries of our analytical approximations, the agreement between the analysis and the simulations is not as close in Figure 4 as it is in Figure 3, although it is still quite good.

In Figure 5 we show our analytical predictions and simulation results for τ as a function of the population size N. For small population sizes, the two loci are essentially never simultaneously polymorphic, so recombination has no effect. In this small-population regime, the most likely way for the population to cross the valley is for one single mutant to drift to fixation, so τ increases as population size increases and selection against single mutants becomes more effective. At intermediate population sizes, the single mutants can produce a successful double-mutant lineage while still at low frequency in the population, and valley crossing becomes easier, with τ decreasing as N increases, and mutants are produced more frequently. A small amount of recombination increases the rate at which double mutants are produced, decreasing τ. Frequent recombination both increases the rate of production of double mutants and reduces the probability that a double-mutant lineage will be successful; these two effects cancel and τ is approximately the same as for an asexual population. At larger population sizes, τ continues to decrease with N when recombination is rare or absent, but rapidly increases with N for frequent recombination. At very large population sizes, double mutants are essentially produced instantly, and the population crosses the valley instantly as long as recombination is not too frequent. The agreement between the analytical predictions and the simulation results is very close for most of parameter space. The biggest differences occur at the boundaries of the analytical approximations, particularly at Nμ ≈ 1 for occasional recombination and Ns2 for frequent recombination.


We have provided a complete and intuitive description of the rate at which populations acquire adaptations requiring two mutations at different loci; these span a very broad range of parameter space and multiple regimes. Several subregimes of parameter space have previously been examined in detail. In particular, the case of strongly deleterious single mutants has been described for rs by Michalakis and Slatkin (1996) and Hadany (2003) and for rs as a special case by the results of Barton and Rouhani (1987), while for the case of neutral single mutants and rs, Nμ ≳ 1 is examined in great detail in Christiansen et al. (1998). That rs is the critical value of the recombination rate, above which recombination effectively breaks up double mutants, was described by Crow and Kimura (1965) and further investigated by Eshel and Feldman (1970), Karlin and McGregor (1971), Weinreich and Chao (2005), and Jain (2009), among others. The present work serves to tie together these previous results, providing the boundaries of the regions of parameter space that they describe. In addition, we explore new regions of parameter space, including the case of mildly deleterious single mutants, the region in which recombination can speed up valley crossing the most. We also discuss the early-time dynamics most relevant for understanding cancer rates and many aspects of pathogen evolution; these have not previously been analyzed at any depth for the most part, and our results contradict previous findings (Schweinsberg 2008) in the one part of parameter space to have been considered in detail.

As mentioned earlier, Lynch (2010) recently addressed some of the same questions. Here we compare his results to ours for the regions of parameter space where the two overlap. For the interplay between the asexual and sexual processes, Lynch's (2010) Equation 1 for τ is incorrect. It relies on the assumption that for each single-mutant lineage, success via the asexual path occurs independently from success via the sexual path. But it follows from our heuristic arguments that the converse is true, since for neutral single mutants both paths depend primarily on the same random variable T, the time that the lineage drifts (see our Equations 9, 10, and 12). (And since drifting to fixation is also primarily dependent on T, earlier asexual versions of this equation such as Equation 3b in Lynch and Abegg 2010 are also incorrect.) For neutral single mutants and rs, Equation 2 in Lynch (2010) is similar to the second line of our Equation 13 and is found using a method that appears to be similar to our heuristic argument. However, the approximations given for the two parts of the process are valid in opposite limits and thus inconsistent: an anomalously long-lived first mutant dominates when Nμ ≪ 1, but the second mutant can be treated only deterministically, as assumed, for Nμ ≫ 1. For rs and large population sizes, Nμ ≫ 1, Lynch (2010) claims τ ≫ τr=0, in contrast with our result τ ≈ τr=0, although it is not clear in this case how he has defined τ or the boundaries of the regime. For deleterious single mutants and rs, Equation 5 in Lynch (2010) is clearly incorrect, since it adds Equation 4b, which is valid only for strongly deleterious single mutants, and Equation 4a, which overcounts the asexual path across the valley that is already included in Equation 4b. The (fairly poor) agreement with simulations is obtained only by the use of a fitting parameter with no biological meaning. For rs, the regime where recombination slows down valley crossing the most, Lynch (2010) gives an expression for τ (Equations 6 and 7) that differs somewhat from our Equation 8 because it is taken from results for a different kind of fitness valley from the one that both he and we are considering.

It was previously established that large asexual populations can rapidly cross shallow valleys, suggesting that complex adaptation may be common in such populations (see, e.g., Weissman et al. 2009 and Lynch and Abegg 2010). Here we find that crossing shallow valleys is made even easier by moderate amounts of recombination (and is not affected by large amounts of recombination), suggesting that this kind of complex adaptation may be even more common in large sexual populations, especially among different sites within single genes for obligately sexual populations. On the other hand, we have found that sexual populations are less likely than asexual populations to cross deep valleys. Thus, recombination increases the existing bias in favor of crossing shallow rather than deep valleys toward large rather than small peaks.

Previous studies such as Stephan and Kirby (1993) have found that the rate of compensatory double substitutions at distant loci is suppressed in natural populations, evidence of the decrease in the rate of valley crossing for rs predicted by our work as well as many previous results. Our results further suggest that for very closely linked loci, adaptive double substitutions should actually increase as a function of the genetic distance between the loci. However, this signal is likely to be harder to detect in data from natural populations, because it is difficult to identify adaptive double substitutions. (Piskol and Stephan 2008 do observe that the rate of double substitutions at a certain set of interacting sites does increase with genetic distance at short distances, but this is unlikely to be due to recombination, as the double substitutions in question are expected to be only compensatory, not adaptive.)

This article has focused on the specific case of a population with a single possible adaptation, requiring two specific individually neutral or deleterious mutations. Real populations, however, are likely to have many possible adaptations, some requiring only a single mutation and others requiring many, with each adaptation potentially reachable by multiple different mutational paths. Understanding what determines the rate and manner of adaptation in these more realistic situations remains a very difficult problem, but the work in this article should be helpful. In particular, for populations with possible complex adaptations requiring mutations at several loci, we expect that it will still be true that a small amount of recombination among the necessary loci will make the adaptation more likely to be acquired, while a large amount will make it unlikely, with the scale separating small and large recombination rates being set by the advantage provided by the adaptation. Such an effect has indeed been observed in simulations (Suzuki 1997). More generally, we should expect to see a transition from selection acting on some combinations of mutations at tightly linked loci to selection acting roughly independently on different haplotype blocks at longer genetic distances. Neher and Shraiman (2009) find such effects in simulations and analyses of evolution on certain types of (random) fitness landscapes. Because the genetic length scale at which this transition occurs depends on the strength of selection, we expect that complex adaptations that provide large selective advantages may involve longer regions of chromosomes than adaptations providing smaller advantages.


When mutants are sufficiently rare in the population that they do not influence each other, the dynamics of a mutant lineage can be described by a continuous-time branching process in which individuals reproduce via binary fission at rate 1 − δ (with δ ≪ 1) and die at rate 1. Here we list several basic facts about the dynamics of such lineages that serve as the basis for the analysis in the rest of the article. Let nAb(t) be the number of individuals at time t, with nAb(0) = 1, and let T be the extinction time, T ≡ inf{t: nAb(t) = 0}. The probability that the lineage persists for at least t generations isMath(A1)Math(A2)(Kendall 1948), and given that it does so, it has size nAb(t) = n with probabilityMath(A3)

Note that the distribution of extinction times has a long tail before falling off exponentially for t ≳ 1/δ. Note also that the number of individuals at a given time has an approximately geometric distribution, falling off exponentially for n ≳ min{t, 1/δ}. The exact expressions above are also valid for δ < 0, corresponding to a lineage with a selective advantage. Thus we can see from Equation A1 that (ignoring recombination) an AB lineage has a probability ≈s of succeeding and that those AB lineages that do go extinct typically do so in t ≲ 1/s generations.

If we instead start with nAb(t = 0) = n0 individuals, then the expected number of individuals at time t will beMathand the variance will beMath

Note that the number of individuals is likely to remain nAb(t) ∼ n0 until t ≈ min {n0, 1/δ} and that the lineage is likely to be much smaller or extinct for t > 1/δ, regardless of n0.


Why generating functions:

We want to estimate the rate of stochastic tunneling, in which all mutants are rare until the double mutants begin to take over the population. Let pt(nAb, naB, nAB) be the probability that at time t there are nAb Ab individuals, naB aB individuals, and nAB AB individuals in the population. It is useful to consider the generating function 𝒵 of pt defined byMath(B1)

In particular, to calculate the probability P(t) that the double mutants dominate the population at time t (i.e., that the population has crossed the valley by this time), it is useful to consider the generating function for the distribution of nAB, 𝒵(0, 0, h, t) = E[exp(−hnAB)]. In typical valley-crossing dynamics, nAB will remain small for a time, until it reaches a threshold number at which selection begins to effectively favor the double mutants, after which they quickly sweep to near fixation. In large populations, this threshold number will be ≪N, so we can choose h ≫ 1/N such that nAB will typically remain ≪1/h until the double mutants take over the population. For such h, the random variable exp(–hnAB) remains ≈1 until the population crosses the valley, at which point it rapidly drops to ≈0 (because nAB approaches N ≫ 1/h). Thus exp(–hnAB) is an approximate indicator variable. Therefore, its expected value 𝒵(0, 0, h, t) approximately gives the probability of not having crossed the valley by time t, and we can writeMath(B2)We see that if we find 𝒵, we will have found the distribution of valley-crossing times 𝒯.

Partial differential equation for 𝒵:

To calculate 𝒵, we use a continuous-time Moran model for the population. All individuals die at an equal rate, which sets the unit of time, and are chosen to replace dead individuals with a probability proportional to their fitness. Mutation and recombination occur independently from each other and from reproduction. (It is trivial to adjust the parameters to allow for mutations or recombinations occurring at birth.) When mutants are rare, so that mutants essentially do not compete with each other, pt evolves according to the approximate differential equationMath(B3)

It is straightforward to use Equations B1 and B3 to write down a partial differential equation for 𝒵. For ϕAb, ϕaB, ϕAB ≪ 1, this is approximatelyMath(B4)Note that if it were not for the recombination between Ab and aB individuals, which gives rise to the second-order term Math, Equation B4 would describe the evolution of the generating function of a familiar multitype branching process. For such processes, the problem of finding the probability of fixation of mutant alleles has been extensively studied (see, e.g., Barton 1995 and the references therein). Unfortunately, the addition of recombination makes our problem significantly more complicated; Equation B4 cannot be solved exactly.

Small populations (Nμ ≪ 1):

To simplify the problem, we must make some approximations. For Nμ ≫ 1, nAb and naB are usually well approximated by their expected values (Fisher 2007), greatly simplifying the analysis; see below for a discussion of this case. Here we focus on Nμ ≪ 1. In this case, single-mutant lineages arise infrequently, so for an Ab lineage and an aB lineage to coexist and recombine to produce an AB lineage, one of two rare events must occur: either the first lineage to arise must persist for an unusually long time or the two lineages must arise unusually close together in time. Recalling the heuristic description of lineage dynamics given in the main text (with the distribution of lineage lifetimes T having a long tail like ∼1/t until it begins to fall off exponentially at t ∼1/δ), we expect that the former possibility will be more common for sufficiently small δ (neutral or mildly deleterious single mutants) while the latter will be relatively more common for larger δ (strongly deleterious single mutants).

We focus on the first possibility, that the first single-mutant lineage persists for an unusually long time. In this case, there is an asymmetry between the two successful single-mutant lineages, since one arises well before the other. Here we find pAb, the probability that an Ab lineage will be successful by drifting for a long time and then producing a successful AB individual either directly by mutation or by recombining with a later aB lineage. paB, the probability that an aB lineage will be successful by drifting for a long time, will be the same (with δA ↔ δB, μA ↔ μB), and when this is the dominant mode of valley crossing, P(t) at long times will be given byMath(B5)

Probability that an Ab lineage will be successful:

Now we turn to finding pAb. By assumption, the Ab lineage arises and drifts for many generations without interacting with aB individuals, so its dynamics can be modeled by a simple single-type branching process. The approximate dynamics of such a process are discussed in the main text, while the exact distributions of the extinction time and the number of individuals are given in appendix a. Because the Ab lineage is likely to be large by the time the aB lineage arises (at least for δA not too large), we make the approximation that the Ab lineage's dynamics are relatively unaffected by the aB and AB lineages until these lineages become large and the mutants are beginning to sweep to fixation.

The Ab lineage will, however, affect the other mutant lineages. To find pAb, we consider the dynamics of naB(t) and nAB(t) conditioned on a particular trajectory {nAb(t)} and then take the expectation over all possible trajectories. Let ZaB, ϕAB, t | {nAb}) be the generating function of this conditioned process, defined as in Equation B1 by ZaB, ϕAB, t | {nAb}) ≡ E[exp(–ϕaBnaB – ϕABnAB) | {nAb}]. Then, as in Equation B2, the probability that the Ab lineage with trajectory {nAb} has produced a successful AB lineage t generations after it has arisen isMath(B6)and the total probability that the Ab lineage is successful isMath(B7)We can thus find pAb by finding Z.

Generating function for the aB and AB lineages:

To find Z, note that it satisfies the partial differential equationMath(B8)where xAnAb/N, with xA ≪ 1 by assumption. (Equation B8 can be derived either directly from the master equation for the conditioned process or by simply substituting Math in Equation B4.)

Equation B8 describes the evolution of a two-type branching process with time-dependent transition rates, and we can solve it using the method of characteristics (Kendall 1948). Taking the generating function variables to be functions of time, ϕi = ϕi(t), the total time derivative of Z is given byMath(B9)Note that if we choose ϕaB(t) and ϕAB(t) to cancel the second and third terms on the right-hand side of Equation B9, we are left with a simple ordinary differential equation for Z, with the solutionMath(B10)

Note that Equation B10 implies that the probability that a lineage with trajectory {nAb} is successful after t generations isMath(B11)and the total probability of success isMath(B12)From Equations 11 and 12, we see that ϕaB and ϕAB can be naturally interpreted as the conditioned probability of success for an aB lineage and an AB lineage, respectively. (The first term in the integrand corresponds to the aB lineages that arise while the Ab lineage drifts, while the second term corresponds to the AB lineages produced directly by mutation from Ab.) See Barton (1995) for similar expressions obtained in a slightly different way.

Characteristic equations for probabilities of success:

For Equation B11 to be valid, ϕaB and ϕAB must satisfy the (Riccati) differential equationsMath(B13)Math(B14)We are interested in the parameter regime where aB individuals are much more likely to produce AB individuals by recombining with Ab than by mutating (rxA ≫ μA), so we neglect the term proportional to μA in Equation B13 for the rest of this article. (Note that Equation B11 does include the possibility that the focal Ab lineage produces AB individuals via mutation.)

We can solve Equations B13 and B14 approximately for the trajectory of ϕaB and ϕAB, moving backward in time. Let u be a backward-time variable, with u = 0 corresponding to the time t at which ϕaB = 0, ϕAB = h, and u = t corresponding to the time at which the Ab lineage arises. For small u (i.e., at late times), ϕaB and ϕAB are very small and we can ignore the quadratic terms in Equations B13 and B14. The eigenvalues of these linearized equations areMath(B15)

After an initial transient of the order of Math generations, the larger eigenvalue λ+ dominates and ϕaB and ϕAB increase as Math until the quadratic terms in Equations B13 and B14 become important. Note that λ+ is a function of xA and therefore changes over time. However, we will see that the dynamics of ϕaB and ϕAB are fast compared to the changes in xA for many of the most important trajectories {nAb}. For such trajectories, ϕAb and ϕAB quickly approach the fixed point (Math) of Equations B13 and B14, which then changes slowly as xA changes.

Fixed point and heuristic interpretation:

Setting the left-hand sides of Equations B13 and B14 to 0, we find that the first coordinate Math of the constant-xA fixed point satisfiesMath(B16)Equation B16 can be solved exactly using the cubic formula, but the solution is not illuminating. Given that it is already an approximation, it is more useful to look at the simple approximate solutions in different limits, along with the corresponding approximate value of Math:Math(B17)

Recalling that ϕaB and ϕAB can be interpreted as the conditioned probabilities of success of aB and AB individuals, respectively, we can understand some aspects of Equation B17 intuitively. For rs, we have Maths (the asexual probability of success for AB), because in this case recombination is unlikely to break up the beneficial combination. For small δB, Math is the probability of drifting over Math to a size Math, at which point the lineage is likely to produce ∼1/s AB individuals via recombination and be successful. (Note that this agrees with the heuristic argument given in the main text.) For larger δB, MathrxAsB is the probability of being successful by drifting for ∼1/δB generations to a size ∼1/δB and producing Math AB individuals.

For rs we have MathMath, indicating that linkage is not important (i.e., given nAb ≫ 1, a B allele has roughly the same probably of success on an a background as on an A background). The probability of success xAs − δ is just the mean selective advantage of the B allele when rare. Note that the B allele has lower mean fitness than the b allele for xA < δ/s, so for these values of xA the mutants will not succeed and MathMath ≈ 0. (In fact, there is still a small probability ∼eNδ that a B allele drifts against selection to fixation, and a small probability ∼eN(rs) that an AB lineage avoids recombination while sweeping to fixation, but we neglect them here.)

Timescale for aB and AB dynamics:

As mentioned above, ϕaB and ϕAB will approach the values given in Equation B17 only when their dynamics are fast compared to the rate at which xA changes. When ϕaB and ϕAB are far from the fixed point, they change at a rate λ+. To find the rate of change as they approach the fixed point, we can linearize Equations B13 and B14 about the fixed point and find the eigenvalues Math of this system. The rate of approach will be set by Math (where Math < Math < 0). Skipping the straightforward but tedious calculations, we find that in the asymptotic regimes of Equation B17, the dynamics occur at rates λ+ and Math given byMath(B18)(Note that we have ignored the trivial fixed point in the last line of Equation B17, which does not contribute to the probability of success.)

As with Equation B17, we can understand some aspects of Equation B18 intuitively. From the first two lines of Equation B17, we saw that for rs we could roughly neglect the effect of recombination on the dynamics of an AB lineage once it is formed. As mentioned in appendix a, we thus expect an AB lineage to drift for ∼1/s generations before either becoming successful or going extinct; we see that this timescale corresponds exactly to the time ∼Math that it takes for ϕAB to begin to approach its fixed point. For small δB, we argued that Math corresponded to the probability of an aB lineage drifting for Math generations; we now see that this is indeed the timescale |Math|−1 in the first line of Equation B18. For larger values of δ, the dominant process is for the aB lineage to drift for ∼1/δB generations; this, together with the 1/s timescale associated with the AB lineage, sets the scale |Math|−1 in the second line of Equation B18. Finally, for rs, the B allele has mean selective advantage xAs − δ and thus typically drifts for ∼1/(xAs − δ) generations before beginning to increase deterministically or going extinct.

Recall that we wish to compare Equation B18 to the typical rates of change of xA. As discussed in appendix a, a trajectory that reaches nAb(t) = n will typically remain 𝒪(n) for at least ∼min{n, 1/δA} generations. So if the values in Equation B18 are large compared to max{1/(NxA), δA} for the trajectories making the largest contributions to pAb in Equation B12, we can make the approximation that ϕaB and ϕAB are given by their fixed-point values for almost all the time that the Ab lineage is drifting. Doing so, Equation B12 becomesMath(B19)We can now plug the values from Equation B17 into Equation B19 and evaluate the expectation to find pAb, checking that the dominant trajectories are indeed changing slowly compared to λ+ and Math.

Evaluating the probability that a single-mutant lineage will be successful:

Unfortunately, there is no simple general expression for Equation B19; instead, we consider the different asymptotic forms, as in Equations B17 and B18. To reduce the number of different parameter regimes that we must consider, we now assume that the two loci are symmetric, with μA = μB = μ and δA = δB = δ. (The generalization to the asymmetric case is straightforward—see appendix c.) We then have paB =pAbp, and Equation B5 for the probability of having crossed the valley at long times becomesMathwhere τ is the expected time to cross the valley, and τ ≈ (2Nμp)−1, with p given by Equation B19.

We now proceed to evaluate Equation B16 and find expressions for the approximate rate of valley crossing, τ−1. Even with the simplifying assumption of symmetric loci, there are a number of different limiting parameter regimes in which τ−1 takes different forms. (See Figure B1 for the different parameter regimes.) We skip the tedious details and provide only the results of the calculations. The crucial fact that we use is the one mentioned in the main text and described in appendix a: that lineages are exponentially unlikely to drift for a time t ≫ 1/δ and that given that a lineage drifts for a time t, it is exponentially unlikely to have a size n ≫ min{t, 1/δ} for much of that time. Because lineage trajectories that last longer than this or reach larger sizes are so unlikely, and because their contributions to Equation B19 typically grow no faster than ∼nt2 at the most, the trajectories that make the dominant contribution to p for most of parameter space are the ones that last no longer than ∼1/δ generations and reach typical sizes given their lifetime. On the other hand, because the lineage size distribution has a long tail until it reaches these boundaries, large (but not exponentially suppressed) lineages are generally important.

Asexual regime:

First note that for very small r, the expressions for Math in the first and second lines of Equation B17 go to 0, and the integrand in Equation B19 is dominated by NMath:Math(B20)In this case, double mutants are more likely to be produced by mutation of single mutants than by recombination between single mutants, and the population is effectively asexual. We considered this case in Weissman et al. (2009). The rate of valley crossing will be given by the asexual rate, Math, namelyMath(B21)Equation B21 holds in moderately large populations, with Math; for smaller populations, the dominant valley-crossing process is for one mutant genotype to drift to fixation before the double mutant is produced. (In this small-population regime, because the two loci are essentially never simultaneously polymorphic, recombination has no effect; see Figure 5.)

Infrequent recombination (rs):

Now we consider larger r (but still ≪s). First, we verify that our approximation ϕaBMath, ϕABMath is correct. We give the typical lifetime T for the trajectories that dominate p, along with the corresponding timescale λ−1Math + Math for ϕaB and ϕAB to approach their fixed point value:Math(B22)In the first two regimes, λ−1T and our guess is consistent, but for sufficiently deleterious single mutants it fails. In this last case, selection limits both nAb and naB to ∼1/δ, so valley crossing typically occurs via two similarly sized lineages and the analysis of this appendix fails. Fortunately, in this case selection limits the size of fluctuations in the numbers of single mutants, and nAb and naB can be approximated by their expectations. This actually allows the rate of valley crossing to be calculated far more easily. We quote only the result here (see Equation B23 below); for a detailed analysis, see Hadany (2003). (It is interesting to note that even in this case where our approximations fail, they overestimate the rate of valley crossing only by a factor of 2.)

Evaluating Equation B19 in the regimes where λ−1T, and combining this with the large-δ result and Equation B21, we finally have the rate of valley crossing for rs:Math(B23)(see Figure B1). In the first line, we see that for sufficiently small r, the population is effectively asexual, and the rate of valley crossing is given by Equation B21. For recombination to effectively speed up valley crossing, there must be a significant probability that Ab and aB individuals are simultaneously present (N and μ not too small, δ not too large) and the asexual valley-crossing process must not be too easy (s not too large). Given these conditions, recombination will be the dominant means of producing double mutants, and the population will fall into one of the three remaining parameter regimes in Equation B23, depending on the size of δ. In the “effectively neutral” parameter regime, selection is sufficiently weak that the most likely way for the population to cross the valley is for one single-mutant genotype, e.g., Ab, to drift to such a high number nAb ≈ (Nμ2rs)−1/3 that success becomes likely (Math); all the luck is concentrated in this initial large lineage. In the “slightly deleterious” regime, selection makes it unlikely that a lineage will reach this size. Instead, the most likely way for the population to cross the valley is for one lucky lineage to drift to size nAb ∼1/δ, during which time another lucky lineage drifts to size Math, at which point success becomes likely. At even higher values of δ, in the “moderately deleterious” regime, both single-mutant lineages are limited to size ∼1/δ, and the double mutants that they generate must get lucky to escape drift and take over the population.

Frequent recombination (rs):

Now we turn to the case rs. Note that since MathMath in this case and nAbN by assumption, the integrand in Equation B19 is dominated by the first term, and p is approximatelyMath(B24)where θ is the Heaviside step function. Note that for δ = 0, this is just Equation B20, the probability of success of a single-mutant lineage in an asexual population. If Math, then the extra terms in Equation B24 do not affect the main contributions of the trajectories that dominate the asexual case, and τr>s ≈ τr=0. (It is easy to check that in these cases the condition Ns ≫ 1 implies that the timescale λ−1 in Equation B18 is small compared to the timescale for the Ab lineage dynamics, so our approximations are consistent.)

On the other hand, if we have Math, then only the exponentially unlikely trajectories with nAb ≪ 1/δ contribute to p. In this case, because the integrand is positive only for nAb > Nδ/s, we can see from Equation A3 in appendix a that p = 𝒪(exp(−Nδ2/s)), and thus log τ ≈ Nδ2/s. Note that this agrees with the result of Barton and Rouhani (1987) quoted in the main text, although it was obtained via a very different method. Note also that p is tiny compared to its values in other parameter regimes; this is why we can generally ignore unlikely trajectories that last much longer than ∼1/δ or reach disproportionate sizes—their contribution to p is negligible as long as the nonsuppressed lineages can be successful.

Putting these results together, we find that the rate of valley crossing for rs is given approximately byMath(B25)See Figure B1 for a typical slice of parameter space. We see that as long as the single mutants are not too deleterious, the two effects of recombination on valley crossing (bringing together mutant alleles and breaking up the beneficial combination) roughly cancel. In contrast, for deeper valleys, frequent recombination enormously decreases the rate of valley crossing. Note that the arguments of this appendix are not sufficient to determine the prefactor in τr>s given in Equation 8; this requires the detailed calculation in Barton and Rouhani (1987).

Figure B1.—

Typical slice of parameter space for Formula, showing regions where recombination has a strong effect on τ, the expected time for the population to cross the fitness valley. Recombination reduces τ in the green region, increases τ in the red region, and has only a minor effect on τ elsewhere. In the regions labeled A1 and A2, τ is given by the first and second lines of Equation B21 (the first and third lines of Equation 1 in the main text), respectively. In regions L1, L2, and L3, τ is given by the second, third, and fourth lines of Equation B23 (Equation 4 in the main text), respectively. In region H2, τ is given by Equation 8. In region H3, τ is approximately the expected time for a deleterious single mutant to drift to fixation, with log τ ≈ Nδ. In regions A1 and L1, single mutants are effectively neutral.

Assumptions and consistency checks:

It is worth reviewing the assumptions required in the lengthy and admittedly somewhat opaque derivation above. The most important assumption is that successful Ab lineages typically do not follow exponentially unlikely trajectories. This is because the probability of success for a lineage typically grows much slower than exponentially with the lineage size. It is straightforward to obtain rough estimates of the contributions of the exponentially unlikely trajectories to the overall probability of success and confirm that they are small. We have done so for the one regime (Math) where the probability of success increases faster than exponentially in the lineage size and where the exponentially unlikely lineages are therefore important. Note that this assumption is only for the initial, large Ab lineage; we do not make a similar assumption for the aB and AB lineages. Rather, we start with the full distribution of possible trajectories and find that the exponentially unlikely trajectories are unimportant, verifying this part of the heuristic argument in the main text.

We have made the additional assumption that the Ab lineage is relatively unaffected by the aB and AB lineages until the B allele is already increasing deterministically. As a check on this assumption, one can check that for rs the aB lineage dynamics are relatively unaffected by the much larger Ab lineage. While the two lineages must obviously interact to produce recombinants, the total number of individuals affected (≲1/s) is typically small compared not just to nAb, but also to naB. For rs, one can check that the B-allele dynamics are only slightly affected by the more common A allele until xAxB ≈ 1/s (at which point the mutants begin to deterministically increase in frequency), and thus the B allele has only a slight effect on the A dynamics before this point.

The remaining assumptions (e.g., that the aB and AB dynamics are fast compared to the Ab dynamics) are justified within the derivation, and the (numerous) remaining approximations have effects that are small in the asymptotic parameter regimes and straightforward to estimate.

Larger population sizes:

Equations B23 and B25 are valid only for Nμ ≪ 1. For Nμ ≫ 1, there will typically be many single-mutant lineages in the population at one time, and the single-mutant frequencies initially increase approximately deterministically. For rs, their numbers are given by (assuming symmetric loci)Math(B26)As long as N is not so large that the successful double mutant is produced instantly (Nμ2/s ≪ 1), τ is dominated by the waiting time for the first successful double mutant to be produced. Double mutants are produced by recombination at a rate Math and by mutation at a rate μ(nAb(t) + naB(t)), so the expected waiting time for the first successful one isMath(B27)For small r or large δ, the population is effectively asexual; this regime is discussed in Weissman et al. (2009). For a more detailed discussion of the the third case, where recombination dominates and single mutants are effectively neutral, see Christiansen et al. (1998). The fourth case is discussed in Hadany (2003). Note that for deleterious single mutants, τ has the same value as it does for deleterious single mutants with Nμ ≪ 1 in Equations B21 and B23. In the neutral cases, on the other hand, not only is the expected time τ different, but also the rate of valley-crossing increases with time, so that the distribution of the valley-crossing time is no longer exponential.

For rs, we can consider the population to have crossed the valley when the mean fitness of all the mutants is significantly greater than that of the wild type; once this is true, selection will rapidly drive the mutants to high frequencies. For Math, single mutants are effectively neutral, and selection will begin to favor the mutants when nAB ≈ 1/s. This typically occurs at time Math, approximately the same as for an asexual population. (Note that τ ≫ 1/r, so the population will be close to linkage equilibrium at t ≈ τ.) For Math, the population cannot cross the valley until the single mutants drift far higher than their frequencies at mutation–selection balance, so the semideterministic approximation breaks down; this regime is discussed in the main text and in Barton and Rouhani (1987). Note that in this case, while there are many single-mutant lineages in the population at one time, only rare lineages drift to the frequencies needed to cross the valley, so the exponential dependence of τ on Nδ2/s is the same as for Nμ ≪ 1.


For most of this article, we have assumed that the two loci are equivalent to reduce the number of parameters. Relaxing this assumption, i.e., allowing μA ≠ μB and δA ≠ δB, does not change the qualitative conclusions. It is straightforward to repeat the heuristic arguments in the main text or the calculations in appendix b for any particular choice of parameters, although it is tedious to do so for all possible parameter combinations. Two points are worth briefly noting. First, asymmetry between the loci generally reduces the advantage of rs and increases the disadvantage of rs relative to asexual populations. This is because the asexual valley-crossing dynamics will be dominated by the more favorable single-mutant genotype, while the sexual dynamics will necessarily involve both genotypes. Second, for rs and very asymmetric loci, it is possible to be in the regime where valley crossing is exponentially suppressed even if one of the single mutants is nearly neutral. For instance, suppose δA is very small. As long as δB is large enough that the product satisfies δAδB ≫ max{μAs, μBs, s/N}, the population will be in the regime discussed in the main text, where to cross the valley it must cross a saddle point in the equilibrium allele frequency probability distribution. The saddle point in this case is at xA ≈ δB/s, xB ≈ δA/s, with Ab typically drifting to a much higher frequency than aB. The expected valley-crossing time increases exponentially with the population size, withMath


We evolved populations using time steps of dt = 10−2 generations, during each of which the state of the population was updated via the following procedure:

  1. The mean fitness Math of the population was calculated, and the number nij of individuals with genotype ij was changed by random numbers of births Math, deaths dij ∼ Binomial(nij, dt), and mutations mij ∼ δiABinomial(naj, μAdt) + δjBBinomial(nib, μBdt) (where δkl is the Kronecker delta): Math.

  2. If the total number of individuals Math was different from N, the number of individuals of each genotype was multiplied by N/N* and rounded to the nearest integer, and then the number of individuals with the most common genotype was adjusted to bring the total back to N.

  3. A random number of pairs of individuals with complementary genotypes (abAB or AbaB) underwent recombination. For each pair of genotypes, the number of recombinations was distributed as Binomial(nij, rni′j′dt/N), with nijni′j′.

Each run of the simulation proceeded until the population was fixed for AB or 2 × 107 generations had passed. Note that the simulation results for τ include the time it takes for AB to sweep through the population. This has a negligible effect, except for the smallest population sizes shown in Figure 5; in this case, we also included the sweep time in the curves showing the analytical predictions.


We thank Michael Desai for many ideas and discussions and are grateful to Joanna Masel and an anonymous reviewer for their helpful suggestions. This work was supported in part by a Robert N. Noyce Stanford Graduate Fellowship and European Research Council grant 250152 (to D.B.W.) and by National Institutes of Health grant GM 28016 (to M.W.F.).


  • Communicating editor: L. M. Wahl

  • Received June 22, 2010.
  • Accepted September 27, 2010.


View Abstract