## Abstract

Adaptation depends critically on the effects of new mutations and their dependency on the genetic background in which they occur. These two factors can be summarized by the fitness landscape. However, it would require testing all mutations in all backgrounds, making the definition and analysis of fitness landscapes mostly inaccessible. Instead of postulating a particular fitness landscape, we address this problem by considering general classes of landscapes and calculating an upper limit for the time it takes for a population to reach a fitness peak, circumventing the need to have full knowledge about the fitness landscape. We analyze populations in the weak-mutation regime and characterize the conditions that enable them to quickly reach the fitness peak as a function of the number of sites under selection. We show that for additive landscapes there is a critical selection strength enabling populations to reach high-fitness genotypes, regardless of the distribution of effects. This threshold scales with the number of sites under selection, effectively setting a limit to adaptation, and results from the inevitable increase in deleterious mutational pressure as the population adapts in a space of discrete genotypes. Furthermore, we show that for the class of all unimodal landscapes this condition is sufficient but not necessary for rapid adaptation, as in some highly epistatic landscapes the critical strength does not depend on the number of sites under selection; effectively removing this barrier to adaptation.

THE question of how long it takes for a natural population to evolve complex adaptations has fascinated researchers for decades (Haldane 1957; Kimura 1961; Grant and Flake 1974; Valiant 2013). The evolution of populations can be seen as an adaptive walk across the “mutational landscape” (Gillespie 1984), the space of all possible genotypes. The speed of adaptation critically depends on how the fitness values of all genotypes are organized in this space. In particular, it depends on the number and shape of the paths leading to the optimum on this landscape. This raises both empirical and theoretical difficulties for the study of the speed of adaptation. Empirically, measuring the fitness of every possible genotype is virtually impossible. For this reason, most empirical studies focused on distributions of effects of single mutants (Eyre-Walker and Keightley 2007). However, organisms are not just the sum of their genes: gene interactions (epistasis) are pervasive and the effects of mutations will change depending on the background in which they occur (Phillips 2008). The difficulty of measuring mutational effects across multiple backgrounds grows combinatorially with the length of the genotype, and most studies are restricted to studying the effects of interactions in a local neighborhood of some genotype. In part because of this lack of knowledge about the structure of the fitness landscape, and in part due to the added difficulty of analyzing correlated landscapes, most theoretical studies have focused on landscapes in which either the fitness of genotypes (Gillespie 1983, 1984; Kauffman and Levin 1987; Orr 2002) or the effects of new mutations (Wilke 2004; Desai *et al.* 2007; Fogle *et al.* 2008) are drawn from a random distribution. The first case, adaptation on random landscapes, leads to extremely short adaptive walks and may be realistic only when the population is very close to a fitness peak (Orr 2006). In the second case, adaptation in linear landscapes, such as when the effects of mutations are drawn from a random distribution, ignores potential correlations between mutational neighborhoods and any kind of interaction between mutations.

Most studies on the speed of adaptation have focused on the limits imposed by competition between multiple beneficial mutations (Gerrish and Lenski 1998). Because of this, most models assume that populations evolve in a continuous space under a never-ending supply of beneficial mutations (Orr 2000; Wilke 2004; but see Kim and Orr 2005 for a model of a finite genome), when in reality the stage in which evolution proceeds is comprised of discrete genotypes. This fact results in a number of new and important features for the dynamics of adaptation. First, in a discrete space of genotypes, the supply of new beneficial mutations naturally decreases as adaptation occurs as a consequence of the finite size of the genome. Second, and consequently, as the population becomes more adapted, the potential for deleterious mutations increases as more and more sites become adapted. Models analyzing adaptive walks typically assume that the population or selection strength are large enough such that the probability of fixation of deleterious mutations is zero, effectively disregarding the growing difficulty of maintaining the acquired adaptations. Finally, fitness landscapes can display strong correlations between mutational neighborhoods, making the effects of new mutations not necessarily constant across the fitness landscape nor simply drawn from a random distribution. Previous attempts at analyzing the speed of adaptation in correlated neighborhoods (Kryazhimskiy *et al.* 2009) assumed an infinite supply of beneficial mutations and strong selection, disregarding the growing difficulty of finding new beneficial mutations and maintaining previously acquired ones. As we will show, these effects impose strong constraints to adaptation.

Other studies have focused on the properties of adaptive walks, which explicitly consider the discrete nature of the genotype space (Kauffman and Levin 1987; Orr 2002; Park *et al.* 2016). Many of these studies have focused on models of fitness landscapes that can display high levels of ruggedness, such as the house-of-cards model (Kingman 1978), in which *fitness values* are drawn randomly from some distribution; the rough Mount Fuji (Aita *et al.* 2000), in which *fitness effects*, combined with a deterministic part of fitness, are drawn randomly; or the NK model, in which the fitness effect of a locus depends, in some randomly prescribed way, on the state of *K* other loci (Kauffman and Weinberger 1989). Both of these classes of models lead to landscapes exhibiting multiple peaks. For this reason, these studies have focused mainly on the length of the adaptive walk, the number of substitutions that occurs before the process reaches a local peak, and how this depends on the number of local peaks in the landscape. Even though this is an empirically measurable quantity, it does not directly address the question of how long a population takes to reach this peak and how this depends on the shape of the paths leading up to it. Note that the number of substitutions is not equivalent to the time it takes to reach a peak: new mutations, even if beneficial, can be lost, and deleterious mutations can be fixed. Here, we directly address this question by asking how much time a population requires to reach a fitness peak.

To do this, instead of considering the rate of adaptation in specific fitness landscapes, which may not be informative of real trajectories since their details are unknowable; we consider *classes* of fitness landscapes, including many patterns of gene interactions, and focus on upper bounds for the time to reach a fitness peak. We focus on traits encoded by many genes and study how this time depends on the number of sites under selection. We argue that the scaling of this time with the length of the target sequence quantifies the complexity or “hardness” for a natural population to perform an adaptive walk on a class of landscapes. Similar to previous approaches (Gillespie 1983, 1984; Orr 2002, 2005, 2006) we consider a monomorphic population in the weak-mutation regime. However, to address the difficulties outlined above, we consider that this population evolves in a sequence space and under the combined action of mutation, selection, and drift; allowing for the possibility that deleterious mutations are fixed.

To analyze the dynamic properties of the adaptive trajectory, we take advantage of tools commonly used in the theory of randomized and evolutionary algorithms (Paixão *et al.* 2015b). Using these tools, we first calculate an upper bound for the time to reach an adaptive peak in a simple landscape with equal, additive contributions of all sites (loci) as a function of the number of such sites contributing to the trait. We focus on the crucial distinction between a polynomial and an exponential scaling of this time with the number of sites under selection, and argue that these two qualitatively distinct regimes correspond to situations in which adaptation is “efficient” or “inefficient,” respectively. We find conditions on selection strength that separate these two regimes, and show that populations in the weak-mutation regime (WM) can adapt efficiently, but the critical selection strength grows with the number of sites under selection, effectively setting a limit to adaptation. We generalize these results to a large family of fitness landscapes that includes very general forms of interactions between the sites under selection, only excluding forms of interactions that create multiple fitness peaks. We derive an upper limit to the time to reach a fitness peak, setting a speed limit to adaptation in these landscapes. Finally, we analyze in detail one instance of this class, an extreme form of epistasis in which mutations need to be accumulated in a particular order. We show that in this case, despite a slower speed of adaptation, the critical selection strength enabling efficient adaptation does not depend on the number of sites under selection; eliminating the limits to adaptation previously identified for simpler landscapes.

## Methods

### Transition probabilities

To investigate the speed of adaptation we assume the weak-mutation regime. In this regime, a new mutation is either lost or fixed in the population, replacing the previous genotype before any other mutation arises in the population. We assume that the genotype **x** is composed of *n* biallelic loci or sites and consider a trait which is a function of the genotypic sequence under constant selection gradient *β* such that fitness is The number of adapted sides in each genotype is denoted *x*.

In our model, at each iteration exactly one mutation occurs, which can be either beneficial with probability harmful with probability or neutral with the remaining probability These probabilities depend on the current genotype and the number of adapted sites *x*, and thus may change during the course of adaptation. Note that one iteration in our model does not correspond to a biological generation, but rather represents one mutation event (which takes on the order of generations to occur, where *U* is the genomic mutation rate).

A mutation is fixed or lost according to Kimura’s probability of fixation (Kimura 1962),(1)which depends on both the population size (*N*) and the fitness difference to the resident genotype (the selection coefficient in the traditional formulation, and allows for the fixation of deleterious mutations. This model is obtained as a limit of many other models, such as the Wright–Fisher model or the Moran model, and was previously introduced in other contexts (Berg *et al.* 2004; Sella and Hirsh 2005; Tuğrul *et al.* 2015). This model is valid as long as the time for a mutation to be either fixed or lost is short compared to the time between mutations This will always depend on the population size (*N*) and on the minimum absolute selection coefficient in the landscape.

### Fitness landscapes

We start our analysis with a simple additive fitness landscape, in which all mutations have the same effect on the trait (and consequently on fitness). Fitness is formalized by the function which counts the number of correct matches (*x*) in a genome of length *n*.

We then generalize to all additive fitness landscapes by relaxing the condition of equal contributions. Fitness is defined as where each site contributes a weight to the trait, such that

Finally, we generalize our analysis even further and include all functions with a single maximum: unimodal fitness function. These functions allow arbitrary forms of epistasis, only excluding some types of reciprocal-sign epistasis which may lead to multiple peaks (Weinreich *et al.* 2005; Poelwijk *et al.* 2007). In particular, it excludes reciprocal-sign epistasis that occurs when the sign of the effect of a substitution depends on the background in which it occurs, and may lead to multiple peaks (see Poelwijk *et al.* 2011 for the necessary conditions and Crona *et al.* 2013 for the sufficient conditions for multiple peaks). We analyze in detail one instance of this class exhibiting an extreme form of epistasis, defined as This function requires mutations to be accumulated in a particular order. See Figure 1 for illustration of used fitness functions.

### Drift analysis

To estimate the time that a population needs to find the fitness peak and its dependence on the number of genes *n*, we employ tools from theoretical computer science, in particular the so-called drift analysis (He and Yao 2001; Lehre and Witt 2013). In this context, drift refers to the expected progress of a population toward the fitness peak and is not to be confused with *genetic drift*, as traditionally used in population genetics.

Drift—the expected progress of a population toward the fitness peak in one time step—is usually denoted by and can be calculated as the sum of the expected forward progress (forward drift: the product of the probability of occurrence and fixation of beneficial mutations with their effect) and the expected negative progress (negative drift: the same but for deleterious mutations). In our analysis, we express drift in terms of number of mutations (or states) that the population has to accumulate on its path toward the optimum.

The intuitive idea behind *drift analysis* is simple: it starts by underestimating (*i.e.*, obtaining a lower bound for) the minimum expected progress toward some target state at every genotype. Then, given an initial distance to the target state, which can be pessimistically estimated as the maximum distance, one calculates an overestimation (*i.e.*, an upper bound) of the expected time to reach this state. This is analogous to integrating a differential equation to obtain the time to reach a particular state. However, these methods are tailored to stochastic processes and can be used even for non-Markovian processes (although here we do not make use of this fact). The main advantage of these methods over more traditional Markov-chain techniques is that these allow for simple expressions for the expected time to reach some state. Traditional Markov-chain techniques can be used to this end, but they typically produce unwieldy expressions which allow for little analytical insight into the parameters that affect the earliest time to reach some state. The techniques we use here make use of controlled simplifications to the expectation of progress of the stochastic process to produce simple, but rigorous, bounds on this time (Appendix C).

Drift theorems use upper or lower bounds on the net expectation of progress, to obtain bounds on the time to reach particular genotypes (Appendix C). In our analysis we use its two specific instances: *variable* and *negative drift theorems*. The variable drift theorem (Johannsen 2010) can be applied when, for any state of the system *x*, the expected change between two consecutive states is at least some positive nonincreasing function of the current state In such a case, the variable drift theorem (generalized from Johannsen 2010; see Appendix C) states that the expected time until the state with distance less than *a* from the target sequence is reached, starting at an initial distance of is(2)Note that the variable drift theorem is applied to the decreasing distance to the optimum and has to be expressed accordingly in terms of decreasing number of states that have to be crossed (*i.e.*, number of mutations necessary to reach the fitness peak). The upper integral boundary is pessimistically given by the longest path of strictly increasing fitness leading to the optimum, *i.e.*, the maximum number of mutations that the population has to accumulate to reach the fitness peak. Using this theorem, we can calculate an upper bound on the time to reach any distance *a* to the optimum (lower integral boundary). By setting we can calculate an upper bound for reaching the optimum.

Conversely, the negative drift theorem (Oliveto and Witt 2011; Rowe and Sudholt 2014) can be applied when the expected change between two consecutive states is negative for all states within a given interval, *i.e.*, the population is expected to move away from the fitness peak in some region of the state space. The negative drift theorem (Oliveto and Witt 2011; Rowe and Sudholt 2014) states the conditions on the size of this interval and on the transition probabilities that lead to an exponential time to reach the optimum. Specifically, if the transition probabilities show an exponential decay in the jump length, the time for crossing this interval is exponential in the length of the interval, with overwhelming probability. The exact statement is given in Appendix C. To express these scalings, we use asymptotic notation as explained in Cormen *et al.* (2009).

### Simulations

All simulations were initialized from the genotypic sequence, and parameters *N* and *β* were kept constant throughout the run, unless stated otherwise. At every iteration of a run, one site was chosen uniformly at random to mutate, changing its value to The fitness difference of the resulting genotype to the resident genotype is evaluated and Equation 1 is used to compute the probability that it replaces the resident genotype. We ran this cycle until either the fittest genotype is fixed, some fraction of the maximum fitness is reached, or some threshold number of iterations is reached (6 × 10^{4}, Figure 2B; or 10^{8}, Figure 4).

### Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article. Code to perform simulations is available upon request.

## Results

In this manuscript, we investigate how the time to adaptation varies with the number of sites under selection for several classes of fitness landscapes corresponding to different choices of the trait function It should be noted that the time we refer to here is measured in number of mutations that are “tried” before the target genotype is reached, and so it is measured in units of mutation rate where *U* is the genomic mutation rate.

We start by showing that on a simple landscape, in which all mutations have the same effect on the trait (and subsequently on fitness), there is a critical selection strength that allows populations to efficiently reach or approach the fitness peak. This threshold grows with the number of sites under selection, effectively setting a limit to the number of sites that can be adapted under constant selection. We then generalize our results to general additive landscapes, independent of the distribution of mutational effects. Next, we show that for the class of all landscapes with a single peak, which includes very general forms of gene interactions, this critical threshold is sufficient, but not necessary, to obtain an upper bound on the time to reach the fitness peak. We demonstrate that there are landscapes for which a constant selection strength allows efficient adaptation of arbitrary numbers of sites.

### Adaptation time in simple additive landscapes

One of the simplest scenarios for adaptation is when all sites—genes or loci—contribute equally to fitness. This leads to a fitness landscape where the fitness of a genotype depends only on the number of correct matches to a target sequence. We formalize this scenario by the function which counts the number of correct matches (*x*) in a genome of length *n* (Figure D1). This function induces a structure in sequence space in which the fraction of beneficial mutations decreases linearly as a function of the distance to the optimum. We use this function to determine under which conditions populations can efficiently climb simple fitness peaks.

For each new mutation, the probability that it is beneficial depends only on the number of beneficial mutations already fixed (*x*), and therefore the expectation of increase in fitness (forward drift) is The probability of occurrence of a deleterious mutation grows with the number of beneficial mutations already fixed, and thus the negative drift is Therefore, the net expectation of progress is:(3)(see Appendix B, Lemma 2). This expectation is always positive as long as for some constant This condition states that, for the expectation of progress to be always positive, the selection differential needs to be large enough to counteract the deleterious mutation pressure in the worst possible case [which occurs at a genotype which is one mutation away from the optimum, when and so and If this condition is met we can write:(4)We can now apply Equation 2 (see *Methods*; Johannsen 2010) to the decreasing number of zeros (number of remaining mutations that need to be accumulated), to obtain an upper bound on the expected time to reach the fitness peak:where the maximum number of mutations that are required to reach the fitness peak is *n*.

This expression quantifies the impact of the length of the target sequence on the time (in units of mutation rate) to attain it. It shows that the time required to evolve adaptations involving larger numbers of sites will simply require a polynomial number of extra mutational “trials” (Figure 2A).

### A critical threshold for efficient adaptation

Our analysis above shows that for a population in the WM to be able to reach the fitness peak efficiently, it is sufficient that selection strength grows logarithmically with the number of sites under selection We next show that if selection strength is below this threshold, these populations cannot efficiently find the optimum, as the time required to reach the optimum on becomes exponential in *n* with overwhelming probability. Populations in the WM therefore exhibit a phase transition behavior: changing by a constant factor leads to a difference between polynomial and exponential expected time to reach the optimum on

To show this, we consider a genotype some distance away from the optimum, for some small positive *ε*. At this point, the fraction of mutations that are beneficial becomes Correspondingly, the fraction of deleterious mutations is Now, if selection strength is between we can bound to obtain the probabilities of fixation of beneficial or deleterious mutations and respectively. Substituting in the net expectation of progress (Equation 3) we obtain:where *c* is a positive constant. This means that, if selection strength is between then, as the population approaches the optimum, there will be a region where the expectation of progress is negative. This happens because selection is not strong enough to counteract the deleterious mutation pressure that has built up. We can then apply the negative drift theorem to the number of zeros on an interval of and show that the expected time to reach the peak is exponential in the number of loci (see Appendix D for details). This shows that if selection strength is below more complex adaptations, involving a larger number of sites, will take exponentially longer to evolve (Figure 2B).

This result sets a limit to the complexity that can be evolved: for a fixed selection strength, there is a maximum number of sites that can be efficiently adapted. Typically, selection is deemed efficient when (corresponding to *Nβ* in our framework). Our result defines the conditions for which selection is efficient in a multilocus setting, taking mutational pressure into account. It shows that even if at every locus, for selection to be able to drive a population to the fitness peak, needs to scale nonlinearly with the length of the target sequence

### Efficient approach to the optimum

The results above show that the time required to reach the optimum scales nonlinearly with the number of sites under selection. However, it can be argued that populations do not have to reach the optimum, they only need to get sufficiently close.

Using Equation 4 together with Equation 2, we show that the population can reach a genotype in which at least sites are well adapted, where is the number of mutations to the optimum. The population reaches such a genotype in:(5)This means that the time to reach a genotype with a *constant fraction* of well-adapted sites (for example, at which 99% of sites are adapted, *a* = 0.01*n*) is linear on the length of the target sequence. This is a significant improvement over the time to actually reach the fitness peak, showing that this time is dominated by the last few steps. It should be noted that the time to reach any *constant distance* from the optimum (say *n* − *a*, with *a* constant) is of the form

### General additive fitness landscapes

We now generalize the previous results to linear landscapes *regardless* of their distribution of mutational effects. When all mutations contribute equally to the trait, it is sufficient that selection strength is such that for the population to be able to reach the fitness peak in polynomial time. More generally, if each site contributes a weight to the trait, such that and for a certain selection strength, there will be a critical weight such that all sites of weight will be able to be reached in polynomial time, reaching a fitness of at least

Analogously to the equal-effects case (Equation 3), we can write the net expectation of progress on these “large effect” sites:This expression is positive on as long as for some constant which determines the critical threshold: This leads to the lower bound on the expectation of progress:As before, we can use Equation 2 to obtain an upper bound for the expected time to reach fitness at least (see Appendix E, Equation E2):Since the sites of large effect behave essentially like the equal-effects case, for a constant selection strength there is a maximum fitness that can be reached in Reaching a fraction of this fitness takes linear time (Equation 5); while adapting further requires exponential time, which we confirmed with simulations (Figure 3). Without knowledge of the actual distribution of effects, it is impossible to determine and hence the fitness level that is guaranteed to be reached in polynomial time. However, since all effects are drawn from the same distribution, will always be a constant fraction of *n* [since *n** is simply the fraction of weights below *w**, These scalings are valid for *any* distribution of effects and represent hard limits on this class of fitness functions.

### Adaptation in a general class of landscapes

We now turn to a general class of fitness landscapes: unimodal functions. This class includes all functions that have only one maximum; meaning that it includes functions displaying arbitrary forms of epistasis, only excluding some types of sign epistasis which may lead to multiple peaks (Weinreich *et al.* 2005; Poelwijk *et al.* 2007), as mentioned before.

The defining feature of the members of the unimodal class is that any genotype other than the peak has at least one mutational neighbor (a genotype that differs exactly by one mutation) of higher fitness value. We denote the minimum of these trait increases (or decreases) in the landscape by *δ*. Because each genotype necessarily has at least one neighbor that increases the trait value by at least *δ* we can bound the expectation of improvement by In this class of functions, there are potentially deleterious mutations, each contributing to the total backward expectation. If the population size is we can bound (see Appendix B, Lemma 3; and Appendix F), which implies that decreases exponentially for deleterious mutations, and the worst case of these mutations is actually when yielding a backward expectation of and a total expectation of improvement:(6)This net expectation of progress is positive as long as for some constant Therefore, for some constant this expectation then becomes simply:(7)We can then use Equation 2 with the maximum and minimum fitness differences (d and *δ*, respectively) as integral limits to calculate an upper bound for all functions in this class. Note that in this case we are applying the drift analysis with respect to the fitness rather than to the number of one-bits in the trait:(8)This bound depends on the length *d*, and as such is not independent of the instance of the function class we are considering. It should be noted that the upper bound of Equation 8 can be loose, as can be seen by comparing to the previous results for linear function (which are part of the unimodal function class): the fitness range *d* is of size *n*, entailing a bound for the time to adaptation of when, in reality, the time on the linear function class grows slower Moreover, this bound does not guarantee that the time to reach the peak is polynomial: there could exist members of the unimodal function class for which is exponential; *e.g.*, when *d* is constant but the Hamming path leading to the optimum is exponential, then *δ* will be exponentially small (Rudolph 1997b; Droste *et al.* 2002), making the bound exponential. Next, we focus on one particular member of this function class for which this bound is tight.

One extreme form of epistatic landscape is when mutations need to be accumulated in a particular order, having no effect outside of this order (Kondrashov and Kondrashov 2001). This creates a landscape in sequence space characterized by a fitness ridge and vast neutral plateaus leading to the optimum (Figure 4A). We formalize this landscape by the function which counts the number of leading ones in a bit string (Rudolph 1997a). To increase its current fitness, it is necessary to flip the first zero in the genome to one. Flipping any other zero to one will result in a mutant offspring with the same fitness as its parent, while flipping any of the leading ones into zero can result in a drastic fitness loss. In this landscape, the fitness range *d* has size *n* (see Figure 4A), which, according to the bound from Equation 6, leads to a time of We now show that this bound is tight.

In this landscape, the probability of a beneficial mutation is as only flipping the first zero in the genome will result in a fitness increase. However, as more ones can follow this locus (neutral mutations that may have fixed neutrally), the increase in trait value can be higher than one. This means that we can bound the expectation of forward progress by Mutating the *j*-th position of the *x* already well-adapted sites will result in a fitness decrease of size yielding: However, as long as the fixation probability decreases exponentially for deleterious mutations and can overcome the linear impact *k* of mutation. Specifically, we can bound each (see Appendix B, Lemma 3; and Appendix F) and, using and the fact that we can write for the net expectation of progress:Since and then the expectation of progress is always positive and reduces toWe can use Equation 2 to obtain an upper bound on the expected time to reach the fitness peak:This shows that even if the path to the optimum is narrow and mutations have to occur in a specific order, populations in the WM are able to climb the fitness peak relatively fast in polynomial time (Figure 4B). Remarkably, this result holds for any selection strength above a constant value; indicating that, for landscapes of this type, there are no limits to the number of loci that can be adapted in polynomial time, as long as selection strength is above this constant value. The main reason for this is that even though the number of deleterious mutations still increases as the population approaches the optimum, most of them are much less likely to be fixed due to their strong deleterious effects. This leads to a much less pronounced slowdown of the speed of adaptation as the population approaches the optimum. Notice that in this family of landscapes, the time to reach a fraction of the maximum fitness is also

## Discussion

There are at least two ways in which a trait can be considered “complex”: in the number of sites contributing to it, by analogy with complex traits as defined in quantitative genetics; and in the way that it is coded for by the sites that contribute to it, *i.e.*, the complexity of the landscape in which it exists. In this manuscript we address the limits imposed by both of these factors.

We have shown that for a large class of fitness landscapes, it is *sufficient* that selection strength is above a threshold for populations to be able to climb to the fitness peak efficiently. We proved that in the class of additive landscapes, this condition is both sufficient and necessary; implying a limit to the number of sites that can be efficiently adapted at a constant selection strength. Nevertheless, this critical threshold does not seem severe: selection strength should increase logarithmically with the number of sites under selection, indicating that a small increase in selection gradient or population size translates to an exponential increase in the length of the sequences that can be evolved efficiently. Moreover, this condition is not always necessary: when considering a class of epistatic landscapes characterized by a single mutational path of strictly increasing fitness, we found that this limit no longer applies. A constant selection strength will enable a population to climb to the optimum, albeit at a slower rate than in an additive landscape, regardless of the number of sites contributing to the trait. These results quantify the complexity of adaptive walks beyond linear landscapes or uncorrelated mutational neighborhoods. They illustrate how the structure of the fitness landscape can impose limits to adaptation and how these stem directly from how the landscape conditions the distribution of effects of single mutants, in particular of deleterious mutations. Furthermore, they reveal how the buildup of mutational pressure that necessarily counteracts selection imposes a limit on the selection strength required for populations to overcome the entropic effects of mutation and make progress toward fitter genotypes.

Sewall Wright (1932) introduced the concept of fitness landscape mostly as a metaphor for the adaptation of populations, since at the time there was no hope of measuring the fitness associated with each individual genotype. Even then, this metaphor was incredibly successful at shaping evolutionary thought (Provine 2001). It is not surprising then that more recently, with the increased availability of genetic manipulation techniques, this metaphor has been taken seriously and is now the subject of experimental study (see de Visser and Krug 2014 for a recent review). The fitness landscapes of several experimental systems have now been at least partially mapped. Most of these landscape reconstructions have been performed for a small number of genes. For example, Khan *et al.* (2011) have reconstructed the fitness landscape defined by five beneficial mutations that fixed in a long-term evolution experiment. However, new techniques are allowing for the reconstruction of much larger fitness landscapes. For example, Kinney *et al.* (2010) constructed and determined the phenotype of 100s of 1000s of mutants of the *lac*-operon, enabling them to partially reconstruct its expression landscape. Our results inform about the consequences these fitness landscapes can have for the adaptation of populations. They speak not just about the time to reach a fitness peak, but also informs about how quickly mutations are accumulated on the way (by using Equation 2 to calculate the time required to get to a fixed distance to the optimum). If the structure of the landscape is such that many paths lead to the optimum, then the time to fix the next beneficial mutation should increase with (Equation 5), where *x* is the current number of fixed beneficial mutations. On the other hand, if relatively few paths leading to an optimum exist, our results suggest that the time until the next beneficial mutation is fixed is best described by a power law (Appendix F, Figure F2). These results suggest that long-term evolution experiments could be used to identify the class of landscapes on which the population is evolving. In fact, it is interesting to note that the fitness dynamics of a long-term evolution experiment is actually best described by a power law, rather than by a hyperbolic curve (Lenski *et al.* 2015). Even though other explanations are possible, such as a combination of diminishing returns epistasis with clonal interference (Wiser *et al.* 2013); our results show that this could also be explained by a fitness landscape in which fitness effects are highly conditional on the background, such that most deleterious mutations are of large effect and very few paths uphill exist. The existence of extensive diminishing returns epistasis (Khan *et al.* 2011) in this landscape would not be enough to explain this pattern of fitness increase. Whether clonal interference or the existence of many highly deleterious mutations is responsible for this specific pattern of fitness increase could be tested experimentally by closer inspection of the population dynamics and mutational assays of the evolved populations.

The results we show here are related to Fisher’s “cost of complexity” (Fisher 1930). Fisher defined the cost of complexity as the slowdown of adaptation due to the diminishing probability of generating beneficial mutations as the number of traits under selection increases. This has been attributed to the pleiotropic nature of mutations in the geometric model (Wagner *et al.* 2008), since mutations simultaneously affect all traits under selection. Our approach is similar in the sense that we study the dependency of the speed of adaptation on the number of sites under selection. One could think of each site of a genetic sequence as a trait under selection, albeit taking only discrete values, and mutations acting on one trait only (since we consider single mutations only). Our results show that pleiotropy is not the only source for this cost of complexity since, even when mutations act on single “traits,” there is a penalty for having longer sequences. Instead, our results highlight that mutational pressure and the structure of the fitness landscape play an important role on this cost of complexity. This is a direct consequence of the fact that we deal with discrete sequence spaces and not with a continuous trait space, as in traditional formulations of Fisher’s geometric model.

The distinction between polynomial and exponential time is crucial to the question of the evolution of complexity: if only a few mutations need to fix to reach a fitness peak, this distinction is less relevant since the times would be short. This distinction becomes relevant when dealing with complex adaptations involving many sites or genes. Chatterjee *et al.* (2014) investigated the adaptation time in a landscape in which genotypes are assigned one of two possible fitness values (high and low) and no smooth fitness gradients exist. They show that even when the fraction of high-fitness genotypes is large, populations will take at least exponential time to reach one of them. Their results relate directly to the infeasibility of evolving complex *innovations*, adaptations that cannot be reached by gradual steps. However, for many such apparent innovations, paths of gradually increasing fitness actually do exist, such as in the case of *de novo* gene evolution, where duplications and insertions or deletions are believed to pave the way for new genes (Tautz and Domazet-Lošo 2011). Instead, we focused on adaptations for which at least one (potentially tortuous) path exists and show that for these natural selection can efficiently evolve them. For this reason, we have only dealt with constant unimodal landscapes, that is, landscapes in which a single peak exists and that remain constant over evolutionary time (in fact we have provided upper bounds for the adaptation time for all such landscapes). The results we present here, however, allow for insight on how populations climb any peak, regardless of which peak is approached.

Many measures have been proposed to characterize the structure of fitness landscapes (Szendro *et al.* 2013), ranging from the roughness to slope ratio (Aita *et al.* 2001) or correlations of fitness effects (Ferretti *et al.* 2016), to Fourier decompositions of the landscape (Stadler 1996). These are often seen as measures of hardness of a landscape or problem, the intuition being that it will be more difficult for a hill climber to reach a fitness peak in more epistatic landscapes. Under this reasoning, the “easiest” landscape would be one where each locus contributes equally and independently to fitness, such as in the present manuscript. Indeed, this is the case for hill climbers in which mutations are deterministically accepted or rejected based on whether they increase or decrease fitness (Droste 2002), regardless of the magnitude of their effect (random adaptive walks), or in which the best available mutation is always accepted (greedy hill climbers) (Macken and Perelsont 1989; Park *et al.* 2016). However, such adaptive walks may not be adequate models of natural adaptation, except perhaps when the fitness effects of all mutations are extremely large. In fact, we have previously shown that the time to reach a fitness peak can differ significantly between a deterministic hill climber and the model for the evolution of natural populations that we use here (Paixão *et al.* 2015b). This shows that the dynamics of adaptation depend not only on the structure of the fitness landscape but also on the mode of evolution, and suggests that perhaps a more meaningful classification of landscapes should include information about the dynamics of the populations evolving on it, in addition to the landscape’s geometric properties. The results we present here open the door to such classification, at least for the weak-mutation regime.

Crucially, we assume the WM in which mutations are fixed or lost sequentially. This assumes that population sizes or mutation rates are low enough that no new mutations appear before the previous one has either been fixed or lost. When populations are large enough so that several segregating mutations coexist, the time it takes for a single beneficial mutation to fix increases since it necessarily competes with other beneficial mutations (Gerrish and Lenski 1998). However, the rate of adaptation will continue to increase with at least until the infinite population regime is reached, since the time between mutations will decrease faster than the time to fixation of beneficial mutations (Park *et al.* 2010). Thus, the upper bounds for the time to reach a fitness peak should hold, albeit being less tight than for the weak-mutation regime, even for larger populations.

There has been a renewed interest in computational approaches to the theory of evolution (Valiant 2013; Chastain *et al.* 2014). In this manuscript, we have introduced methods developed and commonly used in evolutionary computation for the analysis of randomized algorithms to the evolutionary biology community and show that these can be successfully applied to problems in this field. These methods facilitate the study of adaptive walks on complex fitness landscapes. Such a collaboration between both fields, enabled by the recent development of a unifying framework for evolutionary processes (Paixão *et al.* 2015a), has the potential to shed light on more complex evolutionary processes. For example, similar mathematical tools exist that allow for the analysis of polymorphic populations which could allow for the exploration of the adaptive process beyond the WM in arbitrary fitness landscapes (Corus *et al.* 2014). These results have the potential to illuminate a number of other fundamental limits to adaptation by natural populations.

## Acknowledgments

The authors thank two anonymous reviewers for their insightful comments on a previous version of this manuscript. This project received funding from the European Union’s Seventh Framework Programme for research, technological development, and demonstration under grant agreement 618091 Speed of Adaptation in Population Genetics and Evolutionary Computation.

## Appendix A: Weak-Mutation Regime as an Algorithm

To analyze the rate of adaptation in WM using drift analysis, we apply techniques from the analysis of stochastic processes and randomized algorithms. To this end, we cast WM as a randomized algorithm as follows:

**Algorithm 1.** *WM*. Choose uniformly at random.

**repeat**

*y*←mutate(*x*)

Choose uniformly at random

**if** **then**

*x*←*y*

**end if**

**until** stop

## Appendix B: Probability of Fixation Inequalities

Here we derive the upper and lower bounds for that are used throughout the manuscript. The bounds for show that is roughly proportional to the fitness difference between solutions

**Lemma 1:** **Probability of fixation.** *For every* *and* *the following inequalities hold*. If then

*If* *then*(B2)*Proof*. In the following, we frequently use and for all as well as for

If as well asIf using the fact that Similarly:The next lemma shows that the probability of accepting an improvement of is exponentially bigger than accepting its symmetric fitness variation

**Lemma 2:** **Probability of fixation ratio.** *For every* and (B3)*Proof*.where we have applied the relation

**Lemma 3:** **Exponential decrease of the probability of fixation.** *Let* \{0,1, 2}, *and* *then**Proof*. Using the definition of we can rewrite the statement as:defining and we can simplify the expression tousing the result for the sum of a geometric series yielding:If we extract the first and last term of both sums we obtain(B4)Let us focus now on the left-hand side of the previous equation. Since *x*,*y* > 1, we have that (*xy*)* ^{k}* >

*x*for

^{k}y*k*≥ 1 and thereforeThen, the claim (B4) is proven and so is the lemma’s statement.

## Appendix C: Drift Theorems

The additive drift theorem was first introduced by He and Yao (2001). For simplicity we show the formulation from Lehre and Witt (2013).

**Theorem 4: Additive drift, theorem 1 in Lehre and Witt 2013.** *Let* *be a stochastic process over some bounded state space* *and* *the first hitting time of state* 0. *Assume that* *Then:*

*If**then**If**then*

Both results are conditioned to a starting point but by applying the law of total expectation we can avoid the starting condition obtaining and for the first and second result, respectively. The proof (in Lehre and Witt 2013) mainly makes use of Doob’s optional-stopping theorem that can be found in standard textbooks on martingales (for example in Williams 1991, theorem 10.10).

**Theorem 5:** **Generalized variable drift theorem.** *Consider a stochastic process* *on* *Suppose there is a monotonic increasing function h*: *such that the function* *is integrable on* [1, *m*], *and with expected progress toward the optimum* *such that**for all* * **Then the expected first hitting time of any state from* *for a* *is at most:**Proof*. The following proof is adapted from the proof of Rowe and Sudholt (2014), theorem 1.

LetNote that *g* is strictly monotone increasing and hence invertible. Whenever the random sequence *g*(*X _{t}*) hits state 0, this implies that

*X*has hit a state in Hence, the hitting time of any state is no larger than the first hitting time of the random sequence of the state 0.

_{t}If and then(since 1/*h*(*z*) is positive and monotone decreasing) and if and thenSo, for any So by the additive drift theorem (Theorem 4), the first hitting time of 0 by the sequence is bounded above by *g*(*m*). The result follows.

In the manuscript, we use the negative drift theorem with self-loops presented in Rowe and Sudholt (2014) (an extension of the negative drift theorem by Oliveto and Witt 2011, 2012, to stochastic processes with large self-loop probabilities). It is stated here for the sake of completeness.

**Theorem 6:** **Negative drift with self-loops.** *Consider a Markov process* *on* *and suppose there exists integers a*, *b with* *and* *such that for all* *the expected drift toward 0 is**where * *is the self-loop probability at state k. Further assume there exists constants (i.e.*, *they are independent of m) such that for all and all **where is the transition probability from state k to state l. Let T be the first hitting time of a state at most a*, *starting from Let Then there is a constant c > 0 such that*

## Appendix D: Adaptation Time in Simple Additive Landscapes

In our manuscript, simple peaks are represented by function that assumes that all alleles (bits) contribute to the fitness with weight equal to 1 (Figure D1). Each mutation therefore increases or decreases the fitness by 1.

**Theorem 7:** **Efficiently climbing simple peaks.** *If * *with* *and* * **then the expected optimization time of WM on* *with local mutations is**for every initial search point*.

*Proof*. Let us denote by *x* the number of one-bits. The drift can be expressed as a combination of a forward and a backward driftwhere the forward drift is the probability of mutation flipping a zero-bit multiplied by the probability of accepting such a mutation Note that all mutations in this fitness landscape will change the state *x* by ±1. Analogously, the backward drift is given by the probability of a negative mutation occurring and fixing in the population with probability Therefore, the total expected progress isUsing Lemma 2 we getand since with we can bound from below byTo find the upper bound on the expected time that WM needs to find the fitness peak, we apply the variable drift theorem to the decreasing number of zerosThe number of zeros changes from *n* (in the worst case scenario) to 1 (the last state that is not optimum), defining the boundaries of the integral

Alternatively, we can use bounds (B1) to obtain(D2)**Theorem 8:** **Efficient approach to the optimum.** *If* *with* *and* *then the expected time of WM on* *to first reach a solution quality of at least* *is**for every initial search point*.

*Proof*. The proof is as before, showing that the drift with regards to the number of zeros is at least for search points with *z* zeros, for a positive constant *c*. Then, by applying Theorem 5 to the number of zeros, we get an upper bound ofusing bounds (B1)(D3)**Corollary 9.** *For* *for* *the upper bound from Theorem 8 is* *For* * **e. g*., *a* = 0.001*n*, *we get * *For* *for any constant k* > 0 *we get *

**Theorem 10:** **A critical threshold for hill climbing.** *If* *for some * *then the optimization time of WM with local mutations on* *is at least* *with probability * *for some constant*

*Proof*. To prove this theorem, the negative drift theorem (Theorem 6) will be applied, taking the number of zeros as distance function to the optimum. Our notation refers to numbers of ones for simplicity. Let be the probability that WM will make a transition from a search point with *x* ones, to one with *x* ± 1 ones, and assuming then the expected drift toward the optimum is bounded as followssince On the other hand,using The expected drift is hence at mostNow, the self-loop probability is at least hence the first condition of the drift theorem is satisfied. Since there are only local mutations, the second condition on exponentially decreasing transition probabilities follows immediately. The negative drift theorem, applied to the number of zeros on an interval of proves the claimed result.

## Appendix E: General Additive Fitness Landscapes

General additive fitness landscape is defined by the function where is a weight with which each site contributes to the trait, such that

For we showed that for *c* > 1 is sufficient to get a positive drift. In a more general sense, for a bit of weight *w* we get a positive drift on that bit if Call all such bits *large effect sites* or *heavy*, then, by the same arguments as for WM optimizes all heavy bits in the same time bound as for The only sites we cannot guarantee to fix in polynomial time are those with effect smaller than where defines a threshold on the distribution of effects separating the loci “easily” adapted from the “small effect” ones. The total contribution of these sites is at most

**Theorem 11:** **General additive fitness landscapes.** *Let* *and * *Then*, *WM with* *and* *finds a solution of fitness at least**in expected time at most**where * *is the minimum weight we want to optimize and the number of weights with value less than *

*Proof*. If the statement is trivial as then the lower bound on the fitness is nonpositive.

Without loss of generality, we assume that the weights are ordered in ascending order: Now, when and ignoring the weights such that we lower bound the positive drift by the probability of flipping one of the zero-bits with a weight bigger than times the fixation probability, underestimated for the case where that bit has exactly a weight of Where *x* is the number of one-bits. For the backward drift, we look to the worst expected impact of one single bit then by applying linearity of expectations we obtainThe total expectation of the progress toward the optimum is thereforeUsing Lemma 2 and then introducing we get

Now we apply the variable drift theorem to the number of zeros *z* in the bits that we want to optimize, *i.e.*, which is always positive if . The integral range will go from the farthest point to the optimum (all of the *n* − *n** heaviest weights being zero) to the closest (only one bit of the *n* − *n** heaviest weights being zero)Alternatively, using bounds (B1) we can get an alternative expression

## Appendix F: Adaptation in Unimodal Landscapes

**Theorem 12.** *WM with* \ *and* *with* *can optimize every unimodal function in at most**where are respectively the maximum and minimum fitness difference between any two search points.*

*Proof*. Usually the variable drift theorem is applied over the genotype space, *i.e.*, the Boolean hypercube (or some characteristic of it like the number of ones), however in this proof we will apply it on the phenotypic level, *i.e.*, the fitness function.

Let us denote by *x* any nonoptimal search point. Mutation can only produce points in the Hamming neighborhood, we pessimistically assume that only flipping one of these *n* points leads to an improvement (the remaining *n* − 1 Hamming neighbors will have a worse fitness) and that its size is the minimum possible value then the forward drift can be bounded byFor the backward drift, we consider all the remaining Hamming neighbors and denoting by *g*(*k*) > 0 the absolute fitness difference between the new and the old search point when flipping bit *k* we obtain:Since we can apply Lemma 3 which means that *p*_{fix} decreases exponentially for deleterious mutations. Specifically, we can bound obtainingsince we can introduce yielding

The value of that maximizes is = 1, however, when is not a feasible solution [note that and the maximum will be at *g*(*k*) = δ. Therefore,We can upper bound these two cases by its sum Introducing this back in (F1) yieldsNow we can compute the total driftAnd following the usual steps, applying Lemma 2 and introducing we obtainsince the previous expression is positive and we can stateFinally, we apply the variable drift theorem with integral limits for the biggest fitness difference (*d*) and the minimum (*δ*)using bounds (B1) we obtain an alternative formula

### Climbing Fitness Ridges

The function that counts the number of leading ones in a bit string is defined as To increase current fitness by mutation, it is necessary to flip the leftmost zero-bit to one. Flipping any other zeros to one will result in a mutant offspring with the same fitness as its parent, while flipping any of the leading ones can result in a drastic fitness loss (Figure F1).

**Theorem 13:** **Expected optimization time for** . *The expected optimization time (**Figure F2**)* *of WM with local mutations*, *and *\ *on* *is**Proof*. For this problem, *x* will denote the number of leading ones in the bit string. We lower bound the forward drift by the probability of mutation choosing the first nonzero bit and its acceptance probability of being flipped For the backward drift, notice that flipping the *j*-th leading one will imply a fitness decrease of but, as we will show, the exponential decrease of for deleterious mutations will overcome this effect, yielding a total positive drift toward the optimum:Since we can call Lemma 3 to simplify the backward drift by using yieldingIntroducing we obtainNow we compute the total drift,calling Lemma 2 yieldsusing and we can lower bound by 2 obtainingFinally, we apply the variable drift theorem to the number of bits after the *x* leading ones Using the bounds on (B1) one gets

## Footnotes

*Communicating editor: J. Hermisson*

- Received March 17, 2016.
- Accepted November 16, 2016.

- Copyright © 2017 by the Genetics Society of America

Available freely online through the author-supported open access option.