## Abstract

In relating genotypes to fitness, models of adaptation need to both be computationally tractable and qualitatively match observed data. One reason that tractability is not a trivial problem comes from a combinatoric problem whereby no matter in what order a set of mutations occurs, it must yield the same fitness. We refer to this as the *bookkeeping problem*. Because of their commutative property, the simple additive and multiplicative models naturally solve the bookkeeping problem. However, the fitness trajectories and epistatic patterns they predict are inconsistent with the patterns commonly observed in experimental evolution. This motivates us to propose a new and equally simple model that we call *stickbreaking*. Under the stickbreaking model, the intrinsic fitness effects of mutations scale by the distance of the current background to a hypothesized boundary. We use simulations and theoretical analyses to explore the basic properties of the stickbreaking model such as fitness trajectories, the distribution of fitness achieved, and epistasis. Stickbreaking is compared to the additive and multiplicative models. We conclude that the stickbreaking model is qualitatively consistent with several commonly observed patterns of adaptive evolution.

ADAPTIVE evolution is challenging to understand because it depends on a rich array of biological properties. Among those receiving recent theoretical and experimental attention are the magnitude and distribution of mutational fitness effects, the length of adaptive walks, the rate of fitness increase, and the population dynamics that drive it (*e.g.*, Gerrish and Lenski 1998; de Visser *et al.* 1999; Orr 2002, 2003; Rozen *et al.* 2002; Cowperthwaite *et al.* 2005; Barrett *et al.* 2006; Desai *et al.* 2007; Eyre-Walker and Keightley 2007; Joyce *et al.* 2008; Rokyta *et al.* 2008; Barrick *et al.* 2009; Betancourt 2009; Burch and Chao 1999; Kryazhimskiy *et al.* 2009; Schoustra *et al.* 2009). Equally important are epistasis, pleiotropy, parallelism, mutation order, and the number of beneficial mutations available (*e.g.*, Wichman *et al.* 1999, 2005; Holder and Bull 2001; Kim and Orr 2005; Weinreich *et al.* 2006; Silander *et al.* 2007; Rokyta *et al.* 2009, 2011; Chou *et al.* 2011; Khan *et al.* 2011; Kvitek and Sherlock 2011; Miller *et al.* 2011). Note that the latter features of adaptation are more meaningful when the identities of the mutations are known and when we consider adaptation as a process subject to replication. For example, epistasis occurs when specific mutations have different effects on different genetic backgrounds (Bonhoeffer *et al.* 2004; Sanjuán *et al.* 2004). The rise of genomic sequencing technologies is having a dramatic effect on the ability of researchers to know the identity of mutations occurring during adaptation.

Knowing the identities of adaptive mutations expands the types of questions that can be addressed, but also creates new challenges. All models of adaptation must assign fitness values to genotypes that have arisen through mutation. In connecting genotype and fitness, a model must have the following property: if the wild-type background acquires mutations *A*_{1}, *A*_{2}, and *A*_{3} to yield a genotype with fitness *w*_{1,2,3}, every possible order of these mutations must also result in fitness *w*_{1,2,3}. As the number of fixed mutations grows, the number of possible pathways grows in a factorial manner. We call this consistency requirement the *bookkeeping problem*.

At least two groups of population genetic models address the bookkeeping problem: one maps genotype change (*i.e.*, mutation) directly onto fitness (GF models), and the second maps genotype onto phenotype and then phenotype onto fitness (GPF models). Here we focus on the simpler GF models. Among these, the *additive model* assumes mutations have an additive effect on fitness. To be more precise, the fitness after mutations *A*_{1} and *A*_{2} occur on the wild-type background is *w*_{1,2} = *w*_{wt} + Δ*w*_{1} + Δ*w*_{2}, where Δ*w*_{1} and Δ*w*_{2} are the intrinsic effects expressed as fitness differences of mutations *A*_{1} and *A*_{2}. The bookkeeping problem is solved by the commutative property of addition (*i.e.*, Δ*w*_{1} + Δ*w*_{2} = Δ*w*_{2} + Δ*w*_{1}). Under the *multiplicative model*, the intrinsic effects are selection coefficients affecting fitness in a multiplicative fashion: *w*_{1,2} = *w*_{wt}(1 + *s*_{1})(1 + *s*_{2}), where *s*_{1} and *s*_{2} are the intrinsic effects of mutations *A*_{1} and *A*_{2}. Multiplication also has the same commutative property [*i.e.*, (1 + *s*_{1})(1 + *s*_{2}) = (1 + *s*_{2})(1 + *s*_{1})] and thus solves the bookkeeping problem. Both of these solutions to the bookkeeping problem are simple to simulate and test on real data.
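The order independence described above can be checked by brute force. The following is a minimal sketch, not part of the original analysis: the three effect sizes and the helper names `additive_walk` and `multiplicative_walk` are hypothetical, chosen only to illustrate that every ordering of the same mutations ends at the same fitness.

```python
from itertools import permutations
from math import isclose


def additive_walk(w_wt, deltas):
    """Apply fitness differences one at a time, in the order given."""
    w = w_wt
    for dw in deltas:
        w = w + dw
    return w


def multiplicative_walk(w_wt, coeffs):
    """Apply selection coefficients one at a time, in the order given."""
    w = w_wt
    for s in coeffs:
        w = w * (1 + s)
    return w


# Hypothetical intrinsic effects for mutations A1, A2, A3.
effects = [0.10, 0.25, 0.05]
orders = list(permutations(effects))  # 3! = 6 possible orders

additive_finals = [additive_walk(1.0, order) for order in orders]
multiplicative_finals = [multiplicative_walk(1.0, order) for order in orders]

# Every order should end at the same fitness (up to floating-point error).
additive_consistent = all(isclose(w, additive_finals[0]) for w in additive_finals)
multiplicative_consistent = all(
    isclose(w, multiplicative_finals[0]) for w in multiplicative_finals
)
print(len(orders), additive_consistent, multiplicative_consistent)
```

The number of orders grows as *m*!, so this exhaustive check is only practical for short walks; the commutative property is what makes the check unnecessary in general.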

Another solution, implicit in the *uncorrelated landscape model* (Gillespie 1991; Orr 2002; Joyce *et al.* 2008), is to assume that the set of mutations arising in an adaptive walk can arise in only one order because each mutation is beneficial on exactly one background. This occurs because the probability of a mutation being beneficial on more than one highly fit background is small enough to be ignored. Thus under the uncorrelated model, once replicate adaptive walks depart from each other, they are 100% divergent. Since the bookkeeping problem involves convergence, the bookkeeping problem is avoided. However, the uncorrelated model makes the extreme prediction for real data that no mutation will be beneficial on two different backgrounds.

Another set of models that avoids the bookkeeping problem is those that assume the number of beneficial mutations on any background is effectively infinite. Under this assumption, the probability of convergent evolution is zero and the bookkeeping problem does not arise. Examples of models that make this assumption include Gerrish and Lenski (1998), Rozen *et al.* (2002), Desai *et al.* (2007), and Kryazhimskiy *et al.* (2009).

The *NK* model (Kauffman 1993) is unusual among GF models in that it can produce landscapes with intermediate levels of epistasis. In the *NK* model, *N* is the number of sites and *K* is the number of other sites each site interacts with. When *K* = 0, it is the additive model and when *K* = *N* − 1 it is equivalent to the uncorrelated model. When 0 < *K* < *N* − 1, the interaction terms mean that the mutational effects are no longer background independent. The interactions bring more biological realism and allow richer patterns of epistasis, but at the expense of model simplicity. Simulating data when 0 < *K* < *N* − 1, while ensuring the bookkeeping criteria are met, is computationally challenging because it requires assigning fitnesses to the entire fitness landscape. The interactions also pose a problem for analyzing real data because they introduce a large number of parameters that must be estimated.

Kryazhimskiy *et al.* (2009) have also developed a flexible GF modeling framework where the uncorrelated and additive models arise as special cases. These models allow different types of epistasis and decelerating fitness trajectories to be produced. However, because the fitness of beneficial mutations in such models depends only on the current fitness, they do not solve the bookkeeping problem.

Thus there is an array of GF models. Those that offer simple solutions to the bookkeeping problem (additive, multiplicative, and uncorrelated) generally fail to predict several commonly observed properties of real adaptation. Specifically, in laboratory adaptations parallel evolution is not uncommon, most fitness gain occurs early in a walk, and epistasis is common (Lenski and Travisano 1994; Bull *et al.* 1997; Elena and Lenski 1997; Wichman *et al.* 1999, 2005; Cooper and Lenski 2000; Burch *et al.* 2003; Sanjuán *et al.* 2004; Cowperthwaite *et al.* 2005; Woods *et al.* 2006; Barrick *et al.* 2009; Betancourt 2009; Rokyta *et al.* 2009, 2011; Chou *et al.* 2011; Khan *et al.* 2011).

This leads us to propose a novel GF model for combining mutational effects that we call stickbreaking. The *stickbreaking model* is premised on the familiar idea that mutations have intrinsic effects. But rather than assuming fitness differences are background independent (like the additive model) or that differences scale by background fitness (like the multiplicative model), differences in the stickbreaking model scale by how near the current background is to a hypothesized upper fitness boundary. For example, if mutation *A*_{1} has stickbreaking coefficient *u*_{1} and the fitness distance from the wild type to the boundary is *d*, then the mutation will increase fitness by the amount *du*_{1} (Figure 1). We use theory and simulations to show that stickbreaking both solves the bookkeeping problem and produces some qualitative features commonly observed in adaptive evolution.

## Models

### Stickbreaking

We begin by introducing the stickbreaking model and compare it to the additive and multiplicative GF models. Suppose the maximum fitness achievable in the current environment is $w_{\max}$ while the current fitness is $w_{\mathrm{wt}}$. Let $d = w_{\max} - w_{\mathrm{wt}}$ be the maximum possible fitness gain through adaptation. Let $u_i$ be the stickbreaking coefficient of mutation $A_i$ such that its fitness on the wild-type background, $w_i$, is given by $w_i = w_{\mathrm{wt}} + du_i$, where $u_i \le 1$. In the stickbreaking model, stickbreaking coefficients are assumed to be background independent. If a second mutation, $A_j$, with stickbreaking coefficient $u_j$ occurs on the $A_i$ background, the resulting fitness is $w_{i,j} = w_{\mathrm{wt}} + d(u_i + u_j(1 - u_i))$. To see why, note that after the first mutation fixes, the remaining distance to the boundary is $d(1 - u_i)$ and the second mutation therefore increases the fitness by $u_j d(1 - u_i)$. Adding this increase to the fitness of the first mutation, $w_{\mathrm{wt}} + du_i + u_j d(1 - u_i)$, and simplifying gives $w_{\mathrm{wt}} + d(u_i + u_j(1 - u_i))$. But since $u_i + u_j(1 - u_i) = 1 - (1 - u_i)(1 - u_j)$, we can rewrite the fitness of the double mutant as $w_{i,j} = w_{\mathrm{wt}} + d(1 - (1 - u_i)(1 - u_j))$. In general, if $m$ mutations with identities $A_1, A_2, \ldots, A_m$ and stickbreaking coefficients $u_1, u_2, \ldots, u_m$ accumulate on the wild-type background, the fitness is given by

$$w_{1,2,\ldots,m} = w_{\mathrm{wt}} + d\left(1 - \prod_{i=1}^{m}(1 - u_i)\right). \tag{1}$$

Each mutation $A_i$ thus closes the distance between the current background and the fitness limit by a proportion $u_i$. This process is analogous to a stickbreaking exercise. With a stick of length $d$ laid along a number line, the first mutation dictates where, in a fractional sense, it is broken. Setting the left portion of the stick aside, the next mutation determines where the remaining right portion is broken. The process continues with subsequent mutations breaking the remaining right portion into ever smaller pieces. Unless a stickbreaking coefficient of 1 is available, fitness will never actually reach the fitness maximum.

The stickbreaking model solves the bookkeeping problem because, as Equation 1 shows, the final fitness depends on the *product* of intrinsic effects and is therefore order independent. Note that mutations with intrinsic effects between 0 and 1 are beneficial. It is less obvious that intrinsic effects may be zero or negative, representing neutral and deleterious mutations, respectively. We also note that the stickbreaking metaphor appears in other modeling contexts, for example, to describe niche partitioning and species abundance in ecology (MacArthur 1957; Patil and Taillie 1977) and in population genetics to derive the distribution of age-ordered alleles under the infinite-alleles model (Donnelly and Joyce 1989). To our knowledge, stickbreaking has not previously been applied to the subject of adaptive evolution.
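As a concrete check of the stickbreaking rule, the sketch below applies mutations one at a time (each closing a fraction *u* of the remaining distance to the boundary) and compares the result with the product form, *w*_{wt} + *d*(1 − ∏(1 − *u*_{i})). The coefficient values, including the negative one standing in for a deleterious mutation, and the function names are illustrative assumptions rather than values from the study.

```python
from itertools import permutations
from math import isclose, prod


def stickbreaking_stepwise(w_wt, w_max, coeffs):
    """Apply mutations in order; each closes a fraction u of the remaining gap."""
    w = w_wt
    for u in coeffs:
        w = w + u * (w_max - w)
    return w


def stickbreaking_closed_form(w_wt, w_max, coeffs):
    """Product form: w_wt + d * (1 - prod(1 - u_i)), with d = w_max - w_wt."""
    d = w_max - w_wt
    return w_wt + d * (1 - prod(1 - u for u in coeffs))


# Hypothetical coefficients: two beneficial mutations and one deleterious one.
coeffs = [0.5, 0.2, -0.1]

finals = [stickbreaking_stepwise(1.0, 2.0, order) for order in permutations(coeffs)]
closed = stickbreaking_closed_form(1.0, 2.0, coeffs)

# All six orders agree with the closed form, so the bookkeeping problem is solved.
order_independent = all(isclose(w, closed) for w in finals)
print(closed, order_independent)
```

The negative coefficient moves fitness away from the boundary, and the final fitness is still the same in every order because only the product of the (1 − *u*_{i}) terms matters.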

### Stickbreaking compared to additive and multiplicative models

Because of the mathematical similarities between the stickbreaking, additive, and multiplicative models, it is possible to assess when they yield similar results and when they do not. Fitness effects are expressed as fitness differences ($\Delta w$) in the additive model, selection coefficients ($s$) in the multiplicative model, and stickbreaking coefficients ($u$) in the stickbreaking model. In each case, the model’s respective fitness effects are assumed to be background independent. More precisely, if $b$ is the genetic background and $i$ is the arising mutation, then $\Delta w_{i|b} = w_{i,b} - w_b$, $s_{i|b} = (w_{i,b} - w_b)/w_b$, and $u_{i|b} = (w_{i,b} - w_b)/(w_{\max} - w_b)$.

Under the additive model, the fitness after mutations $A_1, A_2, \ldots, A_m$ with fitness differences $\Delta w_1, \Delta w_2, \ldots, \Delta w_m$ have accumulated on the wild-type background is

$$w_{1,2,\ldots,m} = w_{\mathrm{wt}} + \sum_{i=1}^{m} \Delta w_i. \tag{2}$$

Under the multiplicative model, the fitness after $m$ mutations with selection coefficients $s_1, s_2, \ldots, s_m$ have accumulated is given by

$$w_{1,2,\ldots,m} = w_{\mathrm{wt}} \prod_{i=1}^{m} (1 + s_i). \tag{3}$$

The stickbreaking, additive, and multiplicative models converge to the same model when effect sizes are small and walks are not too long. This occurs when the product of effect size and walk length is small. Note that if the product in Equation 3 is expanded and all higher-order terms are assumed to be zero, then fitness under the multiplicative model is approximated by a sum, $w_{1,2,\ldots,m} \approx w_{\mathrm{wt}} + w_{\mathrm{wt}}\sum_{i=1}^{m} s_i$. Expanding the product in Equation 1 in the same way gives $w_{1,2,\ldots,m} \approx w_{\mathrm{wt}} + d\sum_{i=1}^{m} u_i$. The three models therefore yield approximately the same fitness when $w_{\mathrm{wt}} s_i \approx du_i \approx \Delta w_i$.
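The convergence of the three models for small effects can be illustrated numerically. The sketch below is an illustration under stated assumptions: the wild-type fitness and the distance to the boundary are both set to 1 so that equal numerical values of Δ*w*, *s*, and *u* are directly comparable, every mutation is given the same effect, and the function names are hypothetical.

```python
from math import prod


def additive(w_wt, deltas):
    """Wild-type fitness plus the sum of fitness differences."""
    return w_wt + sum(deltas)


def multiplicative(w_wt, coeffs):
    """Wild-type fitness times the product of (1 + s_i)."""
    return w_wt * prod(1 + s for s in coeffs)


def stickbreaking(w_wt, d, coeffs):
    """Wild-type fitness plus d times (1 - product of (1 - u_i))."""
    return w_wt + d * (1 - prod(1 - u for u in coeffs))


def spread(effect, m, w_wt=1.0, d=1.0):
    """Largest gap among the three models when all m mutations share one effect."""
    x = [effect] * m
    values = [additive(w_wt, x), multiplicative(w_wt, x), stickbreaking(w_wt, d, x)]
    return max(values) - min(values)


small = spread(0.01, 5)  # effect size x walk length = 0.05
large = spread(0.30, 5)  # effect size x walk length = 1.5
print(small, large)
```

With effects of 0.01 over five steps the models differ only in the third decimal place, whereas with effects of 0.30 they diverge substantially, in line with the condition that the product of effect size and walk length be small.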

### Definitions of fitness

Before continuing, it is important to clarify our approach to defining fitness. We have denoted and continue to denote fitness in a generic sense as $w$. Fitness is more precisely defined in two ways that we call *Darwinian* and *Malthusian* fitness. *Darwinian fitness* is $\lambda$ in a discrete population growth model, $N_t = N_0\lambda^t$, where $N_0$ and $N_t$ are the population sizes at time 0 and time $t$. *Malthusian fitness* is $r$ in the continuous growth model, $N_t = N_0 e^{rt}$. One can be easily transformed to the other by $\lambda = e^{r}$ or $\ln(\lambda) = r$. They can also be defined in relative terms where the change in frequency of a mutant to a reference type gives the ratio of growth rates (Hartl and Clark 1997); their meaning and log relationship are the same.

In this article, the definition of fitness is important when we consider (i) how fitnesses arise during an adaptive walk and (ii) what type of fitness is measured when a walk is “observed”. In modeling walks (i), we maintain generality by considering mutations acting in an additive, multiplicative, or stickbreaking manner on either Darwinian or Malthusian fitness. This yields six combinations. Note, because multiplicative effects on λ and additive effects on *r* are equivalent, there are actually five different models. For clarity, however, we describe them as a set of six models. After an adaptive walk occurs, we imagine measuring fitness (ii). Throughout this article we measure Malthusian, but not Darwinian, fitness to simplify our results and because Malthusian fitness is the predominant definition used in the experimental evolution literature.
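The equivalence between multiplicative effects on λ and additive effects on *r* follows from the log relationship between the two fitness definitions. The sketch below illustrates this with hypothetical values: a selection coefficient *s* acting multiplicatively on λ corresponds to an additive increment ln(1 + *s*) on *r*.

```python
from math import exp, isclose, log, prod


def r_from_lambda(lam):
    """Malthusian fitness from Darwinian fitness: r = ln(lambda)."""
    return log(lam)


def lambda_from_r(r):
    """Darwinian fitness from Malthusian fitness: lambda = e^r."""
    return exp(r)


lam_wt = 1.5                # hypothetical wild-type Darwinian fitness
s = [0.10, 0.20, 0.05]      # hypothetical selection coefficients acting on lambda

# Multiplicative model on lambda.
lam_final = lam_wt * prod(1 + si for si in s)

# Additive model on r, using increments ln(1 + s_i).
r_final = r_from_lambda(lam_wt) + sum(log(1 + si) for si in s)

same_walk = isclose(r_from_lambda(lam_final), r_final)
print(same_walk)
```

No analogous identity links the additive or stickbreaking models on λ to a model on *r*, which is why those combinations remain distinct.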

### Fitness trajectories

The predicted fitness under the additive, multiplicative, and stickbreaking models after $m$ steps can be approximated if we assume the pool of beneficial mutations ($M$) is large enough that sampling is effectively done with replacement (*i.e.*, $M \gg m$). Then, under strong selection, weak mutation (SSWM) conditions, the expected effect of a mutation that arises, escapes drift, and sweeps to fixation is a constant, $\nu$, determined by the intrinsic effects $x_i$ of the available mutations, where $x_i$ represents the intrinsic effect under any of the three models (*i.e.*, $\Delta w_i$, $s_i$, or $u_i$). We therefore replace $u_i$, $\Delta w_i$, and $s_i$ in Equations 1, 2, and 3, respectively, with $\nu$. Note that when mutations affect $\lambda$, but we measure $r$, a log transformation is necessary. These approximations as well as model abbreviations are given in Table 1.

### Distributions of fitness during replicate walks

We want to know the distribution of fitness achieved at step $m$ when the total number of beneficial mutations available is $M$ under each of the three models. Note that this differs from the familiar distribution of fitness effects and the distribution of fitnesses across the landscape; rather, it is the distribution of fitness achieved among replicate walks after $m$ steps when all walks begin at the same genotype. The details of this derivation are provided in the *Appendix*. Denote the intrinsic effect of mutation $i$ as $x_i$, where $x_i = \Delta w_i$, $x_i = s_i$, and $x_i = u_i$ under the additive, multiplicative, and stickbreaking models, respectively. Assume the $x_i$ values are drawn from a distribution and replicate walks occur using this fixed set of mutations (*i.e.*, on a fixed landscape). Let $Y_j$ be the intrinsic effect of the mutation that fixes at step $j$. Note that $s_i$ and $u_i$ differ from $\Delta w_i$ by a scaling factor that cancels when calculating the scale-free quantity $Y_j$. If $M$ is large and $m$ is an order of magnitude smaller, such that as both $M \to \infty$ and $m \to \infty$, $m\ln(M)/M \to 0$, then $Y_1, Y_2, \ldots, Y_m$ will be approximately independent and identically distributed. On the basis of the central limit theorem, this implies that when $M$ is large and $m$ is small, but not extremely small (*i.e.*, when the pool of beneficial mutations is large and the number to have fixed is moderately small), fitness of replicate walks under the additive, multiplicative, and stickbreaking models follows the normal, log-normal, and negative log-normal distributions, respectively, with density functions and parameter values provided in the *Appendix*. These limiting distributions can be obtained as a function of time, not mutational step, using a scale transformation.
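The link between each model and its limiting distribution can be sketched by writing fitness after $m$ steps so that a sum of per-step terms appears. This is an informal outline that assumes the effects fixed at successive steps are approximately independent and identically distributed, as stated above; the subscript $(j)$, denoting the mutation that fixes at step $j$, is notation introduced here for illustration, and the full derivation is in the *Appendix*.

```latex
\begin{align*}
\text{additive:} \quad
  & w_m - w_{\mathrm{wt}} = \sum_{j=1}^{m} \Delta w_{(j)} \\
\text{multiplicative:} \quad
  & \ln\frac{w_m}{w_{\mathrm{wt}}} = \sum_{j=1}^{m} \ln\bigl(1 + s_{(j)}\bigr) \\
\text{stickbreaking:} \quad
  & \ln\frac{w_{\max} - w_m}{d} = \sum_{j=1}^{m} \ln\bigl(1 - u_{(j)}\bigr),
  \qquad u_{(j)} < 1
\end{align*}
```

Each right-hand side is a sum of $m$ approximately independent terms and is therefore approximately normal. It follows that $w_m$ itself is approximately normal under the additive model, that $w_m$ is approximately log-normal under the multiplicative model, and that the remaining distance $w_{\max} - w_m$ is approximately log-normal under stickbreaking, so that $w_m$ is a constant minus a log-normal variable (the negative log-normal).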

### Epistasis

Epistasis occurs when a mutation has different fitness effects in different genetic backgrounds. One way to measure epistasis is therefore to assess the fitness effect of a mutation across different backgrounds. A second way to examine epistasis is as a deviation of observation from prediction: (i) measure the fitness effects of two or more mutations on the same genetic background, (ii) predict their combined fitness effect under an assumed model on the basis of their individual effects, (iii) measure their combined fitness effect, and (iv) define epistasis as the disparity between predicted (ii) and observed (iii). The first approach is more intuitive, and the latter is more commonly used in the literature as a measure of epistasis. We pursue both here.

#### Epistasis as different effects of the same mutation across backgrounds:

For any mutation, we specifically wish to know how its fitness effects change across the steps of a walk beginning with the wild type and continuing until the mutation actually fixed. Following convention, we define fitness effects as differences in *r*. As above, we consider data arising under each of six models. We assume the pool of beneficial mutations is large and SSWM conditions operate such that the expected fitness effect of a mutation at each step is given by ν. An adaptive walk of length *m* − 1 occurs. If we imagine a mutation of average (fixed) effect, ν, is then inserted (*i.e.*, genetically engineered) as the *m*th mutation on the *m* − 1 background, the expected value of Δ*r* that results is contained in Table 2.
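The expected change in *r* when an average-effect mutation is placed on successively later backgrounds can be derived directly from the three fitness equations. The sketch below covers only the three models in which mutations act on *r*; the expressions are derived here from the model definitions and may differ in form from those in Table 2. The values of ν, *r*_{wt}, and *d* and the function name are illustrative assumptions.

```python
def delta_r_on_background(model, k, nu, r_wt=1.0, d=1.0):
    """Expected change in r when a mutation of effect nu is added to a
    background that already carries k mutations of effect nu (effects act on r)."""
    if model == "additive":
        return nu                          # background independent
    if model == "multiplicative":
        return r_wt * (1 + nu) ** k * nu   # scales with current fitness
    if model == "stickbreaking":
        return d * (1 - nu) ** k * nu      # scales with remaining distance
    raise ValueError(f"unknown model: {model}")


nu = 0.1
backgrounds = range(10)  # k = 0 is the wild type
effects_by_model = {
    model: [delta_r_on_background(model, k, nu) for k in backgrounds]
    for model in ("additive", "multiplicative", "stickbreaking")
}
for model, effects in effects_by_model.items():
    print(model, [round(e, 4) for e in effects])
```

In this sketch the same mutation has a constant effect under the additive model, a growing effect under the multiplicative model, and a shrinking effect under stickbreaking as the background moves closer to the boundary.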

#### Epistasis as departure of observed from predicted effects of combined mutations:

An alternative way to measure epistasis is as a departure of observation from prediction: ε = *r*_{obs} − *r*_{pred}. Predicted values are based on additivity on *r* while observed data arise according to one of the six models. We are interested in how the disparity between observed and predicted fitness depends on the model under which fitness effects arise and the number of mutations considered, *m*. Again, we assume SSWM conditions and a large pool of beneficial mutations such that the expected effect of a randomly fixing mutation is ν. Table 3 gives the expected values for ε for each of the six models.

## Simulations

### Overview

Simulations written in R (R Development Core Team 2009) were used to study the patterns of fitness trajectory, distribution of fitness, and epistasis and to compare these to the theoretical results derived above. All simulations were done in the following basic framework. First, we assumed SSWM dynamics (Gillespie 1991) such that the population is described by a procession of fixed beneficial mutations. Second, a fitness landscape was defined by a relatively small number of beneficial mutations ($M = 50$) with fitness effects, $x$, randomly drawn from a distribution. Neither the pool of mutations nor their inherent effects change as adaptive walks proceed. Third, the time until the next mutation fixed was simulated by drawing random exponential waiting times for all $M - m$ available mutations with rate $N\mu_b\pi(s_i)$, where $N$ was set at $10^5$, the per site per generation beneficial mutation rate, $\mu_b$, was set to $2 \times 10^{-7}$, and $\pi(s_i)$ is the fixation probability for mutation $A_i$. Here $s_i$ is the selection coefficient of $A_i$ as traditionally defined [*i.e.*, fractional changes in $\lambda$ or differences in $r$ (Chevin 2010)]. The mutation that fixed was that with the shortest waiting time.
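A single step of the waiting-time procedure described above might look like the following. This is a Python sketch rather than the R code used in the study, and it rests on one loud assumption: the fixation probability is taken to be Haldane's approximation, π(*s*) ≈ 2*s*, which stands in for the expression used in the original simulations. The function names and the example selection coefficients are hypothetical.

```python
import random


def fixation_probability(s):
    """ASSUMPTION: Haldane's approximation pi(s) ~ 2s for a beneficial mutation,
    clipped to [0, 1]; this stands in for the source's own expression."""
    return max(0.0, min(1.0, 2.0 * s))


def next_fixation(selection_coeffs, N=1e5, mu_b=2e-7, rng=random):
    """Draw an exponential waiting time for each available mutation with rate
    N * mu_b * pi(s_i) and return (mutation, time) for the shortest one."""
    best_mutation, best_time = None, float("inf")
    for mutation, s in selection_coeffs.items():
        rate = N * mu_b * fixation_probability(s)
        if rate <= 0:
            continue  # mutations that cannot fix never win the race
        waiting_time = rng.expovariate(rate)
        if waiting_time < best_time:
            best_mutation, best_time = mutation, waiting_time
    return best_mutation, best_time


# Hypothetical selection coefficients of the mutations still available.
available = {"A1": 0.10, "A2": 0.05, "A3": 0.01}
winner, elapsed = next_fixation(available, rng=random.Random(1))
print(winner, elapsed)
```

Because the shortest of several exponential waiting times belongs to a given mutation with probability proportional to its rate, larger-effect mutations tend to fix first, and removing the winner from `available` before the next call reproduces sampling without replacement from the fixed pool.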

In conducting simulations, we had to decide whether to conduct replicate walks on one landscape or single walks on replicate landscapes. In other words, should we average over replicate walks or replicate landscapes? We argue that conducting replicate walks on the same landscape is more analogous to experimental evolution where these models may ultimately be tested empirically. Consequently, we simulated a single landscape and ran 1000 replicate walks on this landscape, collecting and summarizing relevant information. We then repeated this entire process over several landscapes and confirmed that the observed qualitative patterns that are our focus here do not depend on the particular landscape (results not shown). To generate a landscape, 50 beneficial mutations were drawn from the positive region of a negative log normal (*Appendix*). If $X \sim \mathrm{Normal}(\mu, \sigma)$, then $1 - e^{X}$ is a sample from the negative log normal. Parameters for the negative log normal ($\mu = 0.75$, $\sigma = 0.6$) were chosen so that 10% of the probability is positive (Figure 2). This distribution was used because it produces values ≤1 as required by the stickbreaking model while also being consistent with the additive and multiplicative models. Once it was generated, we used this single set of 50 values to simulate replicate walks under the six models: additive, multiplicative, and stickbreaking affecting Darwinian or Malthusian fitness. For all models the initial fitness was set at 1, and for both stickbreaking models, the fitness boundary was set at 2 such that $d = 1$. Walks were simulated until all 50 beneficial mutations fixed.
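The landscape-generation step can be sketched as rejection sampling from the negative log normal. This is an illustrative Python version, not the original R code; it assumes σ = 0.6 is a standard deviation (which reproduces the stated positive mass of roughly 10%), and the function names are hypothetical.

```python
import math
import random


def draw_landscape(M=50, mu=0.75, sigma=0.6, rng=random):
    """Draw M beneficial effects from the positive region of 1 - exp(X),
    X ~ Normal(mu, sigma), discarding non-positive draws."""
    effects = []
    while len(effects) < M:
        x = 1.0 - math.exp(rng.gauss(mu, sigma))
        if x > 0:
            effects.append(x)
    return effects


def positive_mass(mu=0.75, sigma=0.6):
    """P(1 - e^X > 0) = P(X < 0) for X ~ Normal(mu, sigma), via the normal CDF."""
    return 0.5 * (1.0 + math.erf((0.0 - mu) / (sigma * math.sqrt(2.0))))


landscape = draw_landscape(rng=random.Random(0))
print(len(landscape), round(positive_mass(), 4))
```

Every retained value lies strictly between 0 and 1, so the same set of effects can be used as fitness differences, selection coefficients, or stickbreaking coefficients.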

### Analysis of simulations

Three analyses of simulated data were conducted. First, we compared the mean fitness trajectory for each of the six models. Because final fitness differs dramatically between models, trajectories were rescaled for every simulated walk to range from zero to one. Second, to assess the distribution of fitness, we sampled fitness for each of the 1000 walks at steps 5, 10, 20, and 30 and generated histograms from the results. Third, epistasis was measured in the same two ways we quantify it in the *Epistasis* section above: (i) as fitness effects and (ii) as a departure from additivity on $r$. In approach i, we took a mutation that arose later in a walk, simulated engineering it into each of the preceding backgrounds, and measured its resulting fitness effect. We arbitrarily used the mutation fixing 10th and we defined *fitness effect* as the difference in $r$. In the latter approach (ii), we compared observed fitness with predicted fitness on the $r$ scale. For each simulated walk, we considered the first $m$ mutations that fixed for $m = 2, 3, \ldots, 10$. We then imagined measuring the effect of each of these $m$ mutations on the wild-type background (*i.e.*, as first-step mutations) yielding $\Delta r_{1|\mathrm{wt}}, \Delta r_{2|\mathrm{wt}}, \ldots, \Delta r_{m|\mathrm{wt}}$. Under the additive model, the predicted fitness when all $m$ mutations are combined is just $r_{1,2,\ldots,m(\mathrm{pred})} = r_{\mathrm{wt}} + \Delta r_{1|\mathrm{wt}} + \Delta r_{2|\mathrm{wt}} + \cdots + \Delta r_{m|\mathrm{wt}}$. Epistasis, as a function of the number of mutations, is then $\varepsilon_m = r_{1,2,\ldots,m(\mathrm{obs})} - r_{1,2,\ldots,m(\mathrm{pred})}$.
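The calculation of ε for a set of fixed mutations can be sketched as follows. The example restricts attention to the three models acting on *r*, uses three hypothetical intrinsic effects, and sets *r*_{wt} = *d* = 1; the function names are illustrative and not taken from the original R code.

```python
from math import prod


def combined_r(model, effects, r_wt=1.0, d=1.0):
    """Fitness (on the r scale) of a genotype carrying all the given mutations."""
    if model == "additive":
        return r_wt + sum(effects)
    if model == "multiplicative":
        return r_wt * prod(1 + x for x in effects)
    if model == "stickbreaking":
        return r_wt + d * (1 - prod(1 - x for x in effects))
    raise ValueError(f"unknown model: {model}")


def epsilon(model, effects, r_wt=1.0, d=1.0):
    """Epistasis as observed minus predicted r, where the prediction adds up the
    effects each mutation has on its own on the wild-type background."""
    r_obs = combined_r(model, effects, r_wt, d)
    single_gains = [combined_r(model, [x], r_wt, d) - r_wt for x in effects]
    r_pred = r_wt + sum(single_gains)
    return r_obs - r_pred


first_three = [0.3, 0.2, 0.1]  # hypothetical effects of the first m = 3 mutations
for model in ("additive", "multiplicative", "stickbreaking"):
    print(model, round(epsilon(model, first_three), 4))
```

For these values ε is zero under the additive model, positive under the multiplicative model, and negative under stickbreaking, where the combined genotype gains less than the sum of its single-mutant gains.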

## Results and Discussion

Our objective in this work is to propose and explore a new model of combining mutational effects, which we call stickbreaking. Stickbreaking is premised on the idea that, in the current environment and on short evolutionary timescales, there is a fitness boundary imposed by the laws of biochemistry and by restrictions on how radically the architecture of the genome can be altered by mutation. This limits how dramatically phenotype can be changed over a short evolutionary time span. Within the scope of available phenotypes, the optimal one corresponds to the fitness boundary. For example, if a set of mutations affects the rate a virus attaches to its host, the accumulation of many such mutations will not indefinitely push the attachment rate higher; rather, a boundary on attachment and therefore fitness will be imposed by the kinetics of collisions of objects in random motion. Such boundaries help provide a basic rationale behind the stickbreaking model.

Stickbreaking may also arise when organisms are moderately redundant such that they may solve a given problem multiple ways. Once substantial progress is made toward one solution (through mutation), pursuing alternative solutions to the same problem may be beneficial, but not nearly as much as the first. In the attachment example above, we might imagine multiple residues where binding can occur to the host; a virus that attaches poorly requires a mutation at only one of these residues to dramatically increase attachment. Subsequent mutations offering alternative ways to bind will provide diminishing beneficial effects. Conversely, when an organism is very near the optimal fitness because it has found several, semiredundant solutions to a problem, a deleterious mutation that disrupts one solution will have a relatively small negative effect on fitness. It is also noteworthy that patterns qualitatively similar to stickbreaking can emerge from metabolic control theory (Kacser and Burns 1981). When a mutation changes the activity of an enzyme in a pathway, its effect on the pathway’s flux is smaller than on the enzyme itself and it diminishes the nearer the pathway is to the maximum flux.

In stickbreaking, these biological assumptions of a boundary and diminishing effects are translated mathematically by allowing mutations to further and further subdivide the distance to the boundary in a multiplicative manner (Equation 1, Figure 1). Because it involves a product, stickbreaking has the commutative property and, like the additive and multiplicative models, thereby solves the bookkeeping problem. However, this process of subdivision leads to different walk properties from those models.

### Fitness trajectory

Different models lead to dramatically different trajectories of fitness as a function of mutational step over an adaptive walk (Figure 3A). When mutations affect *r*, the trajectories for the additive, multiplicative, and stickbreaking models are approximately linear, exponential, and rapidly decelerating, respectively. When mutations instead affect λ, the trajectories are shifted: additive becomes modestly decelerating, multiplicative becomes approximately linear, and deceleration under stickbreaking becomes very slightly exaggerated. Note that the theoretical expectations from Table 1 (Figure 3A, shaded lines) are qualitatively correct; the disparities between them and the simulations (Figure 3A, solid lines) reflect the limited pool of beneficial mutations and the biased nature in which selection fixes mutations.

A survey of the experimental evolution literature indicates that, in most cases, the observed fitness trajectory decelerates as adaptation proceeds. This result has been observed in *Escherichia coli* (Lenski and Travisano 1994; de Visser *et al.* 1999; Barrick *et al.* 2009), the DNA bacteriophages ϕX174 and G4 (Bull *et al.* 1997; Wichman *et al.* 1999; Holder and Bull 2001), RNA bacteriophage (Burch and Chao 1999; Betancourt 2009), and the animal RNA virus, vesicular stomatitis virus (VSV) (Elena *et al.* 1998). The exceptions we are aware of are approximately linear trajectories in *Saccharomyces cerevisiae* (Desai *et al.* 2007) and in one study on VSV (Novella *et al.* 1995). Of the models considered here, both stickbreaking models show rapidly decelerating trajectories and the additive model on λ shows a moderately decelerating trajectory.

This suggests that one of these three models is likely nearer the truth than the model most commonly assumed in the literature, additivity on *r* (multiplicative on λ) with its approximately linear trajectory. There are at least two reasons to be somewhat cautious regarding this conclusion. First, our results are based on SSWM dynamics while many experimental and real world systems involve interference dynamics with more than one mutation contending simultaneously. Under interference dynamics, selection is more efficient at fixing bigger-effect mutations early in a walk compared to SSWM conditions (Rozen *et al.* 2002; Barrett *et al.* 2006). We can obtain a bound on this effect by assuming the pool of contending mutations is the entire pool of beneficial mutations and selection therefore fixes them in descending order from the largest to the smallest. Figure 3B shows this trajectory. As expected, interference shifts all the trajectories toward a decelerating pattern although the effect is modest.
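The bounding case in which selection fixes mutations strictly from largest to smallest effect is easy to reproduce for the stickbreaking model. The sketch below uses four hypothetical coefficients and compares the descending-order walk with the same mutations fixed in an arbitrary order; it illustrates the idea behind the bound rather than the simulation behind Figure 3B.

```python
from math import isclose


def stickbreaking_trajectory(coeffs, w_wt=1.0, w_max=2.0):
    """Fitness after each step when mutations fix in the order given."""
    w = w_wt
    trajectory = [w]
    for u in coeffs:
        w = w + u * (w_max - w)
        trajectory.append(w)
    return trajectory


effects = [0.05, 0.40, 0.10, 0.25]  # hypothetical stickbreaking coefficients

bound = stickbreaking_trajectory(sorted(effects, reverse=True))  # largest first
arbitrary = stickbreaking_trajectory(effects)                     # order as listed

print([round(w, 4) for w in bound])
print([round(w, 4) for w in arbitrary])
```

The descending-order walk is at least as fit as any other ordering at every step, because its first *k* mutations are the *k* largest, and both walks end at the same final fitness because the model is order independent.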

Second, trajectories are affected by whether fitness is considered a function of mutational step (as we have done thus far) or time. Plotting fitness against time instead of step bends most of the trajectories toward a more concave, decelerating shape (Figure 3C). Under all models, there is a tendency to fix mutations from larger to smaller intrinsic effect. When all else is equal, this leads to selection coefficients (as traditionally defined, see *Simulations*) tending from large to small and, therefore, to waiting times between fixation events tending from short to long. In the “add on *r*” model, this is the only effect, and the trajectory decelerates moderately. In the “add on λ” model there is also the effect that as fitness grows in an additive way, the proportional effect of each added mutation (the selection coefficient) becomes smaller. The stickbreaking models are most dramatically affected by the timescale because as they approach their boundary, selection coefficients become very small and waiting times very long. At the other extreme lies the “mult on *r*” model where selection coefficients actually get larger as the walk proceeds, causing the walk to accelerate in time for most of its duration. We leave a statistical treatment of trajectory data for later work and here emphasize three things: (i) most of the models show decelerating trajectories, (ii) the slowdown is exaggerated both by clonal interference and by using time rather than step as the explanatory variable, and (iii) with or without these influences, the stickbreaking models show much more dramatic decelerating effects than the other models.

### Distribution of fitness over replicate walks

When mutations affect Malthusian fitness, *r*, and fitness is measured as *r*, the theoretical distributions from replicate walks (*Appendix*) are normal, log normal, and negative log normal for the additive, multiplicative, and stickbreaking models, respectively (solid lines in Figure 4, A–C). When mutations affect λ instead, these qualitative patterns change only slightly, with heavier left tails (Figure 4, D and E). These predictions are based on asymptotic assumptions that (i) the total number of beneficial mutations, *M*, is large, (ii) the step where fitness is measured is far smaller than the number of beneficial mutations, *m* ≪ *M*, and (iii) *m* is large enough for the central limit theorem to apply. In reality, *M* will often be modest (*e.g.*, 10 < *M* < 100) and *m* may be relatively small (*e.g.*, ≤30). The simulations shed light on what effect violating these assumptions has.

Early in a walk (*m* ≤ 10) there is good agreement between the observed and predicted distributions (Figure 4) in terms of both mean and variance. As a walk approaches its midpoint, observed means are notably smaller than the predicted means because the theory assumes constant effect sizes while, in simulated walks, fitness increase slows as large-effect mutations are removed from the available pool. Still, the shapes of the distributions remain the same even when *m* is large. The different models make qualitatively different predictions about the distribution of fitness during replicate adaptive walks: both stickbreaking models predict heavy left tails, the multiplicative on *r* model a heavy right tail, and both additive models an approximately normal distribution. Whether mutations affect *r* or λ makes relatively little difference. Note also that the distributions are in terms of number of mutations fixed (steps), not time elapsed. As shown in the *Fitness trajectory* subsection above, different models fix mutations in different lengths of time (Figure 3C) and will therefore achieve the distributions shown in Figure 4 at different rates (see the *Appendix* for details).
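A minimal sketch of how replicate walks on one fixed landscape could be simulated under the stickbreaking model follows. It assumes the next mutation to fix is drawn from the unused pool with probability proportional to its intrinsic effect (the usual SSWM approximation that fixation probability scales with the selection coefficient). The function names, the effect values, and the choices `w_wt = 1.0` and `d = 2.0` are hypothetical; this is not the authors' simulation code.

```python
import random
from typing import List


def sswm_walk(effects: List[float], m: int, rng: random.Random) -> List[int]:
    """Return the indices of the m mutations fixed, in order of fixation.

    Assumption: each unused mutation fixes next with probability
    proportional to its intrinsic effect; a fixed mutation is never reused.
    """
    available = list(range(len(effects)))
    order: List[int] = []
    for _ in range(m):
        weights = [effects[i] for i in available]
        pick = rng.choices(available, weights=weights, k=1)[0]
        available.remove(pick)  # sampling without replacement
        order.append(pick)
    return order


def stickbreaking_fitness(effects: List[float], order: List[int],
                          w_wt: float = 1.0, d: float = 2.0) -> float:
    """Fitness after fixing the mutations in `order`; each closes a fraction u of the gap to d."""
    remaining = d - w_wt
    for i in order:
        remaining *= 1.0 - effects[i]
    return d - remaining


def replicate_fitnesses(effects: List[float], m: int, n_reps: int, seed: int = 0) -> List[float]:
    """Fitness reached after m steps in n_reps independent walks on the same landscape."""
    rng = random.Random(seed)
    return [stickbreaking_fitness(effects, sswm_walk(effects, m, rng)) for _ in range(n_reps)]
```

A histogram of `replicate_fitnesses(effects, m, n_reps)` for a chosen set of effects in (0, 1) can then be compared with the negative log-normal prediction. All replicates share one landscape, so variation among walks comes only from which mutations fix.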

### Epistasis

Epistasis occurs when the fitness effect of a mutation depends on the genetic background. We investigate epistasis in two ways: first as the effect of a single mutation across a procession of backgrounds and second as departures from additivity when a set of single mutations is combined. For the first approach, we simulate replicate walks of 10 mutational steps under each model on a single landscape. We then imagine taking the mutation that fixed 10th and engineering it into each of the preceding backgrounds in the walk. (Our choice of the 10th mutation is arbitrary, but using other stop points does not change the qualitative patterns observed; data not shown.)

The solid lines in Figure 5 show the means of simulation results when fitness effects are defined as differences in *r* while the shaded lines give the theoretical relationships (Table 2). The results show how the observed fitness effects change along the walk under the different models for the same mutation (or as the intrinsic effect is held constant). Effect sizes grow exponentially for the mult on *r* model, are constant for the add on *r* (mult on λ) model, decay moderately for add on λ, and show rapidly diminishing effects for both stickbreaking models. Of course, these patterns closely reflect the previously discussed fitness trajectories. Here we are considering how the vertical distance (fitness) between steps qualitatively changes along a walk when the intrinsic effect is held constant. It is also noteworthy that because differences in *r* are, in fact, selection coefficients, Figure 5 illustrates how selection coefficients change across a walk under each model. As discussed above, this, in turn, explains how waiting times between mutations change across a walk (Figure 3C).
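A brief worked sketch shows why these patterns follow from the model definitions when mutations act on *r*. Here *x* is the constant intrinsic effect of the engineered mutation, *x*_{1}, … , *x*_{k} are the intrinsic effects of the *k* mutations already fixed in the background, and Δ*r*_{k} is the observed effect on that background. This notation is introduced for illustration only; Table 2 remains the authoritative statement, and the add on λ case is not covered.

```latex
\begin{align*}
\text{add on } r:\quad   & \Delta r_{k} = x \\
\text{mult on } r:\quad  & \Delta r_{k} = x\, r_{wt} \prod_{j=1}^{k} (1 + x_j) \\
\text{stick on } r:\quad & \Delta r_{k} = x\, (d - r_{wt}) \prod_{j=1}^{k} (1 - x_j)
\end{align*}
```

With every *x*_{j} between 0 and 1, the product in the multiplicative case grows geometrically with *k* while the product in the stickbreaking case shrinks geometrically, consistent with the exponential growth and rapid decay described above.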

In the literature, epistasis is more commonly quantified as the departure from additivity when single mutations are combined. We again simulated replicate walks under each model on a single landscape. For the first *m* mutations that fixed, we imagined engineering each into the wild type and measuring their fitness effects (as difference in *r*). In keeping with the literature, we predicted fitness on the basis of the additivity of the *r* model (*i.e.*, summing fitness effects). Epistasis is then defined as ε = *r*_{obs} − *r*_{pred}. For beneficial mutations ε < 0 and ε > 0 are termed antagonistic and synergistic epistasis, respectively.

The patterns of ε (Figure 6) are similar to those observed in Figure 5. The stickbreaking models show strong antagonistic epistasis, add on λ shows moderate antagonistic epistasis, add on *r* (mult on λ) shows no epistasis (by definition), and mult on *r* shows strong synergistic epistasis. In fact, it is easy to understand why ε (Figure 6) and fitness effect (Figure 5) must follow the same basic pattern. Consider two mutations, *A*_{1} and *A*_{2}. If ε < 0 (antagonistic epistasis), then *r*_{obs} < *r*_{pred}. Letting Δ*r* denote fitness effect on *r* and *r*_{wt} denote the wild-type fitness, this implies that *r*_{wt} + Δ*r*_{1|wt} + Δ*r*_{2|1} < *r*_{wt} + Δ*r*_{1|wt} + Δ*r*_{2|wt}, which implies Δ*r*_{2|1} < Δ*r*_{2|wt}, or a diminishing effect. Similar arguments can be made for ε = 0 and ε > 0.
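The sign of ε under each model can be checked with a small sketch, assuming mutations act directly on *r* and that each mutation is described by its effect on the wild type. The function names and the values `r_wt = 1.0` and `d = 2.0` are hypothetical.

```python
import math
from typing import List


def combined_fitness(model: str, drs: List[float], r_wt: float = 1.0, d: float = 2.0) -> float:
    """Fitness of the genotype carrying all mutations in drs (their effects on the wild type)."""
    if model == "additive":
        return r_wt + sum(drs)
    if model == "multiplicative":
        return r_wt * math.prod(1.0 + dr / r_wt for dr in drs)
    if model == "stickbreaking":
        return d - (d - r_wt) * math.prod(1.0 - dr / (d - r_wt) for dr in drs)
    raise ValueError(f"unknown model: {model}")


def epsilon(model: str, drs: List[float], r_wt: float = 1.0, d: float = 2.0) -> float:
    """Epistasis: observed fitness minus the additive-on-r prediction."""
    r_obs = combined_fitness(model, drs, r_wt, d)
    r_pred = r_wt + sum(drs)  # null model: sum the single-mutant effects
    return r_obs - r_pred
```

For two mutations with wild-type effects 0.3 and 0.2, this gives ε = 0 under the additive model, ε > 0 (synergistic) under the multiplicative model, and ε < 0 (antagonistic) under stickbreaking.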

In the experimental evolution literature, the commonly observed patterns of epistasis are (1) diminishing effects, where the same mutation has smaller effects on more fit backgrounds and conversely larger effects on less fit ones, and (2) a higher frequency of antagonistic than synergistic epistasis (Burch *et al.* 2003; Sanjuán and Elena 2006). For example, Bull *et al.* (2000) found that the fitness effect of one mutation (1727T) in the bacteriophage ϕX174 decreased across four backgrounds of increasing fitness. Recently, Chou *et al.* (2011), Khan *et al.* (2011), and Kvitek and Sherlock (2011) all showed a general pattern of diminishing returns epistasis when beneficial mutations were inserted into closely related backgrounds. Similar results are found in double-mutant studies. Trindade *et al.* (2009) found that when antibiotic resistance mutations in *E. coli* are combined, 42% of those showing significant epistasis are antagonistic, while only 15% show synergistic epistasis. Rokyta *et al.* (2011) inserted nine beneficial single mutations in a G4-like bacteriophage to form 18 double mutants and found antagonistic epistasis for all 18. Finally, a synthesis of 21 studies by Sanjuán and Elena (2006) indicated that antagonistic epistasis is more prevalent in viruses and prokaryotes, while synergistic or no epistasis is more common in eukaryotes. Thus studies have tended to show patterns of epistasis broadly consistent with the two stickbreaking models and additivity on λ.

It is important to clarify that the values of ε and hence the patterns of antagonistic *vs.* synergistic epistasis depend on the null model used to calculate predicted fitness. It is easy to see what the patterns would be under other nulls by noting that the “predicted” and observed labels in Figure 6 are arbitrary. Figure 6 can also be thought of as showing the fitness divergence between different models as mutations of the same intrinsic effect are introduced. For any null and alternative model, the distance between them corresponds to the values of ε.

### Conclusion

The stickbreaking model is based on the simple idea that mutational fitness effects should diminish the nearer the background is to the maximum fitness boundary. It solves the bookkeeping problem while also producing patterns of fitness trajectory and epistasis broadly consistent with experimental findings. The next important step is to develop statistical methods for fitting and testing the stickbreaking model on real data. Like the additive and multiplicative models, stickbreaking is too simple to be biologically correct. Rather, our hope is that stickbreaking is mathematically tractable like those models, but also captures a basic biological property and provides an explanatory power that those models seem to miss.

## Appendix

### Distribution of Total Fitness Effects After *m* Steps of Adaptation

We show here that there are three limiting distributions for the fitness achieved after *m* steps in a walk: the normal distribution under additivity, the log normal under the multiplicative model, and negative log normal under stickbreaking.

Denote the “intrinsic” fitness effect of the beneficial mutation *A*_{i} by *x*_{i}. For the additive model *x*_{i} = Δ*w*_{i}, for the multiplicative model *x*_{i} = *s*_{i}, and for the stickbreaking model *x*_{i} = *u*_{i}. Note that *u*_{i} and *s*_{i} are just different ways to scale Δ*w*_{i}. That is, *u*_{i} = Δ*w*_{i}/(*d* − *w*_{wt}) and *s*_{i} = Δ*w*_{i}/*w*_{wt}. Therefore […] *M*, *x*_{1}, *x*_{2}, … , *x*_{M} is fixed. That is, we use the same set of intrinsic fitness effects for replicate walks.

Consider an adaptive walk of length *m*. Let *Y*_{i} be the intrinsic fitness effect of the mutation arising at step *i*. The joint distribution of *Y*_{1}, *Y*_{2}, … , *Y*_{m} can be described as […] *Y*_{1}, *Y*_{2}, … , *Y*_{m} are dependent random variables. The dependence comes from the fact that once a mutation is used in a walk it will not be used again, thus reducing the number of available mutations at each step. However, if *M* is large enough, we show below that *Y*_{1}, *Y*_{2}, … , *Y*_{m} are approximately independent and identically distributed. Let *x*_{(1)} = max{*x*_{1}, *x*_{2}, … , *x*_{M}}. Note that […] *M* and *m* becomes important. We assume that *m* is an order of magnitude smaller than *M*. More precisely, we assume that as *M* → ∞, then *m* ln(*M*)/*M* → 0 and *m* → ∞. It follows from extreme value theory that for large *M*, *x*_{(1)} ≈ *c* ln *M*. [More precisely *x*_{(1)}/ln(*M*) converges to a constant *c* as *M* → ∞.] Taking the limit as *M* → ∞ in inequality (A2) reveals that […] *M*, *k* = 1, 2, … , *m*. If we replace the denominators in Equation A1 with […] *Y*_{1}, *Y*_{2}, … , *Y*_{m} are approximately independent and identically distributed with […] *M* → ∞.

### Normal, Log Normal, and Negative Log Normal

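Below we review the three central distributions associated with *m* steps of an adaptive walk. The *normal distribution* is given by

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty.$$

If *X* follows the normal distribution, we say that *V* = *e*^{X} follows the *log-normal distribution* with probability density function given by

$$f_V(v) = \frac{1}{v\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln v-\mu)^2}{2\sigma^2}\right), \quad v > 0.$$

If *V* follows the log normal, we say that *W* = 1 − *V* follows the *negative log-normal distribution* with probability density function given by

$$f_W(w) = \frac{1}{(1-w)\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln(1-w)-\mu)^2}{2\sigma^2}\right), \quad w < 1.$$

If μ represents the mean and σ^{2} represents the variance of a normal, the mean of the log-normal distribution is

$$E(V) = e^{\mu + \sigma^2/2}.$$

If *W* is negative log normal, then the mean is

$$E(W) = 1 - e^{\mu + \sigma^2/2}.$$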

Now if *Y*_{i} represents the fitness differences, then the fitness after *m* steps is given by […] *w*_{1,2,…,m} will be approximately normal with mean […] *w*_{1,2,…,m}/*w*_{wt} is distributed log normal.

Under the stickbreaking model, […] *m* *E*(1 − *Y*) and σ^{2} = *m* Var(1 − *Y*) and the formulas are analogous to those of the log normal.
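As a hedged sketch of how the three limits arise, the following reconstruction treats the *Y*_{i} as independent and identically distributed (the large-*M* approximation above) and applies the central limit theorem on the scale where each model is additive. It is written for effects on a single fitness scale *w*, and the exact expressions in the original equations may differ. In particular, the logarithm inside the stickbreaking moments is made explicit here as an assumption.

```latex
\begin{align*}
\text{Additive:}\quad
  & w_{1,\dots,m} = w_{wt} + \sum_{i=1}^{m} Y_i
    \;\approx\; \mathcal{N}\bigl(w_{wt} + m\,E(Y),\; m\,\mathrm{Var}(Y)\bigr) \\
\text{Multiplicative:}\quad
  & \ln\frac{w_{1,\dots,m}}{w_{wt}} = \sum_{i=1}^{m} \ln(1 + Y_i)
    \;\approx\; \mathcal{N}\bigl(m\,E[\ln(1+Y)],\; m\,\mathrm{Var}[\ln(1+Y)]\bigr) \\
\text{Stickbreaking:}\quad
  & \ln\frac{d - w_{1,\dots,m}}{d - w_{wt}} = \sum_{i=1}^{m} \ln(1 - Y_i)
    \;\approx\; \mathcal{N}\bigl(m\,E[\ln(1-Y)],\; m\,\mathrm{Var}[\ln(1-Y)]\bigr)
\end{align*}
```

In the stickbreaking case the ratio *V* = (*d* − *w*_{1,2,…,m})/(*d* − *w*_{wt}) is therefore approximately log normal, so the scaled fitness gain (*w*_{1,2,…,m} − *w*_{wt})/(*d* − *w*_{wt}) = 1 − *V* is approximately negative log normal.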

The assumption that *M* is large enough so that *m* is an order of magnitude smaller yet *m* is still large enough for the central limit theorem to apply is not always going to be achieved. Simulations can help in determining the degree to which violation of assumptions matters.

### Number of Steps *vs.* Time to Adaptation

Under SSWM conditions the time it takes a mutation with selection coefficient *s* to arise and fix in the population is exponentially distributed with mean 1/(*N*μ*s*), where μ is the beneficial mutation rate and *N* is the population size. Now if there are a total of *M* beneficial mutations available, the time in generations to fixation of the first beneficial mutation is on average […] *M* available mutations. All of our theory is based on the asymptotic results formed by taking the limit as *M* goes to infinity. As *M* goes to infinity the time to fixation converges to zero. So a timescale change is required. If we assume that 1 unit of time is equivalent to *N*μ*M* generations, then the time for the first beneficial mutation to fix using this timescale will be exponentially distributed with mean […] *M* goes to infinity, *t* → ∞. This shows that the time limit prediction of the additive model is normal. Applying the analogous central limit theorem result to […]

## Footnotes

*Communicating editor: M. K. Uyenoyama*

- Received June 28, 2011.
- Accepted October 28, 2011.

- Copyright © 2012 by the Genetics Society of America