The outcomes of evolution are determined by which mutations occur and fix. In rapidly adapting microbial populations, this process is particularly hard to predict because lineages with different beneficial mutations often spread simultaneously and interfere with one another’s fixation. Hence to predict the fate of any individual variant, we must know the rate at which new mutations create competing lineages of higher fitness. Here, we directly measured the effect of this interference on the fates of specific adaptive variants in laboratory Saccharomyces cerevisiae populations and used these measurements to infer the distribution of fitness effects of new beneficial mutations. To do so, we seeded marked lineages with different fitness advantages into replicate populations and tracked their subsequent frequencies for hundreds of generations. Our results illustrate the transition between strongly advantageous lineages that decisively sweep to fixation and more moderately advantageous lineages that are often outcompeted by new mutations arising during the course of the experiment. We developed an approximate likelihood framework to compare our data to simulations and found that the effects of these competing beneficial mutations were best approximated by an exponential distribution, rather than one with a single effect size. We then used this inferred distribution of fitness effects to predict the rate of adaptation in a set of independent control populations. Finally, we discuss how our experimental design can serve as a screen for rare, large-effect beneficial mutations.
- distribution of fitness effects
- clonal interference
- beneficial mutations
- experimental evolution
- Saccharomyces cerevisiae
Evolutionary adaptation is driven by the accumulation of beneficial mutations. One can ask two basic questions about this process. First, what set of mutations is available to the population? That is, what is the overall mutation rate, U, and the distribution of fitness effects, ρ(s), of new mutations? Second, what is the fate of those mutations that occur? In other words, how does the frequency of each mutation change over time until it eventually fixes or goes extinct?
When beneficial mutations are rare, these two questions are independent. Mutations of a given fitness effect, s, occur at rate Uρ(s). The fate of each mutant is then decided entirely on its own merits: it increases in frequency (or is lost due to random drift) at a rate commensurate with its selective effect. Experiments, however, have shown that even in modestly sized laboratory populations of viruses and microbes, multiple beneficial mutations often spread simultaneously and interfere with one another, an effect known as clonal interference (de Visser et al. 1999; Miralles et al. 1999; Joseph and Hall 2004; Desai et al. 2007; Perfeito et al. 2007; Kao and Sherlock 2008; Lee and Marx 2013; see Sniegowski and Gerrish 2010 for a recent review). This means that the fate of each beneficial mutation depends not only on its own effect, but also on its interactions with the rest of the variation in the population (Lang et al. 2011, 2013). In this regime, the mutation rate and the distribution of fitness effects of beneficial mutations [the DFE, ρ(s)] control the availability of competing mutations, which then play an important role in determining the fate of each new beneficial mutation (Gerrish and Lenski 1998; Good et al. 2012).
These factors highlight the importance of the DFE as a central parameter in adaptation, determining which new mutations occur and influencing their subsequent fate. Some theoretical work has argued that the DFE will typically be exponential (Gillespie 1983; Orr 2003). However, this is fundamentally an empirical question, and in principle the details of the DFE could be highly system specific. Thus extensive experimental effort has been devoted to measuring the DFE of beneficial mutations in a variety of laboratory populations (Imhof and Schlötterer 2001; Rozen et al. 2002; Sanjuán et al. 2004; Barrett et al. 2006; Kassen and Bataillon 2006; Burch et al. 2007; Perfeito et al. 2007; Rokyta et al. 2008; MacLean and Buckling 2009; Bataillon et al. 2011; McDonald et al. 2011; a separate literature has used population genetic methods to infer the DFE in natural populations, reviewed by Keightley and Eyre-Walker 2010).
Experimental efforts to measure the DFE of beneficial mutations in laboratory populations have largely taken one of two complementary approaches. The first approach is to isolate mutants and directly assay their fitness. The difficulty with this method is that beneficial mutations are rare, so many clones must be screened to isolate comparatively few beneficial mutations (Sanjuán et al. 2004; Kassen and Bataillon 2006). To avoid this difficulty, some studies have imposed harsh selection and studied the survivors, which by definition must carry a beneficial mutation (MacLean and Buckling 2009; McDonald et al. 2011). However, this approach is limited to harsh and typically narrow stresses (e.g., antibiotic treatment), which may not be representative of adaptation to other conditions.
The second common experimental approach is to track the frequencies of genetic markers over time and use the resulting dynamics to infer the underlying DFE. Such “marker divergence” experiments typically use two or more strains that differ by a single neutral genetic marker, which can be easily tracked through time (e.g., antibiotic resistance or a fluorescent reporter). These strains are mixed, usually in equal proportions, and allowed to evolve in competition. The changes in frequencies of the neutral markers then reflect subsequent beneficial mutations that occur in one or the other genetic background (Novick and Szilard 1950; Atwood et al. 1951; Helling et al. 1981; Paquin and Adams 1983; Adams and Oeller 1986; Imhof and Schlötterer 2001; Barrett et al. 2006; de Visser and Rozen 2006; Hegreness et al. 2006; Perfeito et al. 2007; Kao and Sherlock 2008; Barrick et al. 2010; Lang et al. 2011). Inferring the DFE from such data typically requires estimating the fitness effects of many mutations from the dynamics of relatively few markers, which is inherently difficult (Hegreness et al. 2006; Pinkel 2007; Illingworth and Mustonen 2012; Zhang et al. 2012; de Sousa et al. 2013). In principle, this difficulty could be removed by reducing the population size until at most one beneficial mutation usually arises in each population (Perfeito et al. 2007). However, this requires careful tuning of the population size, to make it small enough to minimize multiple mutations but also large enough to ensure that many replicates acquire a beneficial mutation.
Here, we introduce a twist on the traditional design of marker divergence experiments that produces dynamics that more directly reveal the underlying DFE. Rather than using neutral markers, we tracked the frequencies of marked lineages with a fitness advantage relative to a reference strain. We seeded these marked lineages at low frequency into populations of the reference, so that their subsequent dynamics reflect the fates of beneficial mutations with a particular selective advantage. Since the DFE controls the availability of competing mutations and hence the likelihood of clonal interference, we can exploit the observed fates of seeded lineages to infer the DFE. Using lineages with different fitness advantages enabled us to probe different corresponding portions of the DFE. This approach is particularly suited to inferring those aspects of the DFE that are most important in determining the fates of new beneficial “driver” mutations, e.g., the high-fitness tail, which is otherwise hard to measure directly. In the process, we also directly measured how clonal interference alters a key quantity in adaptation: the fixation probability of a beneficial mutation as a function of its fitness effect.
Materials and Methods
All strains used in this study were derived from the base strain DBY15084, a haploid Saccharomyces cerevisiae strain derived from the W303 background with genotype MATa, ade2-1, CAN1, his3-11, leu2-3,112, trp1-1, URA3, bar1Δ::ADE2, hmlαΔ::LEU2. Each experimental population included a resident and a seeded lineage. The resident lineage was DBY15108, a derivative of DBY15084 in which the fluorescent protein ymCherry was integrated at the URA3 locus (Lang et al. 2011). The seeded lineages were descendants of strain DBY15104 isolated from time points of an earlier long-term evolution experiment (Lang et al. 2011). To allow us to track their frequency using flow cytometry, we amplified a pACT1-ymCitrine pTEF-HISMX6 cassette from plasmid pJHK043 (provided by John H. Koschwanez) and integrated it at the HIS locus using oligos oGW137 (5′-TTGGTGAGCGCTAGGAGTC-3′) and oGW138 (5′-TATGAAATGCTTTTCTTGTTGTTCTTACG-3′) provided by Gregg Wildenberg. From this pool of transformants, we selected strains EFY11–17 based on the fitness assays described below.
To obtain seeded lineage strains with a range of fitnesses, we isolated a large number of evolved clones and assayed their fitnesses as described in Lang et al. (2011). Briefly, this protocol is to mix each strain in roughly equal proportion with a reference strain that bears a different fluorescent reporter, propagate these mixed populations for 30 generations, and measure the ratio of the strains at generations 10 and 30 using flow cytometry. Relative fitness was calculated as s = (1/20) × log(final ratio/initial ratio). From among these clones, we chose EFY11–17 to use as seeded lineages and remeasured their fitnesses in 10 replicates. These additional assays showed that strains EFY12–14 and EFY15–16 had indistinguishable fitnesses, and so for the purposes of analysis, strains EFY11, EFY12–14, EFY15–16, and EFY17 were respectively grouped into the fitness classes indicated in Figure 1.
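The fitness calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the assay pipeline used here: it assumes that "log" means the natural logarithm and that each ratio is the test strain count divided by the reference strain count, and the function names are hypothetical.

```python
import math


def ratio_from_frequency(f):
    """Convert the test strain's frequency into a test:reference ratio."""
    if not 0.0 < f < 1.0:
        raise ValueError("frequency must lie strictly between 0 and 1")
    return f / (1.0 - f)


def relative_fitness(initial_ratio, final_ratio, generations=20):
    """Per-generation selection coefficient, s = ln(final/initial) / generations.

    Assumes the natural log and a 20-generation interval (generation 10 to 30).
    """
    if initial_ratio <= 0 or final_ratio <= 0:
        raise ValueError("ratios must be positive")
    return math.log(final_ratio / initial_ratio) / generations


# Hypothetical assay: the test strain rises from 50% to 60% between generations 10 and 30.
s = relative_fitness(ratio_from_frequency(0.5), ratio_from_frequency(0.6))
print(f"s = {s:.4f}")
```

In this hypothetical assay the test:reference ratio rises from 1.0 to 1.5 over 20 generations, which corresponds to s ≈ 2%.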
To begin the evolution experiment, we grew an individual resident clone to saturation in 3 ml of standard growth media (YPD supplemented with 100 μg/ml ampicillin and 25 μg/ml tetracycline). We transferred 128 μl of this culture into each well of a 96-well plate, diluted these cultures 2¹⁰-fold into 12 96-well plates containing fresh media, allowed these cultures to grow for 10 generations, and froze them at −80°C in 15% glycerol. Later, these plates were thawed and propagated for 30 generations (as described below) to reacclimate them to this environment. In parallel, we prepared the seeded clones in the same fashion. We then mixed seeded and resident populations to found a total of 1044 populations in 12 96-well plates (see Supporting Information, Table S1). These populations were propagated at 30°C in 128 μl YPD per well and diluted every 24 hr by a factor of 2¹⁰ into new plates containing fresh media. This corresponds to an effective population size Ne ≈ 10⁵ (Wahl et al. 2002; Lang et al. 2011). Each plate contained a set of nine empty wells as cross-contamination controls. All control wells remained sterile throughout the experiment except for two accidents involving plate mixing; this contamination was resolved by restarting from glycerol stocks of an earlier time point. Transfers were carried out using a Biomek FX pipetting robot.
At ∼50-generation intervals, seeded lineage frequencies were measured using flow cytometry. In particular, BD Biosciences Fortessa and LSR-II flow cytometers with high-throughput plate samplers counted ∼100,000 cells per population for the initial time point and ∼30,000 cells per population for time points thereafter. Repeated measurement of populations and blanks indicated that ∼100 cell counts per sample were carried over from previous samples. Therefore the uncertainty in frequency at the first time point was ∼0.1% and ∼0.5% thereafter. These raw data were processed in FlowJo v. 9.2. All processed data are provided in Table S1.
We also assayed the fitness of 16 additional control populations founded with only the resident strain. To do so, these populations were thawed from frozen-archive plates, each was duplicated into four replicates, these were propagated for 30 generations to acclimate them, and then their fitness was assayed as described above.
Note that 386 of the populations were later excluded from analysis, leaving a total of 658 replicate populations, apportioned among the seven seeded lineages and controls as described in Table S1. In 232 of these 386 excluded populations, frequency-dependent selection emerged. We identified these by first investigating 15 populations in which lineages coexisted at constant proportion for hundreds of generations. We found that this coexistence was maintained by frequency-dependent selection exclusively in populations having a characteristic pellet morphology, so we excluded from analysis all populations that also had this morphology. In the other 154 cases, the initial frequency of the seeded lineage was so low that it could not be precisely determined or extinction due to drift was common. To exclude these without biasing the statistics of trajectories, we chose a cutoff for the initial frequency of each seeded strain such that in all replicates in which the initial frequency was above the cutoff, the seeded lineage rose to at least 5%. All replicates below the cutoff were excluded.
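The cutoff rule above can be expressed as a short routine. This is a sketch under stated assumptions: each replicate is summarized by its initial frequency and the highest frequency it ever reached, the 5% threshold comes from the text, and the function names and example values are hypothetical.

```python
def choose_cutoff(replicates, rise_threshold=0.05):
    """Smallest initial-frequency cutoff above which every replicate rose to >= rise_threshold.

    replicates: list of (initial_frequency, max_observed_frequency) pairs for one seeded strain.
    The cutoff equals the largest initial frequency among replicates that never reached
    the threshold (0.0 if every replicate reached it).
    """
    failed = [f0 for f0, f_max in replicates if f_max < rise_threshold]
    return max(failed) if failed else 0.0


def apply_cutoff(replicates, rise_threshold=0.05):
    """Return the cutoff and the replicates kept (initial frequency strictly above it)."""
    cutoff = choose_cutoff(replicates, rise_threshold)
    kept = [r for r in replicates if r[0] > cutoff]
    return cutoff, kept


# Hypothetical replicates: (initial frequency, highest frequency ever observed)
reps = [(0.001, 0.02), (0.002, 0.50), (0.003, 0.01), (0.004, 0.90), (0.005, 1.00)]
cutoff, kept = apply_cutoff(reps)
print(cutoff, len(kept))
```

Because the rule depends only on initial frequency, the retained replicates are not selected by their later fates. In the example, the replicate seeded at 0.2% is dropped even though it rose, because a replicate seeded at a higher frequency (0.3%) failed to reach 5%.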
Tracking the fates of seeded lineages
Any beneficial mutation creates a new lineage that is more fit than the genetic background in which it arose. To systematically study the fates of such lineages, we prepared a set of fluorescently labeled haploid budding yeast strains (the seeded lineages) with measured fitness advantages, s0, of ∼3, 4, 5, and 7% relative to a closely related but separately labeled reference strain. We founded 658 replicate populations of the reference (the resident), and introduced one of the seeded lineages at low frequency into each replicate population. We propagated these populations asexually in batch culture for hundreds of generations at an effective population size of Ne ≈ 10⁵, measuring the frequency of the seeded lineage in each population approximately every 50 generations (see Materials and Methods). This allowed us to track the fate of the seeded lineages over time, as illustrated in Figure 1.
Each seeded lineage was introduced at an initial frequency f0 large enough that genetic drift is expected to be weak relative to natural selection (i.e., $f_0 \gg 1/(N_e s_0)$). In the absence of additional mutations, this implies that the frequency f(t) of each seeded lineage should increase deterministically according to the logistic equation, $f(t) = f_0 e^{s_0 t} / (1 - f_0 + f_0 e^{s_0 t})$, with t measured in generations. This expectation is indicated by the dashed curves in Figure 1. As is apparent from the figure, most seeded lineages initially conformed to this expectation (the exceptions are lineages whose initial frequencies were only severalfold greater than $1/(N_e s_0)$, which is low enough that genetic drift could partially reduce their initial rate of increase). Subsequently, many lineages diverged into a variety of qualitatively distinct fates. Since both genetic drift and measurement errors are expected to be small relative to this divergence (see Materials and Methods), the variation in the fates of seeded lineages indicates that their relative fitnesses were modified by new beneficial mutations arising during the experiment.
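The deterministic expectation and the drift criterion can be evaluated directly. The sketch below implements the logistic equation, its inverse (the time needed to reach a target frequency), and the product f0 Ne s0, taking Ne = 10⁵ from the text; the function names are illustrative.

```python
import math


def logistic_frequency(t, f0, s0):
    """Frequency after t generations with no new mutations:
    f(t) = f0*exp(s0*t) / (1 - f0 + f0*exp(s0*t))."""
    growth = f0 * math.exp(s0 * t)
    return growth / (1.0 - f0 + growth)


def log_odds(f):
    return math.log(f / (1.0 - f))


def time_to_frequency(f_target, f0, s0):
    """Invert the logistic equation: the log-odds grow linearly at rate s0."""
    return (log_odds(f_target) - log_odds(f0)) / s0


def drift_strength(f0, s0, Ne=1e5):
    """f0 * Ne * s0; drift is expected to be weak when this is much greater than 1."""
    return f0 * Ne * s0


print(round(time_to_frequency(0.5, 0.01, 0.03)))  # generations from 1% to 50% at s0 = 3%
print(round(drift_strength(0.001, 0.03), 2))      # lineage seeded at 0.1%
```

For example, at s0 = 3% a lineage needs ln(99)/0.03 ≈ 153 generations to go from 1% to 50%, and a lineage seeded at 0.1% has f0 Ne s0 ≈ 3, only severalfold above 1, which is the regime in which drift can slow the initial increase.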
Fates of seeded lineages reflect supply of competing beneficial mutations
The trajectory of each seeded lineage provides information about the beneficial mutations that did (or did not) arise within the competing resident population. Consider, for example, the case in which a seeded lineage of fitness s0 peaks and then declines in frequency. This reflects a clonal interference event, where one or more new beneficial mutations in the resident population create a competing lineage with fitness >s0 (see Figure 2). By considering the range of outcomes in replicate populations, we can estimate the probability of these events (Figure 3). A higher probability of clonal interference implies a larger supply of beneficial mutations that can generate successful competing lineages.
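One way to turn replicate trajectories into an interference probability is sketched below. The classification rule (a lineage that peaks below fixation and then falls by more than a chosen margin) and its thresholds are hypothetical choices for illustration, not the criteria used in this study; the Wilson score interval is a standard binomial confidence interval swapped in to express replicate-to-replicate uncertainty.

```python
import math


def shows_interference(freqs, min_drop=0.05, fixation=0.99):
    """Hypothetical rule: the lineage peaked below fixation, then declined by > min_drop."""
    peak_idx = max(range(len(freqs)), key=lambda i: freqs[i])
    peak = freqs[peak_idx]
    if peak >= fixation:
        return False
    return peak - min(freqs[peak_idx:]) > min_drop


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical trajectories sampled every ~50 generations
trajectories = [
    [0.01, 0.20, 0.60, 0.30, 0.05],  # peaks, then declines: interference
    [0.01, 0.30, 0.90, 1.00],        # sweeps to fixation
    [0.01, 0.25, 0.75, 0.99],        # sweeps to fixation
]
k = sum(shows_interference(tr) for tr in trajectories)
print(k, wilson_interval(k, len(trajectories)))
```

With only three replicates the interval is wide; the point of the sketch is that both the probability estimate and its uncertainty come from counting fates across replicate populations.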
Comparing the fates of seeded lineages of different fitnesses provides additional insight into the mutations responsible for clonal interference. For example, the seeded lineage with fitness advantage s0 = 7% always swept to fixation without any detectable deviation from the expectation in the absence of interference. In contrast, the lineage with s0 = 5% swept in 84% of replicates. Together, these two results suggest that clonal interference in the s0 = 5% case was primarily due to beneficial mutations in the resident that created competing lineages with fitness advantages between 5 and 7%. Extending this logic, comparing the fates of seeded lineages with s0 = 5, 4, and 3% provides information about the probabilities that beneficial mutations create competing lineages of fitness between 4 and 5% and between 3 and 4%.
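This reasoning can be made semi-quantitative with a back-of-the-envelope calculation. The sketch below assumes an exponential DFE with total beneficial rate Ub and mean effect s̄ and computes the rate at which single mutations land in each fitness window between the seeded-lineage advantages. It deliberately ignores each mutant's establishment probability and competing lineages built from several mutations, both of which the inference method introduced below accounts for; the parameter values are illustrative, not the inferred ones.

```python
import math


def rate_in_window(Ub, s_bar, lower, upper):
    """Rate of new mutations with effect in [lower, upper) under an exponential DFE,
    i.e., the integral of Ub * exp(-s/s_bar) / s_bar from lower to upper."""
    return Ub * (math.exp(-lower / s_bar) - math.exp(-upper / s_bar))


# Illustrative values only (not the parameters inferred in this study)
Ub, s_bar = 1e-4, 0.01
windows = [(0.03, 0.04), (0.04, 0.05), (0.05, 0.07), (0.07, math.inf)]
for lower, upper in windows:
    print(lower, upper, f"{rate_in_window(Ub, s_bar, lower, upper):.2e}")
```

Summing the windows above a given s0 gives the supply of single mutations that could outcompete that lineage; under these assumed values it shrinks steeply as s0 increases, in line with the qualitative trend described above.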
While this intuition is straightforward, quantitative inference of the DFE requires us to connect the rates of individual mutations with the fitnesses of competing lineages. This is complicated because competing lineages may often contain multiple beneficial “driver” mutations. In addition, beneficial mutations may also arise in seeded lineages, despite their initially much smaller population sizes. To fully account for these effects, we now introduce a computational method for inferring the DFE.
DFE inferred from seeded lineage dynamics
We implemented an approximate likelihood method that uses information from the shapes of the trajectories of seeded lineages to infer the DFE of beneficial mutations. Any particular trajectory carries information only about the beneficial mutations that rose to significant frequency in that population (i.e., the “contending” mutations; Rozen et al. 2002), but by modeling the trajectories of many populations together, we can learn about the overall distribution of possible beneficial mutations for the strains in our experiment. To make this inference tractable, we limited ourselves to single-parameter DFE shapes characterized by an average fitness effect $\bar{s}$, together with the beneficial mutation rate Ub. For concreteness, we considered three canonical distributions commonly used in the literature: an exponential DFE, $\rho(s) = \bar{s}^{-1} e^{-s/\bar{s}}$; a uniform DFE, $\rho(s) = 1/(2\bar{s})$ for $0 \le s \le 2\bar{s}$ (and zero otherwise); and a δ-function DFE, where all beneficial mutations have the same fitness effect, $\rho(s) = \delta(s - \bar{s})$. We explain the significance of these choices in the Discussion.
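The three candidate shapes can be written as samplers that share the mean effect s̄. This sketch assumes the uniform DFE is supported on [0, 2s̄] so that its mean is s̄, and it uses an illustrative s̄ = 1% rather than any inferred value; it highlights that, at equal means, only the exponential places appreciable weight on large effects.

```python
import random


def sample_exponential(s_bar, rng):
    """Exponential DFE, rho(s) = exp(-s/s_bar) / s_bar."""
    return rng.expovariate(1.0 / s_bar)


def sample_uniform(s_bar, rng):
    """Uniform DFE on [0, 2*s_bar] (assumed support), which has mean s_bar."""
    return rng.uniform(0.0, 2.0 * s_bar)


def sample_delta(s_bar, rng):
    """Delta-function DFE: every beneficial mutation has effect s_bar."""
    return s_bar


SAMPLERS = {"exponential": sample_exponential,
            "uniform": sample_uniform,
            "delta": sample_delta}

rng = random.Random(1)
s_bar = 0.01  # illustrative mean effect
n = 100_000
results = {}
for name, sampler in SAMPLERS.items():
    draws = [sampler(s_bar, rng) for _ in range(n)]
    mean = sum(draws) / n
    frac_above_3pct = sum(d > 0.03 for d in draws) / n
    results[name] = (mean, frac_above_3pct)
    print(f"{name:12s} mean={mean:.4f}  P(s > 3%)={frac_above_3pct:.4f}")
```

All three report a mean near 1%, but only the exponential produces effects above 3% (a fraction close to e⁻³ ≈ 5%), which is the part of the distribution that the seeded lineages probe.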
To compute the likelihood of particular DFE parameters, we ran forward-time simulations of the experiment and estimated the likelihood as the fraction of replicate simulations that matched the data (see Appendix). In principle, we could use the complete trajectory of each seeded lineage for this comparison, identifying a match between simulations and data whenever the two were identical. However, in practice this was not computationally tractable. Instead, we focused on two features of the dynamics: the first peak frequency, fpeak, of each seeded lineage (binned into quartiles, including fixed lineages) and the rate at which the seeded lineage declined in frequency following this peak, sdown (binned into 2% intervals). These are illustrated in Figure 2. We chose to focus on these two quantities because we expect them to be particularly sensitive to the DFE: fpeak indicates how quickly a competing lineage arose in the resident population, while sdown measures how much the relative fitness of the resident population increased in this time. In addition, this focus on early time dynamics ensures that most relevant mutations occur in the resident (due to its initially much larger population size), minimizing the effects of potential differences in the DFEs of the seeded genotypes.
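The two summary statistics can be computed from a sampled trajectory roughly as follows. This is a sketch under explicit assumptions: the first peak is taken as the first local maximum of the sampled frequencies, lineages at or above 99% count as fixed and fall in the top quartile bin, and sdown is estimated from the drop in log-odds between the peak and the next sample (for a homogeneous lineage, the log-odds change at a rate equal to its fitness difference from the mean of the rest of the population). The 0.99 threshold, the clipping constant, and the function names are hypothetical.

```python
import math


def logit(f, eps=1e-4):
    """Log-odds, with frequencies clipped away from 0 and 1 (eps is an assumed guard)."""
    f = min(max(f, eps), 1.0 - eps)
    return math.log(f / (1.0 - f))


def summarize(times, freqs, fixation=0.99):
    """Return (fpeak_bin, sdown_bin) for one trajectory.

    fpeak_bin: quartile of the first peak frequency (0-3; fixed lineages land in bin 3).
    sdown_bin: post-peak decline rate in 2% bins, or None if no decline was observed.
    """
    i = 0
    while i + 1 < len(freqs) and freqs[i + 1] >= freqs[i]:
        i += 1  # walk forward to the first local maximum
    f_peak = freqs[i]
    fpeak_bin = min(int(f_peak * 4), 3)
    if f_peak >= fixation or i == len(freqs) - 1:
        return fpeak_bin, None
    j = i + 1
    s_down = (logit(f_peak) - logit(freqs[j])) / (times[j] - times[i])
    return fpeak_bin, int(s_down / 0.02)
```

For a hypothetical trajectory sampled every 50 generations, `summarize([0, 50, 100, 150], [0.01, 0.40, 0.60, 0.30])` returns `(2, 1)`: the peak of 60% lies in the third quartile, and the post-peak decline of about 2.5% per generation falls in the 2–4% bin.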
For the three considered DFE shapes, we identified the most-likely parameters Ub and $\bar{s}$ by scanning a grid of candidate values. These parameters are shown in Figure 4, along with confidence bounds estimated by bootstrapping (see Appendix). For each of these most-likely parameters, we show simulations of the s0 = 3% seeded lineage trajectories in Figure 5 and of the s0 = 4, 5, and 7% lineages in Figure S1, Figure S2, and Figure S3. Using a likelihood ratio test, we found that the exponential DFE provided a significantly better fit to the data than either the δ-function (P < 10⁻⁴) or the uniform distribution (P < 10⁻⁴) and that the uniform provided a better fit than the δ-function (P < 10⁻⁴).
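The grid scan can be organized around an approximate likelihood of the kind described above. In this sketch, `simulate_summary` is a hypothetical stand-in for a full forward-time simulation that returns a binned (fpeak, sdown) pair, each population's likelihood is the fraction of simulations that match its observed pair, and a small floor (an assumption) keeps the logarithm finite when nothing matches. The toy simulator is purely illustrative and is not a model of the experiment.

```python
import itertools
import math
import random


def approx_log_likelihood(params, observed, simulate_summary, n_sims=500, seed=0, floor=1e-6):
    """Sum over populations of log(fraction of simulations matching the observed summary)."""
    rng = random.Random(seed)
    total = 0.0
    for obs in observed:
        hits = sum(simulate_summary(params, rng) == obs for _ in range(n_sims))
        total += math.log(max(hits / n_sims, floor))
    return total


def grid_scan(log_likelihood, ub_grid, sbar_grid):
    """Return the (Ub, s_bar) grid point with the highest log-likelihood."""
    return max(itertools.product(ub_grid, sbar_grid), key=log_likelihood)


def toy_simulator(params, rng):
    """Illustrative only: interference becomes likelier with more large-effect mutations."""
    ub, s_bar = params
    p_interference = min(1.0, 1e3 * ub * math.exp(-0.03 / s_bar))
    return (1, 0) if rng.random() < p_interference else (3, None)


observed = [(1, 0)] * 2 + [(3, None)] * 8  # hypothetical: 2 of 10 populations interfered
best = grid_scan(lambda p: approx_log_likelihood(p, observed, toy_simulator),
                 [1e-5, 1e-4, 1e-3], [0.005, 0.01, 0.02])
print(best)
```

Reusing the same seed at every grid point means that, in this sketch, each candidate is scored against the same stream of random numbers, which reduces noise when comparing neighboring parameter values.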
Since the seeded lineage with s0 = 7.3% always swept to fixation, indicating that larger-effect mutations must be rare, we checked whether truncating the high fitness end of the exponential DFE would improve its fit to the data. To do so, we considered an exponential DFE truncated at 7.3% and performed the same inference and statistical tests as above. We found that this truncated exponential provided a better fit to the data, but not significantly so (P > 0.08, likelihood-ratio test). We also checked whether truncating the low-fitness end of the exponential would affect its fit to the data. We varied this truncation and found that, for the inferred exponential DFE parameters, discounting mutations with fitness effects <2.1% improved these parameters’ fit to the data, but only marginally so. This indicates that the seeded lineages were not strongly affected by mutations with fitness effects <∼2%.
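Truncating the exponential DFE amounts to restricting it to a window of effects and renormalizing. A minimal inverse-CDF sampler is sketched below, assuming an illustrative mean s̄ = 1% (not the inferred value) together with the 2.1% and 7.3% truncation points from the text.

```python
import math
import random


def truncated_exponential(s_bar, lower, upper, rng):
    """Draw from an exponential DFE with mean s_bar, restricted to [lower, upper].

    Inverse-CDF method: on the window the CDF is
    F(s) = (1 - exp(-(s - lower)/s_bar)) / (1 - exp(-(upper - lower)/s_bar)).
    Use upper = math.inf to truncate at the low end only.
    """
    span = 1.0 - math.exp(-(upper - lower) / s_bar)
    u = rng.random()
    return lower - s_bar * math.log(1.0 - u * span)


rng = random.Random(2)
s_bar = 0.01  # illustrative mean effect
draws = [truncated_exponential(s_bar, 0.021, 0.073, rng) for _ in range(10_000)]
print(f"min={min(draws):.4f}  max={max(draws):.4f}")

# Share of the untruncated supply removed by an upper cutoff at 7.3%
print(f"{math.exp(-0.073 / s_bar):.1e}")
```

With this illustrative mean, mutations above 7.3% make up under 0.1% of the untruncated supply, which gives a sense of why an upper cutoff can change the fit only modestly.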
Measurements of adaptation rate corroborate DFE inference
In addition to determining the dynamics of seeded lineages, the DFE determines the rate of adaptation. Thus to test our inferences, we measured the changes in fitness over time of 16 control populations that consisted of the resident strain alone. We compared the average fitness of the control populations with the predictions of the most-likely exponential, uniform, and δ-function DFEs. As seen in Figure 6, the inferred exponential is fairly accurate in predicting these data, whereas the uniform and δ-function are less so.
Throughout our analysis, we have implicitly assumed that the DFE remained the same across all genotypes in the experiment, which implies that the fitnesses of populations should increase linearly on average after some initial transient. In contrast, the rate of adaptation slowed after generation 380 (P < 3 × 10⁻³; see Appendix), which is reminiscent of declines in adaptation rate commonly observed in other evolution experiments (Elena and Lenski 2003). Fortunately, we based our DFE inference on the early features of seeded lineage dynamics, most of which transpired prior to this time. Thus the change in adaptation rate is not inconsistent with our method.
Interest in the DFE stems from a desire to know what beneficial mutations are available and which of these drive adaptation. In asexual populations, the DFE also determines the distribution of competing mutations and the frequency of clonal interference. Here, we have described a simple experiment that exploits this connection to infer the DFE in experimental populations of S. cerevisiae. By introducing lineages with different fitnesses and tracking their subsequent dynamics, we inferred the DFE from the statistics of observed interference events. In the process, we directly observed how initial fitness advantages and clonal interference jointly influence the fixation or loss of adaptive lineages.
Previous experimental work has analyzed several other cases where an introduced lineage is outcompeted by a less-fit resident population (Waite and Shou 2012; Gifford and MacLean 2013). Unlike our experiment, these earlier studies focus on the fates of a few key mutations (e.g., antibiotic resistance or microbial “cheaters”) without attempting to infer the underlying DFE. Nevertheless, our results complement this earlier work by showing the transition between fitness effects that are susceptible to clonal interference and those that decisively sweep to fixation, which has previously been studied theoretically (Neher and Shraiman 2011; Schiffels et al. 2011; Good et al. 2012). In our system, this transition occurs when the fitness of the seeded lineage is ∼5%, which represents a critical effect size required for a mutation to drive adaptation. Of course, in natural populations some adaptive variants may arise in populations with substantial standing fitness variation, rather than the homogeneous resident populations employed here. In this case, the transition between mutations that sweep and those that experience interference is determined both by the DFE and by the distribution of fitnesses in the resident population. Further work is needed to address this situation.
Our computational inference method allowed us to distinguish between three representative DFE shapes: exponential, uniform, and δ-function (in which all mutations have the same effect). These represent idealized approximations to the actual DFE, and it is likely that a larger number of replicates or more sophisticated computational techniques could produce other DFE shapes with a significantly better fit. Yet one cannot continue this process indefinitely without reaching a point where further determination of the fine-scale DFE becomes irrelevant for any particular application. In the end, certain features of the DFE matter for predicting certain aspects of the evolutionary process, and the required level of resolution is ultimately determined by the aspect of adaptation one wishes to study. This experiment, which focuses on the fates of advantageous mutants, provides a concrete illustration of this principle. Previous work has suggested that the dynamics of adaptation can be summarized by a single characteristic fitness effect, with a magnitude that depends on the actual DFE and the level of clonal interference within the population (Hegreness et al. 2006; Desai and Fisher 2007; Good et al. 2012). By rejecting the δ-function and uniform DFEs in favor of the exponential, we have shown that this assumption breaks down when one considers more detailed features of the lineage trajectories.
Given these caveats, the DFE that we inferred is worth pondering. We estimated an exponential distribution with mean $\bar{s}$ and total beneficial mutation rate Ub = 1.0 × 10⁻⁴. Our modeling indicated that of these mutations, only those with effects >2% affected the fates of seeded lineages and that these mutations are predicted to arise at a rate of order 10⁻⁵ per individual per generation. If one assumes a per-genome point mutation rate of roughly 4 × 10⁻³ (Lynch et al. 2008), this would imply that of order 1 in 1000 mutations confer a fitness advantage of 2% or more. This is consistent with past work in a related system (Desai et al. 2007) and is also similar to DFEs reported for bacteria adapting to rich laboratory media (Kassen and Bataillon 2006; Perfeito et al. 2007; Wiser et al. 2013). In such permissive environments, other studies in yeast that have identified specific adaptive mutations report a mix of loss-of-function and other kinds of beneficial mutations (Jansen et al. 2005; Kao and Sherlock 2008; Wenger et al. 2011; Kvitek and Sherlock 2013; Lang et al. 2013). If a large fraction of beneficial mutations in our system are loss of function, and if ∼10% of spontaneous mutations in a gene cause loss of function (Lang and Murray 2008), our results would suggest that ∼1 in 100 genes are beneficial to disrupt. This is at least qualitatively consistent with direct measurements using the yeast deletion collection (Sliwa and Korona 2005; Bell 2010). Together, these results illustrate how inferences from lineage dynamics can combine with other lines of evidence to help build a more complete picture of adaptation.
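The order-of-magnitude chain in this paragraph can be checked directly. The sketch below reuses only numbers quoted in the text; rounding log10 to the nearest integer is our own way of expressing "of order".

```python
import math

# Numbers quoted in the text
rate_large = 1e-5    # rate of beneficial mutations with effect > 2%, per individual per generation (order of magnitude)
U_genome = 4e-3      # per-genome point mutation rate (Lynch et al. 2008)
lof_fraction = 0.1   # fraction of spontaneous mutations in a gene causing loss of function (Lang and Murray 2008)


def order_of_magnitude(x):
    """Nearest power of ten, used here to express 'of order'."""
    return 10.0 ** round(math.log10(x))


frac_mutations = rate_large / U_genome      # share of all mutations with effect > 2%
frac_genes = frac_mutations / lof_fraction  # share of genes beneficial to disrupt, if such mutations are loss of function

print(frac_mutations, order_of_magnitude(frac_mutations))
print(frac_genes, order_of_magnitude(frac_genes))
```

The unrounded ratios correspond to roughly 1 in 400 mutations and 1 in 40 genes, which round to the quoted orders of magnitude of 1 in 1000 and 1 in 100.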
Finally, we note that our experimental design has a potential practical application as a screen for beneficial mutations. Whenever a seeded lineage with fitness advantage s0 experiences clonal interference, the resident must contain a mutant lineage at appreciable frequency with fitness greater than s0. Thus, by picking clones from the resident immediately after a clonal interference event, we should in principle be able to isolate rare large-effect beneficial mutations. This is similar in spirit to earlier studies that used the dynamics of neutral markers to screen for adaptive clones (e.g., Rozen et al. 2002). However, because our seeded lineages are more fit than the resident, we can screen for beneficial mutations with particularly large effects. Further, since the resident must quickly generate a competing lineage, our approach is more likely to find clones carrying a few mutations of larger effect rather than many of smaller effect, and it should also limit the number of nonbeneficial hitchhiking mutations. To illustrate this idea, we simulated seeded lineage trajectories and then simulated picking a clone from the resident population after observed clonal interference events. In Figure 7, we show the average fitness of each of these simulated clones and of the largest-effect mutation in each clone. As is apparent from the figure, it should be feasible to use this approach with a seeded lineage of the appropriate fitness to isolate large-effect beneficial mutations with specific fitness effects.
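The yield of such a screen can be reasoned about with a simple sketch. Assuming the resident population at the moment of interference is described by hypothetical clone frequencies and fitnesses, the chance that a randomly picked clone is fitter than the seeded lineage is just the combined frequency of those clones; the composition below is invented for illustration.

```python
import random


def screening_yield(resident_clones, s0):
    """Fraction of the resident population belonging to clones fitter than s0.

    resident_clones: list of (frequency_within_resident, fitness) pairs (hypothetical data).
    """
    total = sum(f for f, _ in resident_clones)
    return sum(f for f, s in resident_clones if s > s0) / total


def pick_clones(resident_clones, n, rng):
    """Pick n clones at random, weighted by frequency (sampling with replacement)."""
    weights = [f for f, _ in resident_clones]
    return rng.choices(resident_clones, weights=weights, k=n)


clones = [(0.70, 0.000), (0.20, 0.045), (0.10, 0.060)]  # invented composition
y = screening_yield(clones, s0=0.05)
print(f"yield per pick = {y:.2f}; P(at least one hit in 10 picks) = {1 - (1 - y) ** 10:.2f}")
picked = pick_clones(clones, 10, random.Random(3))
```

Sampling with replacement is a reasonable approximation when the number of picked clones is tiny compared with the population size.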
We thank Sergey Kryazhimskiy, Christopher S. Wylie, Andrew Murray, and Katya Kosheleva for useful discussions and comments on the manuscript; Melanie Muller, Gabriel Perron, John Koschwanez, and Gregg Wildenberg for help with strain construction; and Patricia Rogers for generous technical support of flow-cytometry. Simulations in this article were performed on the Odyssey cluster of the Research Computing Group at Harvard University. This work was supported by training grant GM831324 from the National Institutes of Health (NIH) and grant 1219334 from the NSF Physics of Living Systems graduate student network (E.M.F.), a National Science Foundation Graduate Research Fellowship (B.H.G.), and the James S. McDonnell Foundation, the Alfred P. Sloan Foundation, the Harvard Milton Fund, grant PHY 1313638 from the National Science Foundation, and grant GM104239 from the NIH (M.M.D.).
Communicating editor: J. J. Bull
- Received December 5, 2013.
- Accepted January 24, 2014.
- Copyright © 2014 by the Genetics Society of America