Evolution by Small Steps and Rugged Landscapes in the RNA Virus ϕ6
Christina L. Burch, Lin Chao


Fisher’s geometric model of adaptive evolution argues that adaptive evolution should generally result from the substitution of many mutations of small effect because advantageous mutations of small effect should be more common than those of large effect. However, evidence for both evolution by small steps and for Fisher’s model has been mixed. Here we report supporting results from a new experimental test of the model. We subjected the bacteriophage ϕ6 to intensified genetic drift in small populations and caused viral fitness to decline through the accumulation of a deleterious mutation. We then propagated the mutated virus at a range of larger population sizes and allowed fitness to recover by natural selection. Although fitness declined in one large step, it was usually recovered in smaller steps. More importantly, step size during recovery was smaller with decreasing size of the recovery population. These results confirm Fisher’s main prediction that advantageous mutations of small effect should be more common. We also show that the advantageous mutations of small effect are compensatory mutations whose advantage is conditional (epistatic) on the presence of the deleterious mutation, in which case the adaptive landscape of ϕ6 is likely to be very rugged.

ADAPTIVE evolution by the substitution of many mutations of small effect is key to both the modern synthesis and some of the current views of evolutionary biology because it provides an explanation for a variety of phenomena, such as the continuity of species variation, polygenic inheritance, the gradualism of phenotypic evolution, and rate variation of molecular clocks (Charlesworth et al. 1982; Lande 1983; Orr and Coyne 1992; Chao et al. 1997). The theoretical explanation for why adaptive evolution should be by small steps is attributed to Fisher (1930), who argued this viewpoint using a geometric model of evolution in multidimensional phenotypic space (see Figure 1). In Fisher’s model, a population is assumed to begin a distance from an adaptive optimum (or peak). As natural selection moves the population toward the peak, Fisher theorized that the approach should be by small steps because advantageous mutations of small effect should be more abundant than those of large effect. The bias against large advantageous mutations results because large mutations (in general) are more likely to have deleterious pleiotropic (side) effects, and the bias increases with the number of phenotypic dimensions, which presumably is correlated with the complexity of the genome.

However, the validity of both evolution by small steps and Fisher’s explanation has been challenged recently (Orr and Coyne 1992). Whereas some adaptations clearly result from many mutations of small effect, others are caused by a few mutations of major effect. Also, because the universality of pleiotropy is not well established, it is not known whether the bias against large mutations is strong enough to produce adaptation by small steps. Here we present the results of a study using the RNA virus ϕ6 to assess the validity of Fisher’s model, and we report strong support for the model. Our results were surprising because it was initially not certain whether an organism with a minimal genome would be sufficiently complex to generate the effects predicted by Fisher.


Testing experimentally whether and why adaptive evolution proceeds by small steps is difficult because it requires a system that can be manipulated and monitored over evolutionary time scales. We have designed a test by taking advantage of the high genomic mutation rate and short generation time of RNA viruses. We tested whether viral evolution involves small steps, but we also devised a test that uses the prediction that advantageous mutations of small effect are more abundant than those of large effect (see above). The latter test considers the fact that although advantageous mutations of smaller effect may be more common, they should have a smaller selective advantage and, hence, a smaller probability of fixation (Crow and Kimura 1970). As a result, in a population that is sufficiently large to harbor advantageous mutations of both large and small effect, evolution should proceed by large steps because of the higher probability of fixation of large mutations. In a small population, however, small advantageous mutations are the only ones sufficiently common to appear. Once they appear, they are swept to fixation by selection despite their lower probability of fixation, and evolution is by small steps. Thus, when corrected for the probability of fixation, Fisher’s model makes the second prediction that step size during adaptive evolution should be smaller with decreasing size of the population under selection.
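The supply side of this argument can be illustrated with a minimal Python sketch. It assumes just two classes of advantageous mutation whose per-phage frequencies (RATE_SMALL and RATE_LARGE) are hypothetical values chosen only for illustration, not estimates from this study, and it computes the chance that a bottleneck of a given size contains at least one mutant of each class.

```python
# Hypothetical per-phage frequencies of advantageous mutants (illustrative only,
# not measured in this study): small-effect mutants are assumed 100x more common.
RATE_SMALL = 1e-2
RATE_LARGE = 1e-4


def prob_present(n_phage, rate):
    """Probability that at least one of n_phage carries a mutation of this class."""
    return 1.0 - (1.0 - rate) ** n_phage


# Bottleneck sizes used for the recovery populations in this study.
for n in (10, 33, 100, 333, 1000, 2500, 10000):
    p_small = prob_present(n, RATE_SMALL)
    p_large = prob_present(n, RATE_LARGE)
    print(f"N={n:>6}  P(small present)={p_small:.3f}  P(large present)={p_large:.3f}")
```

Under these assumed frequencies, small bottlenecks almost never contain a large-effect mutant, so only small steps are available to selection, whereas large bottlenecks usually contain both classes. The sketch deliberately omits the fixation step, which would further favor the large-effect class whenever it is present.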

Figure 1.

—Fisher’s model in two dimensions (modified from Kimura 1983). Two continuous phenotypes X and Y have a fitness optimum at point O. Fitness drops as X and Y move away from the optimum, and the circle through D and centered around O depicts the phenotypes that have the same low fitness. If a population is at D, mutations of small and large effect arising in the population are represented, respectively, by the small and large circles centered around D. The dashed portions of the circles correspond to advantageous mutations that bring the population from D to points closer to O. Although the advantageous proportion is ∼50% of the small circle, it is much less than 50% of the large circle, and the bias against large mutations becomes stronger as more dimensions are added.
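The geometry in Figure 1 can be checked numerically. The following Python sketch assumes the optimum O is at the origin, the population D lies at distance d from O, and a mutation is a displacement of fixed magnitude r in a uniformly random direction; the function names are ours and purely illustrative. In two dimensions the advantageous fraction has the closed form arccos(r/2d)/π, and a Monte Carlo version extends the same test to more dimensions.

```python
import math
import random


def fraction_advantageous_2d(r, d):
    """Fraction of directions in which a mutation of size r moves D closer to O (2-D)."""
    if r <= 0 or d <= 0:
        raise ValueError("r and d must be positive")
    if r >= 2.0 * d:
        return 0.0  # the mutation overshoots the optimum in every direction
    # New distance is below d exactly when cos(theta) > r / (2 d).
    return math.acos(r / (2.0 * d)) / math.pi


def fraction_advantageous_mc(r, d, n_dims, trials=20000, seed=0):
    """Monte Carlo estimate of the same fraction in n_dims dimensions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        norm = math.sqrt(sum(x * x for x in direction))
        # D is placed at (d, 0, ..., 0); the optimum O is the origin.
        new_point = [r * x / norm for x in direction]
        new_point[0] += d
        if math.sqrt(sum(x * x for x in new_point)) < d:
            hits += 1
    return hits / trials
```

For example, fraction_advantageous_2d(0.001, 1.0) is very close to one-half, fraction_advantageous_2d(1.0, 1.0) is exactly one-third, and the Monte Carlo estimate for the same r and d drops well below one-third as n_dims increases.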

We tested both predictions by building on two known results for the RNA bacteriophage ϕ6. First, if ϕ6 is subjected to successive population bottlenecks of one phage, the resulting genetic drift is sufficiently strong to cause the fixation (increase to a frequency of 100%) of deleterious mutations and a decrease in the mean fitness of the population (Chao 1990). Second, if the population that had accrued the fitness decrease is then propagated through a succession of larger bottlenecks, fitness increases because natural selection is now sufficiently strong to override genetic drift (Chao et al. 1992, 1997). We combined these results by determining whether fitness during the decline and recovery phases in a single population lineage changed by a different number of steps. We also varied the size of the bottleneck during recovery to determine the effect of population size on step size.


Stocks and culture conditions: The RNA bacteriophage ϕ6 used in this study is a laboratory clone descended from the original isolate of Vidaver et al. (1973). Pseudomonas syringae sv. phaseolicola (ATCC #21781), the standard host of ϕ6, was obtained from the American Type Culture Collection (Rockville, MD), and P. pseudoalcaligenes ERA, the alternate host, was obtained from L. Mindich. Details of handling, culture (in LC medium), and storage of phage and bacteria are in Chao et al. (1997).

Figure 2.

—Trajectory of fitness decline of ϕ6 population successively propagated at a bottleneck of one phage. Each point graphed is the mean ± SEM (Sokal and Rohlf 1994; log10-transformed; n = 3) for samples taken after each bottleneck. Solid line shows single step identified by fit of data to a model of increasing step number.

Fitness decline by genetic drift: Details of the protocol are described elsewhere (Chao 1990; Chao et al. 1997). Briefly, a ϕ6 clone was transferred through a succession of bottlenecks of one phage to reduce population size and induce genetic drift. The phage were plated on a lawn of P. phaseolicola and incubated to allow the phage to reproduce and form plaques on the lawn. A plaque was then randomly chosen from the lawn and used to seed a new lawn to generate more plaques. A random plaque was then chosen from the new lawn, and the process was repeated for as long as desired. A succession of bottlenecks is achieved because each plaque results from a single phage. Because the phage expand to ∼8 × 10⁹ phage in five generations within a plaque, there is opportunity for selection to operate within a population propagated by such one-plaque transfers. However, the intensity of selection is not sufficient to overcome the intense genetic drift generated by the bottlenecks because mean fitness decreases by the accumulation of deleterious mutations during one-plaque transfers (Chao 1990). After the phage were removed from the chosen plaques to seed the new lawn, the remaining phage were frozen and stored for the later fitness assays.

Fitness recovery by population expansion: A ϕ6 clone that had acquired low fitness by being subjected to eight bottlenecks of one phage (see above) was allowed to recover fitness by population expansion, which intensifies selection and reduces the effect of genetic drift. Expansion was achieved by the same protocol used in generating fitness loss by genetic drift, except that the bottleneck size was enlarged by using a larger number of random plaques to seed the new lawn. The seeding was achieved by plating from a lysate, which was produced by harvesting the desired number of plaques from the old lawn and suspending them in LC medium (Chao et al. 1997). After plating, all the pooled lysates were frozen and stored for the fitness assays. The effects of seven bottleneck sizes (10, 33, 100, 333, 1000, 2500, and 10,000) were examined. For each of these larger bottleneck sizes, a population was propagated at that bottleneck size for a succession of 20 bottlenecks or until fitness was recovered to approximately the original level. Because reproduction within a single plaque consists of about five generations (Chao et al. 1997), 20 bottlenecks correspond to ∼100 generations.

Fitness assay: The protocol is detailed in Chao (1990) and Chao et al. (1997). Briefly, a test ϕ6 and a genetically marked reference ϕ6 were mixed at a 3:1 ratio and plated on a P. phaseolicola lawn. After a 24-hr incubation, the resulting plaques were harvested and plated to determine the ratio of the two phage after reproduction in the lawn. The ratio of phage (test to reference) was monitored by marking the reference phage with a spontaneous host range mutation that allows growth on the alternate host P. pseudoalcaligenes (Mindich et al. 1976). Because of the extended host range, the reference ϕ6 makes clear plaques on mixed lawns of P. phaseolicola and P. pseudoalcaligenes (200:1 ratio), whereas the unmarked phage makes turbid plaques. The number of plaques was kept at a density of 400 per plate to minimize plaque overlap.

Fitness was measured as W = R1/R0, where R0 and R1 are, respectively, the ratio (test to reference) before and after reproduction on the lawn. This fitness assay effectively measures the realized growth rate of the test phage relative to the reference phage, and a value of W = 1 indicates equal fitness. The host range marker introduces a 5.5% fitness reduction (Chao 1990), but that cost does not affect the interpretation of the present results because all the analyses were made on a relative scale. We used an initial ratio of 3:1 (test to reference) instead of the 1:1 ratio used by Chao (1990) to obtain a better estimate of lower fitness values. Many of the lower fitness phage derived in this study reproduced too slowly to be observed after growth at a 1:1 ratio. We compared whenever possible the effect of using either an initial ratio of 3:1 or 1:1 and found no evidence of a frequency dependence (data not presented). All fitness values presented were replicated three times and not adjusted for the cost of the marker on the reference phage.
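As a worked illustration of the fitness measure, the Python sketch below computes W from the test-to-reference ratio before (R0) and after (R1) growth on the lawn, and then summarizes replicate values as the mean ± SEM of log10(W), as plotted in the figures. The function names are ours, and any numerical inputs supplied to them are the user's own; none are data from this study.

```python
import math
import statistics


def relative_fitness(r0, r1):
    """W = R1 / R0, where R0 and R1 are the test:reference ratios before and after growth."""
    if r0 <= 0 or r1 <= 0:
        raise ValueError("ratios must be positive")
    return r1 / r0


def summarize_log10(fitness_values):
    """Return (mean, SEM) of log10-transformed fitness across replicate assays."""
    logs = [math.log10(w) for w in fitness_values]
    mean = statistics.mean(logs)
    sem = statistics.stdev(logs) / math.sqrt(len(logs))
    return mean, sem
```

For instance, a test phage that starts at a 3:1 ratio and ends at 1.5:1 has W = 0.5, or about -0.30 on the log10 scale; W = 1 (log10 W = 0) corresponds to equal fitness of test and reference phage.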

Determination of step size and number: The trajectory of fitness changes over time during the decline and the recovery phases was determined by retrieving a single ϕ6 test clone (rather than a population sample) from the frozen samples and then determining the fitness of the test clone relative to a reference ϕ6. For the decline phase, it makes no difference whether a test clone or a population sample is used because the bottleneck of one phage during the decline essentially makes the population a clone. For the recovery phase, however, a single clone was used to avoid misidentifying steps resulting from a transient polymorphism. If a recovery were caused by a single mutation of large effect, a sequence of population samples could show mean fitness increasing in discrete steps as the mutation is swept to fixation and give the false appearance of multiple steps. The fitness of all test clones was measured in triplicate.

Step size and number were estimated by using the methods of Elena et al. (1996) to fit the fitness trajectories to a model of increasing step number. The model initially assumes zero steps and examines the fit of adding a step. If the addition leads to a significant (P < 0.05) reduction in the residual sums of squares, one step is identified. Significance is assessed by a partial F-test (Kleinbaum and Kupper 1978) that compares the variance explained by the new step to the error variance. Steps are sequentially added until a new step is not significant (P > 0.05). Because we measured the fitness of all the test clones in triplicate, our analysis consisted of a nested design using only the mean of the three replicates to determine step size and number.
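The logic of adding one step and testing it with a partial F-test can be sketched in Python as follows. This is a simplified illustration under our own assumptions, not a reproduction of the Elena et al. (1996) procedure: the step location is found by exhaustive search over split points, the levels before and after the step are the group means, and the number of parameters charged to each model is left as an argument for the user to choose.

```python
import math


def rss_about_mean(values):
    """Residual sum of squares of values around their own mean (a zero-step fit)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_step(values):
    """Return (k, rss) for the split values[:k] | values[k:] with the lowest RSS."""
    best = None
    for k in range(1, len(values)):
        rss = rss_about_mean(values[:k]) + rss_about_mean(values[k:])
        if best is None or rss < best[1]:
            best = (k, rss)
    return best


def partial_f(rss_reduced, rss_full, extra_params, n_points, full_params):
    """Partial F statistic comparing a reduced model with a fuller model."""
    resid_df = n_points - full_params
    if resid_df <= 0:
        raise ValueError("not enough points for the fuller model")
    if rss_full == 0:
        return math.inf  # the fuller model fits perfectly
    return ((rss_reduced - rss_full) / extra_params) / (rss_full / resid_df)
```

The statistic would then be referred to an F distribution with (extra_params, n_points - full_params) degrees of freedom to obtain a P value, for example from a statistical table. Further steps can be tested in the same way by searching for the best split within each existing segment and stopping when the added step is no longer significant.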

Because fitness assays were performed on a clonal isolate from each of the frozen samples from the recovery populations, ancestral genotypes were observed at several time points in some populations. To prevent ancestral genotypes from inflating the unexplained error (and increasing the likelihood of rejecting a true step), they were not included in the final estimate of step number. A clone was excluded if its fitness was significantly less (P < 0.05 by a t-test, Sokal and Rohlf 1994) than the step to which it was assigned.


Fitness decline: By propagating a ϕ6 clone through a succession of bottlenecks of one phage, we induced intensified genetic drift and caused the lineage to experience a decline in fitness (Figure 2). A fit of the fitness trajectory to a model of increasing step number confirmed the presence of a one-step drop after 25 generations. Whereas the addition of the first step to the model was significant (P ≤ 0.0001), the addition of the second step was not (P = 0.267). It is very likely that this single step was caused by one deleterious mutation because the decline population was sampled after every bottleneck, and multiple mutations could be responsible for the decline only if they occurred in between two bottlenecks (a window of about 5 generations). Given that deleterious mutations do not always fix in replicate populations after 40 bottlenecks of one phage (Chao 1990), the probability of fixing a single deleterious mutation (per bottleneck) must be small, in which case the probability of fixing multiple mutations is even smaller.

Fitness recovery: To examine the pattern of adaptive evolution after the acquisition of a deleterious mutation, seven recovery populations were established from phage isolated from the decline population after 40 generations (Figure 2), and they were propagated with larger bottleneck sizes of 10, 33, 100, 333, 1000, 2500, and 10,000 phage, respectively. The recovery populations were maintained for either 100 generations or until fitness recovered to approximately the same level as before the decline (a log10 value of about zero in Figure 2). The fitness of a single ϕ6 clone (see materials and methods) was then measured after every bottleneck to construct the fitness trajectory of the populations over the time course of the recovery.

A fit of the fitness trajectories demonstrated clearly the presence of multiple steps in several recovery populations (Figure 3). Fitting the trajectories to a model of increasing step number yielded estimates of the minimal number of steps required for complete recovery (Table 1). More than one step was estimated for all recovery populations with bottleneck sizes of <1000 phage, and a minimum of four steps was estimated for a bottleneck size of 333. Minimal estimates were used because many recoveries were incomplete.

Step and population size: We performed a linear regression to test the prediction that step size is affected by population size, testing for a positive relationship between the size of the first step and the size of the bottleneck during fitness recovery. The size of the first step was estimated from the data presented in Figure 3 and one additional recovery population with a bottleneck size of 100 (not presented). Only the first step was used to ensure a common starting point or baseline for the comparison. A least-squares fit yielded a significant positive linear regression of the size of the first step on bottleneck size (Figure 4).

Figure 3.

—Trajectories of fitness recovery of ϕ6 populations successively propagated at varying bottleneck sizes greater than one phage. Each point graphed is the mean ± SEM (log10-transformed; n = 3) for a clone taken from the populations after each bottleneck. Open circles represent ancestral genotypes that were not included in the final analysis (see materials and methods). The solid line shows steps identified by fit of data to a model of increasing step number.

Beneficial vs. compensatory mutations: The present experimental design offers a unique and simple means of partitioning the effects of beneficial and compensatory mutations. The fitness gain experienced by the viruses during the recovery phase was caused by mutations, but a distinction can be made between beneficial and compensatory mutations by following their definitions (Wagner and Gabriel 1990), respectively, as either unconditionally advantageous or conditionally advantageous (epistatic) on the deleterious mutation. It follows then that the availability of beneficial mutations depends on how close or distant the original ϕ6 clone used to start the decline population (Figure 2) was from an adaptive peak. For instance, if the original ϕ6 were on a peak, beneficial mutations would be, by definition, unavailable to the phage, and the entire recovery would have occurred by compensatory mutations. On the other hand, if the distance between the original ϕ6 and the peak were equal to the fitness gain during the recovery phase, the recovery could have resulted from either beneficial or compensatory mutations, or both. Thus, the distance to the peak offers a maximum estimate of fitness gain during recovery that can be attributed to beneficial mutations. The remainder is then a minimum estimate of gain caused by compensatory mutations.

To determine the approximate distance of the original ϕ6 clone to a peak, we subjected this phage to the same selective conditions (see materials and methods) used for the recovery. Bottleneck sizes of 10, 33, 100, 333, and 1000 were used, and the phage were propagated for the same number of generations of selection as the recovery populations at each of the respective bottleneck sizes (Figure 3). The difference between the fitness of the phage after selection and before selection was used as a measure of the distance to a peak. The general outcome was that the original ϕ6 was unable to evolve any significant fitness gain when subjected to the same selective conditions as an equivalent recovery population (Table 2). There was a significant effect at a bottleneck size of 1000, but the gain amounted to only 0.134/1.027 = 13% of the recovery. Thus, the original ϕ6 was relatively close to an adaptive peak (or plateau), and the maximum contribution of beneficial mutations to the recovery was small. As a result, adaptive evolution during the recovery must have been largely fueled by compensatory mutations, and we estimate a minimum contribution on the order of 87-100%.
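The partition used here is simple arithmetic, sketched below in Python with a hypothetical helper function of our own naming. The gain of the original ϕ6 sets the maximum share of the recovery attributable to beneficial mutations, and the remainder is the minimum share attributable to compensatory mutations; the two inputs in the example are the values quoted in the text for a bottleneck size of 1000.

```python
def partition_recovery(gain_original, gain_recovery):
    """Return (max beneficial share, min compensatory share) of the recovery gain."""
    if gain_recovery <= 0:
        raise ValueError("recovery gain must be positive")
    # Clamp to [0, 1]: a negative or excessive gain cannot explain less than
    # none or more than all of the recovery.
    max_beneficial = min(max(gain_original / gain_recovery, 0.0), 1.0)
    min_compensatory = 1.0 - max_beneficial
    return max_beneficial, min_compensatory


# Values quoted in the text for a bottleneck size of 1000.
max_ben, min_comp = partition_recovery(0.134, 1.027)
print(f"max beneficial = {max_ben:.0%}, min compensatory = {min_comp:.0%}")
```

Where the original ϕ6 showed no significant gain, the same calculation with a gain of zero gives a minimum compensatory share of 100%, which is the upper end of the range stated above.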

Table 1.

Step number and fitness gain during recovery at different bottleneck sizes


We take our results with the bacteriophage ϕ6 to provide strong support for Fisher’s geometric model and evolution by small steps. Our first result is that when a population with a deleterious mutation is allowed to recover fitness by natural selection, the recovery is often by mutations of effects smaller than the magnitude of the initial deleterious mutation (Figure 3). Our use of a deleterious mutation to displace a population from an adaptive peak differs from Fisher’s (1930) original presentation of his model because Fisher considered a population that began a distance from an adaptive peak, perhaps because of novel selective pressures. We considered testing Fisher’s model by challenging the virus with a new environment, but we settled on the present experimental design because the deleterious mutation offers an important control. Previous studies with Escherichia coli (Lenski et al. 1991; Lenski and Travisano 1994; Elena et al. 1996) were the first to demonstrate experimentally that adaptive evolution can occur by multiple steps, but those results are less conclusive because the bacteria were challenged with a new environment. For instance, if the distance to the new adaptive peak is very large and the largest possible adaptive mutations are smaller than that distance, adaptation would proceed by multiple steps, even if Fisher was incorrect; i.e., larger adaptive mutations were more abundant than, or as common as, smaller adaptive mutations. By displacing the population with a deleterious mutation, we know in our study that the distance to the peak is within the range of possible mutations. Thus, our demonstration of multiple steps provides evidence, not only that adaptation proceeds by multiple mutations, but that among the set of all possible mutations, adaptation proceeds by the subset of mutations with small effects.

Figure 4.

—Effect of bottleneck size on size of first step during recovery. The size of the first step was estimated as log10(W1) - log10(W0), where W1 is the mean fitness of all fitness measures after the first step, but before the second step, and W0 is the mean of all fitness measures before the first step. Values of W1 and W0 were estimated from the data presented in Figure 3 and one additional recovery population with a bottleneck size of 100 (not presented). A least-squares fit yielded a significant linear regression (solid line, F = 7.53, d.f. = 6, P = 0.0168, one-tailed test; Sokal and Rohlf 1994). A one-tailed test was justified because we had predicted the direction of the association (see experimental design).
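The calculation behind Figure 4 can be sketched in Python as follows. The sketch makes two assumptions that the caption does not settle: W0 and W1 are taken as arithmetic means of the untransformed fitness measures, and the regression is an ordinary least-squares fit of step size on whatever bottleneck scale the user supplies. The function names are ours, and no data from the study are embedded in the code.

```python
import math


def first_step_size(w_before, w_after):
    """log10(W1) - log10(W0), using mean fitness before and after the first step."""
    w0 = sum(w_before) / len(w_before)
    w1 = sum(w_after) / len(w_after)
    return math.log10(w1) - math.log10(w0)


def regress(x, y):
    """Ordinary least squares of y on x; returns (slope, intercept, F statistic)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_regression = slope * sxy
    rss = syy - ss_regression
    # F has (1, n - 2) degrees of freedom.
    f_stat = math.inf if rss <= 0 else ss_regression / (rss / (n - 2))
    return slope, intercept, f_stat
```

With eight recovery populations, that is, the seven bottleneck sizes plus the additional population at a bottleneck size of 100, the residual degrees of freedom are n - 2 = 6, which agrees with the d.f. reported in the caption.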

Table 2.

Fitness gain by the original ϕ6

However, our strongest result in support of Fisher’s model comes from the positive regression of step size onto bottleneck size (Figure 4). This result confirms qualitatively the model’s major prediction, which is that advantageous mutations of small effect should be more abundant than those of large effect (see experimental design). Such a positive relationship between step size and population size is additionally important because it argues that previous reports of adaptive evolution by both large and small mutations (Orr and Coyne 1992) are not at odds with Fisher’s model. These results may have been mixed because the sizes of the evolving populations were not taken into consideration, and it may well be that population size could explain the varied outcome. Roush and McKenzie (1987) used similar reasoning in suggesting that population size could be the explanation for why insecticide resistance is generally caused by just one or two loci in natural populations, whereas selection for resistance in (smaller) laboratory populations more commonly produces polygenic resistance.

Our finding of support for Fisher was not entirely expected. A concern had been raised as to whether pleiotropy was sufficiently strong in any organism for Fisher’s explanation to apply (Orr and Coyne 1992), and it was uncertain whether an RNA virus would have a sufficiently complex genetic system to generate the required pleiotropy. In fact, because we first examined recovery in the larger population sizes, in which large steps were observed (Figures 3 and 4), we were initially led to believe that evolution by small steps did not apply to RNA viruses. However, our combined results show clearly that small mutations are possible, and that a major prediction of Fisher’s hypothesis, that advantageous mutations of small effect are more common than advantageous mutations of large effect, is supported. This suggests that pleiotropy may indeed be universal (Wright 1968; Gimelfarb 1996), and if ϕ6, an RNA virus with a genome of slightly more than 10⁴ nucleotides and encoding only 13 proteins (Gottlieb et al. 1988), is sufficiently complex, organisms with larger genomes and more elaborate developmental systems and phenotypes also must surely be sufficiently complex.

Fisher and Wright: Although our initial motivation was to test predictions stemming from Fisher’s geometric model of adaptive evolution, we were aware that our experimental design actually offered a unique opportunity to ascertain whether the adaptive changes in ϕ6 were the result of either beneficial or compensatory mutations (see results). Following the definitions of beneficial and compensatory mutations (Wagner and Gabriel 1990), we estimated the maximal and minimal contributions of beneficial and compensatory mutations to the recovery phase of our viral populations, and we found an overwhelming effect by compensatory mutations.

These compensatory mutations were conditional (or epistatic) on the deleterious mutation, and the requirement for the deleterious mutation was demonstrated by the inability of the original ϕ6 (without the deleterious mutation) to evolve anywhere near the amount of fitness gained during the recovery (Table 2). It follows that the recovered viruses must have evolved across a fitness valley. The acquisition of the deleterious mutation by genetic drift represents the descent into the valley, and the recovery represents the rise to new fitness highpoints and possibly new adaptive peaks. It is important to note that such peaks differ from the peak considered by Fisher’s model (Figure 1), which assumes an adaptive landscape in phenotypic space (Whitlock et al. 1995). Whereas a peak shift in phenotypic space requires a change in both morphology and ecological niche, a shift in genotypic space does not. Unfortunately, it is not possible to determine whether we have witnessed a peak shift in the present study because we do not know if the peak occupied by the original ϕ6 is connected by a ridge to the highpoints on the other side of the valley. We can only say with certainty that the phage crossed a valley, and based on this, the genotypic landscape of ϕ6 is likely to be very rugged (Kauffman 1993).

Our demonstration of a fitness valley relied on estimating the minimal contribution by compensatory mutations to evolution during the recovery. A possible problem with the method is that the recovery could have resulted from a (back) mutation that restores the sequence of the original ϕ6. In that case, fitness would have been recovered, not by crossing the valley, but by returning to the same side. However, a back mutation would have caused a recovery in a single step, and among the bottleneck sizes used in our present analysis of beneficial and compensatory mutations (Table 2), a single step was observed only at a bottleneck size of 1000 (Table 1). Thus, the multiple and smaller steps observed for all the other bottleneck sizes (10, 33, 100, and 333) are very likely to be compensatory mutations.

The observed high frequency of compensatory mutations in ϕ6 is consistent with the ease with which other studies in experimental evolution have been able to generate epistasis with DNA bacteriophages, bacterial plasmids, and E. coli (Malmberg 1977; Bouma and Lenski 1988; Lenski 1988; Lenski et al. 1994; Schrag et al. 1997). These combined results reinforce the notion that epistasis is very common in adaptive evolution and that the fitness landscape in genotypic space is very rugged. Because Fisher was never a strong believer in the importance of epistasis, it is ironic that we should demonstrate epistasis in a test of his model. The importance of epistasis in evolution is actually more consistent with the views of Wright, who used epistasis and a rugged genotypic landscape to develop his shifting balance theory (Wright 1931, 1932). Our results demonstrate that although Fisher and Wright may have had disparate views (Provine 1986, pp. 241, 274; Wade 1992; Fenster et al. 1997), Fisher’s geometric model and Wright’s rugged landscape are quite compatible. To get a rugged landscape, it is only necessary that Fisher’s model be correct and that a population can be on an adaptive peak. Once the latter two conditions are satisfied, each time that a population acquires a deleterious mutation by genetic drift, it is likely that a fitness valley will be crossed because the recovery should be by multiple compensatory mutations. A second irony is that our support for Fisher’s model also points to an inconsistency in his views. Fisher may have thought of natural populations as exceedingly large (Provine 1986, pp. 237 and 255; Fenster et al. 1997), but our finding of a positive relationship between step and population size (Figure 4) shows that his model may be most relevant in smaller populations.

We hope that our results stimulate more research interest in Fisher’s geometric model and the generality of genome level pleiotropy and epistasis. In particular, we hope that new studies may examine whether variation in population size may explain the previously reported mixed results of adaptive evolution by mutations of both large and small effects.


We thank Kathy Hanley, Paul Turner, Stacey Lance, and Joanne Smale for discussions; Robert Billerbeck, Melissa Parker, and Jessica Madert for laboratory assistance; L. Mindich for advice, phage, and bacteria; and Russell Lande, as well as Richard Lenski and members of his lab, for comments on an early version of this manuscript. This work was supported in part by a Howard Hughes Medical Institute Graduate Fellowship (C.B.), National Science Foundation Dissertation Improvement Grant DEB-9801469 (C.B.), and funds from the Office of Graduate Studies and Research at the University of Maryland (L.C.).


  • Communicating editor: A. G. Clark

  • Received May 22, 1998.
  • Accepted November 19, 1998.

