Abstract
We know little about the distribution of fitness effects among new beneficial mutations, a problem that partly reflects the rarity of these changes. Surprisingly, though, population genetic theory allows us to predict what this distribution should look like under fairly general assumptions. Using extreme value theory, I derive this distribution and show that it has two unexpected properties. First, the distribution of beneficial fitness effects at a gene is exponential. Second, the distribution of beneficial effects at a gene has the same mean regardless of the fitness of the present wildtype allele. Adaptation from new mutations is thus characterized by a kind of invariance: natural selection chooses from the same spectrum of beneficial effects at a locus independent of the fitness rank of the present wild type. I show that these findings are reasonably robust to deviations from several assumptions. I further show that one can back calculate the mean size of new beneficial mutations from the observed mean size of fixed beneficial mutations.
ADAPTATION is a two-step process: (i) alleles having different effects on fitness arise by mutation and (ii) those alleles that improve fitness tend to increase in frequency by natural selection. A good part of classical population genetics focuses on the second step of this process, including calculation of the probability that natural selection will fix a new favorable mutation (Haldane 1927) and of the rate at which such a mutation will increase in frequency (Haldane 1924). But the first step in adaptation, the origination of new beneficial mutations, has been less well studied. We cannot say, for instance, if beneficial mutations of small effect are more common than those of large effect and, if so, how much more common.
This is unfortunate as a number of aspects of adaptive evolution depend on the distribution of beneficial fitness effects. For example, the mean increase in fitness that occurs during substitution of a beneficial mutation must depend on the spectrum of effects among new mutations presented to natural selection. Large jumps in fitness, for instance, are possible only if mutations of large favorable effect occur.
The most direct approach to finding the distribution of fitness effects among new beneficial mutations is empirical. But while possible in principle, this approach has proved difficult in practice. There are two main problems: beneficial mutations are rare and beneficial mutations of small effect are difficult to detect. Because of this, study of experimental microbial populations would seem to provide the best hope of characterizing the distribution of beneficial effects: the combination of large population size and short generation time means that many beneficial mutations can be sampled in a short time (Lenski and Travisano 1994; Wichman et al. 1999; Holder and Bull 2001). But even in microbes it has proved difficult to infer the distribution of beneficial effects. The main reason is that the beneficial mutations seen in most experiments are not a random sample of new mutations but rather those that have escaped stochastic loss. [Most beneficial mutations are accidentally lost when rare; the probability of loss depends on the size of the mutation’s fitness effect (Haldane 1927).] Imhof and Schlotterer (2001), for instance, recently attempted to characterize the distribution of fitness effects among new mutations in Escherichia coli. But because their experimental design [a variation on periodic selection (Atwood et al. 1951)] depended on detection of favorable alleles that had reached appreciable frequencies, the distribution of effects observed was actually that for those “lucky” mutations that had escaped stochastic loss, not that for new mutations. Similarly, Rozen et al. (2002) recently characterized the distribution of fitness effects among fixed beneficial mutations in E. coli. This distribution is also not the same as among new beneficial mutations, as Rozen et al. (2002) emphasize.
Indeed the distribution of fitness effects among fixed mutations in asexual microbes is distorted by both stochastic loss and clonal interference, the competitive exclusion of a mutation of small beneficial effect by one of larger beneficial effect in nonrecombining genomes or chromosome regions (Gerrish and Lenski 1998; Rozen et al. 2002). Although some experiments have attempted to assay beneficial mutations before they are subject to stochastic loss (Bull et al. 2000), these experiments are compromised by another of the problems noted above: they cannot detect beneficial mutations of small effect.
Given these difficulties, it seems worth asking if population genetic theory can provide any insight into the expected distribution of beneficial effects among new mutations. Gillespie (1983, 1984, 1991) suggested that the answer is yes. Using extreme value theory, he showed that the fitness gap between high fitness alleles is exponential. If, for instance, we consider a wildtype allele that can mutate to a single beneficial allele, the difference in fitness between these two alleles should be exponentially distributed. This result has been widely cited in the literature (Gerrish and Lenski 1998; Otto and Jones 2000; Wahl and Krakauer 2000; Orr 2002; Rozen et al. 2002). It has not, however, always been appreciated that Gillespie’s result concerns a special case. As Otto and Jones (2000) emphasize, Gillespie considers only the distribution of fitness differences between “adjacent” alleles, e.g., the difference in fitness between the second-best allele (the present wild type) and the best allele (the beneficial mutant). Gillespie’s work thus provides the distribution of fitness effects among new beneficial mutations only if the present wild type can mutate to a single beneficial allele. We would obviously like to know the expected distribution of beneficial effects in the general case where the present wild type might mutate to two, or three, or four, etc., different beneficial alleles.
Here I derive this distribution. I show that it has two surprising properties: (i) the distribution of fitness effects among new beneficial mutations is always exponential and (ii) the distribution is invariant; i.e., it has the same mean regardless of the starting fitness rank of the wildtype allele. Our key assumption is that the starting wildtype allele has relatively high fitness.
THE MODEL AND RESULTS
The biological scenario: Following Gillespie’s (1983, 1984, 1991) “mutational landscape” model, I consider a population that was, until recently, well adapted to the environment. In particular, I consider a population that is essentially fixed for a wildtype sequence that was, until the recent environmental change, the fittest available at the gene. Following the environmental change, the wild type has dropped in fitness. The wildtype sequence can mutate to many alternative sequences. Of these, natural selection is essentially constrained to surveying those that differ from wild type by single-point mutations, as first pointed out by Maynard Smith (1970; see also Gillespie 1984). (Double mutations are too rare to be of much evolutionary significance; for the same reason, epistasis among new mutations can be ignored. Our results will hold for small genomes, as well as for single genes, as long as genomes are small enough that most mutations arise singly.) Given a gene that is L bp long, we thus need to consider only the m = 3L single-mutational-step mutant sequences. For now, we assume that each of these m mutations arises with equal frequency, reflecting a constant and low per-nucleotide mutation rate. This assumption is relaxed later.
Although we know little about the fitnesses of mutant sequences at any gene, it is clear that most of the m mutations will be less fit than the present wild type. This follows from two facts. First, environments are autocorrelated through time, making it unlikely that the best sequence today will be the worst tomorrow (Gillespie 1983). Second, a considerable fraction of mutations are unconditionally lethal or strongly deleterious, making it unlikely that the wild type would fall into the company of such alleles. Given our nearly complete ignorance of mutant fitnesses at most genes, Kimura (1983) and Gillespie (1983, 1984, 1991) suggested that one simply assume that the fitnesses of alternative alleles at a gene are drawn from some probability distribution. Importantly, in this article we do not need to specify this distribution. We assume only that, of the relevant m + 1 alleles (m mutations plus wild type), the wild type has relatively high fitness. The wild type can mutate, in other words, to a small number of beneficial sequences.
Distribution of beneficial effects: To find the distribution of fitness effects among those few mutations that are beneficial, we first rank the absolute fitnesses of the m mutant and one wildtype sequences: the fittest allele is given rank 1, the next fittest rank 2, and so on (Figure 1). The wildtype allele has rank i, where i is small. If a typical gene is L = 1000 bp long, then m = 3000 and i might range from, say, 2 to 25. The fitness gaps between adjacent alleles are labeled Δ_{1}, Δ_{2}, etc., as shown in Figure 1. Thus a mutation from wildtype allele i to favorable allele i − 1 improves absolute fitness by ΔW = Δ_{i−1}, while a mutation from wildtype allele i to favorable allele 1 improves fitness by ΔW = Δ_{i−1} + ... + Δ_{2} + Δ_{1}. The overall distribution, f(ΔW | i), of fitness effects among beneficial mutations when starting from wildtype allele i is the mixed distribution formed by considering all such possibilities. In symbols,
f(ΔW | i) = (1/(i − 1)) Σ_{j=1}^{i−1} f(ΔW | i, j),   (1)
where f(ΔW | i, j) is the distribution of the fitness increase when wildtype allele i mutates to favorable allele j.
Equation 1 shows that if we knew Δ_{1}, Δ_{2},..., Δ_{i−1} we would know the distribution of fitness effects among beneficial mutations. Although we have not specified the distribution of allelic fitnesses, we can, surprisingly, still say something about these fitness spacings. The reason, as Gillespie (1983) first saw, is that adaptation is confined to the right-hand (fittest) tail of the distribution of allelic fitnesses. This fact lets us take advantage of certain limiting results from extreme value theory that describe the behavior of the top several draws from any reasonable distribution. Formally, the distributions we consider belong to the so-called Gumbel type, a broad category that includes most “ordinary” distributions like the normal, lognormal, exponential, gamma, Weibull, logistic, etc. Roughly speaking, this class excludes only exotic distributions like the Cauchy (which has no moments) and many (though not all) distributions that are bounded on the right (Gumbel 1958; Gillespie 1983, 1984, 1991). The appendix provides mathematical details. Although most extreme value theory holds asymptotically as the number of draws from a distribution approaches infinity, the fact that a wildtype sequence can mutate to a very large number of alternate sequences suggests that these asymptotic results should provide good approximations.
For our purposes, the most important of these limit theorems describes the spacings, Δ_{j}, between the top several draws, i.e., the fittest several alleles. Although for any particular wildtype sequence and set of mutants the Δ_{j}’s are constants, these extreme spacings will in general be random variables. Extreme value theory shows that these Δ_{j}’s are asymptotically independent exponentially distributed random variables regardless of the distribution of allelic fitnesses. Theory also shows that these spacings grow smaller as one moves toward the median allele as shown in Figure 1. In particular, E[Δ_{j}] = E[Δ_{1}]/j, where the constant E[Δ_{1}] depends on the form of the distribution of allelic fitnesses (Gumbel 1958; Weissman 1978).
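The 1/j scaling of the spacings can be checked numerically. The following is a minimal sketch rather than the author's code: the function name `mean_top_spacings`, its parameters, and the use of a unit exponential parent distribution in the usage note are illustrative assumptions. It draws m allelic fitnesses, keeps the fittest few, and averages the gaps between adjacent ranks over many replicates.

```python
import heapq
import random


def mean_top_spacings(draw, m=3000, top=5, reps=1000, seed=1):
    """Estimate E[Delta_j] for j = 1..top-1 by simulation.

    draw -- hypothetical callable taking a random.Random and returning one
            allelic fitness from the assumed parent distribution
    m    -- number of alleles drawn per replicate
    top  -- how many of the fittest alleles to keep (gives top-1 spacings)
    """
    rng = random.Random(seed)
    sums = [0.0] * (top - 1)
    for _ in range(reps):
        # best[0] is rank 1 (fittest), best[1] is rank 2, and so on.
        best = heapq.nlargest(top, (draw(rng) for _ in range(m)))
        for j in range(top - 1):
            # Spacing Delta_{j+1}: gap between ranks j+1 and j+2.
            sums[j] += best[j] - best[j + 1]
    return [s / reps for s in sums]
```

Calling `mean_top_spacings(lambda rng: rng.expovariate(1.0))` should return values near 1, 1/2, 1/3, and 1/4, since E[Δ_{1}] = 1 for a unit exponential parent; other Gumbel-type parents can be substituted through the `draw` argument, in which case only the ratios E[Δ_{j}]/E[Δ_{1}] ≈ 1/j are expected to carry over.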
Because we know the distribution of the top spacing, Δ_{1}, we also know the distribution of beneficial effects when i = 2 and only one favorable mutant is available. It is
f(ΔW | i = 2) = (1/E[Δ_{1}]) exp(−ΔW/E[Δ_{1}]).
But what if the wildtype allele has rank i = 3 and two favorable mutants are possible? If the population were to jump to the second-best allele, we have f(ΔW | i = 3, j = 2) = (2/E[Δ_{1}]) exp(−2ΔW/E[Δ_{1}]); if the population were to jump to the best allele, we have (from the convolution) f(ΔW | i = 3, j = 1) = (2/E[Δ_{1}])[exp(−ΔW/E[Δ_{1}]) − exp(−2ΔW/E[Δ_{1}])]. Substituting the previous two equations in (1), we find that the overall distribution of beneficial fitness effects is f(ΔW | i = 3) = (1/E[Δ_{1}]) exp(−ΔW/E[Δ_{1}]). This distribution is identical to that when starting at i = 2. Remarkably, this independence from fitness rank is a general result. This is proved in the next section where the moment-generating function (mgf) of f(ΔW | i) is derived.
The general result: To find the mgf of f(ΔW | i) we first find the mgf for ΔW conditional on mutating to a favorable allele of rank j (where j < i as the mutation is beneficial). Because ΔW | i, j = Δ_{j} + Δ_{j+1} + ... + Δ_{i−1} and each Δ_{n} is independent, the mgf for ΔW | i, j equals the product of the mgf’s for the individual Δ_{n}. But the Δ_{n}’s are exponentially distributed with means E[Δ_{n}] = E[Δ_{1}]/n. The conditional mgf is thus
M_{ΔW | i, j}(t) = Π_{n=j}^{i−1} 1/(1 − E[Δ_{1}]t/n).
Averaging this conditional mgf over the i − 1 equally likely values of j, as in Equation 1, gives the unconditional mgf 1/(1 − E[Δ_{1}]t), which is the mgf of an exponential distribution with mean E[Δ_{1}] and does not depend on i.
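The step from the conditional mgf to the unconditional one can be sketched as follows. This is a reconstruction under the assumptions stated in the text (independent exponential spacings with means E[Δ_{1}]/n and equal weight on each beneficial rank j), not necessarily the author's own derivation; the shorthand θ = E[Δ_{1}]t and the symbol S_i are introduced here only for convenience.

```latex
\begin{align*}
M_{\Delta W \mid i,j}(t) &= \prod_{n=j}^{i-1} \frac{n}{n-\theta},
  \qquad \theta = E[\Delta_1]\,t, \\
S_i &\equiv \sum_{j=1}^{i-1} \prod_{n=j}^{i-1} \frac{n}{n-\theta},
  \qquad S_2 = \frac{1}{1-\theta}, \\
S_{i+1} &= \frac{i}{i-\theta}\,\bigl(S_i + 1\bigr)
  = \frac{i}{i-\theta}\cdot\frac{(i-1) + (1-\theta)}{1-\theta}
  = \frac{i}{1-\theta}
  \quad \text{whenever } S_i = \frac{i-1}{1-\theta}, \\
M_{\Delta W \mid i}(t) &= \frac{S_i}{i-1} = \frac{1}{1 - E[\Delta_1]\,t}.
\end{align*}
```

By induction on i, the last line holds for every i ≥ 2. It is the mgf of an exponential distribution with mean E[Δ_{1}], with no dependence on the starting rank.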
Figure 2 shows the results of exact computer simulations that test the accuracy of the above asymptotic theory. A gene of length L = 1000 bp was simulated. For each distribution of allelic fitnesses, the fitnesses of m = 3000 mutant alleles plus one wild type were randomly drawn from the distribution of allelic fitnesses, ranked in fitness, and the difference in fitness between the wild type and a randomly chosen beneficial mutation was recorded. Allelic fitnesses were assumed to be exponential, gamma, or half-normal (see Figure 2 legend for parameter values). Simulations were begun with wildtype fitness ranks of i = 2, 10, or 25 (these different ranks translate into considerable differences in starting wildtype fitnesses given the distributions of allelic fitnesses used). Ten thousand replicates were performed for each set of conditions. Figure 2 shows that the theory nicely predicts the distribution of fitness effects among beneficial mutations regardless of the underlying distribution of allelic fitnesses. More important, the distribution of fitness effects among beneficial mutations is approximately invariant over a wide range of i, including those where it was unclear whether extreme value theory would hold (e.g., i = 25), although some deviations appear at large i in the half-normal case. Because these deviations grow as i increases, it would seem unwise to extrapolate our results to much larger i (see Figure 2 legend).
It might seem that our results may simply reflect the memoryless property of exponential distributions. Many familiar distributions have an “exponential tail” in the sense that, as x gets large, (1 − F(x + y))/(1 − F(x)) → exp(−cy), where c is a constant; in words, the probability of an increase of size y falls off exponentially and is independent of the precise “starting point,” x. The memoryless property of exponential tails cannot, however, fully explain our results. Many distributions of the Gumbel type do not show such tail behavior, e.g., normal or lognormal distributions and those that are bounded on the right. Nonetheless, these distributions still have independent exponential extreme spacings and still give rise to an exponential distribution of fitness effects among new beneficial mutations. Our results hold asymptotically for all distributions in the domain of attraction of the Gumbel extreme value distribution whether or not they have “exponential tails” (see the appendix).
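The simulation procedure described above can be sketched in a few lines. This is one possible reading of the design, not the author's program: the function name `beneficial_effects`, the `draw` callback, and the default parameter values are assumptions, and the parameter values actually used for Figure 2 are given only in its legend. Each replicate draws m + 1 fitnesses, treats the allele of rank i as the wild type, and records the gain from a uniformly chosen fitter allele.

```python
import heapq
import random


def beneficial_effects(draw, i, m=3000, reps=10000, seed=1):
    """Simulate fitness gains Delta W of random beneficial mutations.

    draw -- hypothetical callable taking a random.Random and returning one
            allelic fitness from the assumed parent distribution
    i    -- fitness rank of the wild type (rank 1 is the fittest allele)
    """
    if i < 2:
        raise ValueError("the wild type needs at least one fitter allele")
    rng = random.Random(seed)
    effects = []
    for _ in range(reps):
        # Fittest i of the m + 1 alleles, in descending order of fitness.
        top = heapq.nlargest(i, (draw(rng) for _ in range(m + 1)))
        wild_type = top[i - 1]             # allele of rank i
        mutant = top[rng.randrange(i - 1)]  # uniform over ranks 1..i-1
        effects.append(mutant - wild_type)
    return effects
```

As a usage example, `beneficial_effects(lambda rng: rng.expovariate(1.0), i=10)` and the same call with `i=2` should give samples with similar means. A half-normal parent could be supplied as `lambda rng: abs(rng.gauss(0.0, 1.0))` and a gamma parent as `lambda rng: rng.gammavariate(2.0, 1.0)`, where the shape and scale values are placeholders rather than the values used in the article.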
Robustness of results: The above results are asymptotically independent of the shape of the distribution of allelic fitnesses (so long as it is of the Gumbel type) and i (so long as it is small). The above results are also robust to strong selection; indeed no weak selection approximations have been made.
Two assumptions, however, are potentially important. First, I assumed that each of the m mutations appears with equal frequency. It is easy to show that unequal mutation rates do not affect the above findings so long as all alleles are equally likely to have a given fitness rank; i.e., mutationally common alleles are no more or less likely to have a given fitness rank than are mutationally rare alleles. Formally, the chance that the next mutation has rank j is the mutation-rate-weighted average, over alleles, of the probability that each allele has rank j; because that probability is the same for every allele, the weights sum out and the chance is the same as under equal mutation rates.
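The claim about unequal mutation rates can be illustrated with a small simulation. This is a sketch under explicit assumptions that are not in the article: each allele receives a hypothetical mutation weight drawn from a lognormal distribution independently of its fitness, allelic fitnesses are unit exponential, and the next beneficial mutation is chosen with probability proportional to its weight. The function name `effects_unequal_rates` is likewise illustrative.

```python
import heapq
import random


def effects_unequal_rates(i, m=3000, reps=5000, seed=1):
    """Fitness gains when beneficial alleles arise at unequal rates.

    Mutation weights are assigned independently of fitness, which is the
    condition stated in the text for the results to be unaffected.
    """
    if i < 2:
        raise ValueError("the wild type needs at least one fitter allele")
    rng = random.Random(seed)
    effects = []
    for _ in range(reps):
        # Each allele is a (fitness, mutation weight) pair; both are assumed.
        alleles = ((rng.expovariate(1.0), rng.lognormvariate(0.0, 1.0))
                   for _ in range(m + 1))
        top = heapq.nlargest(i, alleles)    # ranked by fitness
        wild_fitness = top[i - 1][0]        # allele of rank i
        mutants = top[:i - 1]               # ranks 1..i-1
        weights = [w for _, w in mutants]
        # Choose the next beneficial mutation in proportion to its weight.
        fitness, _ = rng.choices(mutants, weights=weights, k=1)[0]
        effects.append(fitness - wild_fitness)
    return effects
```

Under these assumptions the sample mean should stay close to E[Δ_{1}] = 1 for any small i, just as with equal mutation rates. If the weights were instead made to depend on fitness rank, the invariance would not be expected to hold.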
Second, following Gillespie (1983, 1984, 1991), I assumed that the distribution of allelic fitnesses is well behaved: it is a simple monotonically decreasing or unimodal distribution (e.g., exponential, gamma, half-normal). But the actual distribution of allelic fitnesses at a locus might be a complicated mixture of several underlying distributions. To test the robustness of the analytic results, I used computer simulations to find the distribution of beneficial fitness effects when sampling from various “ugly” mixture distributions of allelic fitnesses. Figure 3 shows two such mixture distributions. One is a mixture of two underlying distributions and the other is a mixture of four underlying distributions. (In both cases, normal distributions contributed to the mixture distribution as the normal represents a near worst-case scenario, which converges to the extreme value distribution very slowly (Gumbel 1958).) Figure 4 shows that the distribution of fitness effects among beneficial mutations remains roughly exponential in both cases. The distribution of fitness effects is also reasonably insensitive to starting i, which again ranged from i = 2 to i = 25. Figure 4 also shows, however, that as the tail of the mixture distribution grows lumpier, the distribution of beneficial effects becomes less well behaved. Our analytic results are therefore reasonably, but not indefinitely, robust to mixture distributions of allelic fitnesses.
DISCUSSION
Following Gillespie (1983, 1984, 1991), I have assumed that allelic fitnesses are drawn from some (unknown) probability distribution and that the present wildtype allele, while no longer the fittest sequence, is near the top in fitness. Under these assumptions, I have shown that the distribution of fitness effects (ΔW) among new beneficial mutations is exponential. More surprisingly, the distribution of beneficial effects shows an invariance property: it remains the same regardless of the fitness rank (and thus fitness) of the wildtype allele. Natural selection will, therefore, choose from the same spectrum of mutational effects whether adaptation starts from the second-best possible allele (i = 2) or one that is considerably worse (e.g., i = 10).
These results depend on robust limit theorems from extreme value theory and so are quite general. They are independent of the distribution of allelic fitness (so long as it is of the Gumbel type), starting wildtype fitness (so long as it is high), strength of selection, and heterogeneity in mutation rates across sites. Although our results rest on asymptotic theory and so must be viewed as approximations (especially as different parent distributions converge on the extreme value distribution at different rates), computer simulations suggest that they are good approximations. Our results also hold in both sexual and asexual species and recombining and nonrecombining chromosome regions. There would seem to be good reason, then, for thinking that the distribution of beneficial fitness effects among new mutations at a locus might be generally approximately exponential and invariant. (There could, of course, be exceptions. One predicted by the present theory involves any locus at which the wild type is of very low fitness. Extreme value theory does not hold here. Another is where the distribution of allelic fitnesses has a very lumpy tail; see the above simulations.)
Though counterintuitive, the invariance property among beneficial mutations can be explained heuristically. If adaptation starts from a high-quality wildtype allele (i = 2), the jump to the best allele usually involves medium-sized fitness increases (Δ_{1}; see Figure 1). But if adaptation starts from a lower-quality allele (i = 3), jumps to better alleles involve some fitness increases that are usually smaller than before (Δ_{2}) and an equal number that are usually larger than before (Δ_{1} + Δ_{2}). On average these balance and the mean fitness increase is unchanged. This argument generalizes for any starting wildtype fitness rank i, so long as it is small.
It is important to note that our results concern fitness increases, not selection coefficients. Selection coefficients are fitness increases normalized by wildtype fitness: s = ΔW/W_{+}, where W_{+} is the fitness of the wildtype allele. Because for any given i, ΔW and W_{+} are both random variables, it is easy to show that selection coefficients do not enjoy the above invariance property. Instead the mean selection coefficient among beneficial mutations is E[s] = E[Δ_{1}/(W_{1} − Δ_{1} − ... − Δ_{i−1})], which shrinks slightly with smaller i. Numerical work shows, however, that the distribution of s remains roughly exponential over small i (not shown; see also Rozen et al. 2002).
The above theory, when combined with previous work, allows us to back calculate the mean fitness effect of new beneficial mutations from the mean fitness effect of fixed beneficial mutations, which are much more easily assayed in microbial experimental evolution work. Because large beneficial mutations have a greater chance of going to fixation than do small ones, the mean fitness increase among fixed beneficial mutations will obviously exceed (or at least equal) that among new beneficial mutations. Orr (2002) showed that, under the same assumptions as made here, the mean increase in fitness among fixed beneficial mutations in sexuals is E[ΔW_{fixed}] = 2(i − 1)E[Δ_{1}]/i. This quantity ranges between E[Δ_{1}] and 2E[Δ_{1}]. Because the present theory shows that E[Δ_{1}] asymptotically equals the mean fitness effect of new beneficial mutations, it immediately follows that
E[ΔW_{new}] = i E[ΔW_{fixed}]/(2(i − 1)).
Although theoretical population genetics has historically focused on neutral and deleterious mutations, recent theory has turned to adaptation (Gerrish and Lenski 1998; Hartl and Taubes 1998; Orr 1998, 2000, 2002, 2003; Gerrish 2001). This body of theory now lets us describe how a uniform rate of mutation to various mutant sequences gets transformed under fairly broad conditions into an exponential distribution of beneficial fitness effects of mean E[ΔW_{new}] = E[Δ_{1}]. In sexuals this distribution then gets transformed by probabilities of fixation into one of mean E[ΔW_{fixed}] = 2(i − 1)E[Δ_{1}]/i (Orr 2002). This distribution, which characterizes a single step in adaptation, in turn gets transformed during the stepwise approach to a fixed optimum into a roughly exponential distribution of effects among the mutations fixed over the course of adaptation (Orr 1998, 2002; the former article considered phenotypic effects and the latter selection coefficients; in both cases, however, ΔW is also roughly exponential). Thus both the distribution of beneficial effects among new mutations and the distribution of effects among the mutations ultimately fixed should be roughly exponential, at least when adaptation uses new mutations and approaches a constant optimum. It will obviously be of some importance to determine if similar patterns characterize adaptation when evolution proceeds from the standing genetic variation and/or approaches a moving optimum.
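The balancing argument can be written out explicitly. The lines below are a worked sketch using only the spacing means E[Δ_{n}] = E[Δ_{1}]/n and equal weights on the i − 1 beneficial ranks; the counting step in the last line (spacing Δ_{n} is crossed by jumps to each of the n ranks j ≤ n) is spelled out here as an aid and is not quoted from the article.

```latex
\begin{align*}
E[\Delta W \mid i=2] &= E[\Delta_1], \\
E[\Delta W \mid i=3] &= \tfrac{1}{2}\bigl(E[\Delta_2] + E[\Delta_1 + \Delta_2]\bigr)
  = \tfrac{1}{2}\Bigl(\tfrac{1}{2} + \tfrac{3}{2}\Bigr) E[\Delta_1] = E[\Delta_1], \\
E[\Delta W \mid i] &= \frac{1}{i-1} \sum_{j=1}^{i-1} \sum_{n=j}^{i-1} \frac{E[\Delta_1]}{n}
  = \frac{1}{i-1} \sum_{n=1}^{i-1} n \cdot \frac{E[\Delta_1]}{n} = E[\Delta_1].
\end{align*}
```

The smaller spacings lower in the ranking are therefore offset exactly, in the mean, by the more frequent use of the larger spacings near the top.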
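Solving E[ΔW_{fixed}] = 2(i − 1)E[Δ_{1}]/i for E[Δ_{1}] gives the back calculation directly. The helper below is a minimal sketch of that arithmetic; the function name `mean_new_from_fixed` is hypothetical, and the calculation assumes, as in the text, a sexual population and a known wildtype rank i.

```python
def mean_new_from_fixed(mean_fixed, i):
    """Back calculate the mean effect of new beneficial mutations.

    mean_fixed -- observed mean fitness gain among fixed beneficial mutations
    i          -- assumed fitness rank of the wild type (i >= 2)

    Inverts E[dW_fixed] = 2 (i - 1) E[Delta_1] / i, where E[Delta_1] is the
    mean effect among new beneficial mutations.
    """
    if i < 2:
        raise ValueError("rank i must be at least 2")
    return i * mean_fixed / (2 * (i - 1))
```

For i = 2 the function returns `mean_fixed` unchanged, and as i grows the result approaches half of `mean_fixed`, matching the stated range of E[Δ_{1}] to 2E[Δ_{1}] for the fixed mean. In practice i is rarely known, so these two limits bracket the estimate.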
APPENDIX
Most “ordinary” distributions belong to the Gumbel type (also known as type III). Here I briefly review the conditions for a distribution to belong to this type. My discussion is based loosely on that of Leadbetter et al. (1980) and de Haan (1970).
Consider a parent distribution with probability density function (pdf) f(x) and cumulative distribution function (cdf) F(x). If we randomly draw a very large number, n, of values from this distribution, record the maximum value, and repeat this process many times, we will find that the distribution of the maximum (or of a linear function of the maximum) often tends to a limiting distribution, the so-called extreme value distribution. In reality, there are three extreme value distributions. “Ordinary” distributions like the normal, lognormal, exponential, gamma, etc., are in the domain of attraction of the Gumbel extreme value distribution; this is often casually referred to as the extreme value distribution. The cdf of the Gumbel distribution is Λ(y) = exp(−exp(−y)), where y is a linear function of the original random variable x. Many (though not all) bounded distributions are in the domain of attraction of another extreme value distribution, while exotic distributions like the Cauchy (whose moments are undefined) are in the domain of attraction of a third extreme value distribution. All of these extreme value distributions hold asymptotically as n → ∞.
Parent distributions in the domain of attraction of the Gumbel distribution may have unbounded or bounded tails. The rightmost endpoint of the parent distribution is denoted x_{F}, where x_{F} ≤ ∞ and f(x) = 0 for x > x_{F}. If f(x) has a negative derivative over an interval (x_{0}, x_{F}), a sufficient condition for f(x) to be in the domain of attraction of the Gumbel extreme value distribution is that
lim_{x→x_{F}} f′(x)(1 − F(x))/f(x)^{2} = −1.
The necessary and sufficient condition for f(x) to be in the domain of attraction of the Gumbel extreme value distribution has also been found. It is that there exists a strictly positive function g(t) such that
lim_{t→x_{F}} (1 − F(t + x g(t)))/(1 − F(t)) = exp(−x)
for all real x.
If the maximum of a distribution converges to a particular extreme value distribution, the second and third, etc., largest order statistics will converge to an asymptotic distribution of related functional form; i.e., these order statistics belong to the same type as the maximum. Weissman (1978) showed that all parent distributions in the domain of attraction of the Gumbel extreme value distribution have spacings between extreme order statistics that are asymptotically independent exponential random variables that behave as described in the text.
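Convergence of the maximum to the Gumbel form, whose cdf is exp(−exp(−y)), can be seen in a simple case. The sketch below is illustrative only and assumes a unit exponential parent, for which the linear rescaling is y = max − ln n; the function names are hypothetical. It compares the empirical cdf of rescaled maxima with the Gumbel cdf.

```python
import math
import random


def gumbel_cdf(y):
    """Cdf of the standard Gumbel distribution, exp(-exp(-y))."""
    return math.exp(-math.exp(-y))


def centered_maxima(n=1000, reps=2000, seed=1):
    """Maxima of n unit-exponential draws, shifted by ln(n).

    The shift is the linear rescaling appropriate for this particular
    parent distribution; other parents need other centering and scaling.
    """
    rng = random.Random(seed)
    shift = math.log(n)
    return [max(rng.expovariate(1.0) for _ in range(n)) - shift
            for _ in range(reps)]


def empirical_cdf(sample, y):
    """Fraction of the sample that is less than or equal to y."""
    return sum(1 for v in sample if v <= y) / len(sample)
```

Comparing `empirical_cdf(centered_maxima(), y)` with `gumbel_cdf(y)` at a few values of y should show close agreement. Parents such as the normal converge much more slowly, which is why the text treats them as a near worst case.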
Acknowledgments
I thank N. Barton, A. Betancourt, P. Gerrish, J. Gillespie, J. P. Masly, D. Presgraves, M. Turelli, and two anonymous reviewers for helpful discussions or comments. I especially thank L. de Haan, I. Weissman, and D. Zelterman for helpful discussions of extreme value theory. This work was supported by National Institutes of Health grant 2R01 G5193206A1 and by The David and Lucile Packard Foundation.
Footnotes

Communicating editor: J. B. Walsh
 Received October 10, 2002.
 Accepted January 5, 2003.
 Copyright © 2003 by the Genetics Society of America