## Abstract

This note discusses a minor mathematical error and a problematic mathematical assumption in Luria and Delbrück's (1943) classic article on fluctuation analysis. In addition to suggesting remedial measures, the note provides information on the latest development of techniques for estimating mutation rates using data from fluctuation experiments.

The fluctuation test protocol devised by Luria and Delbrück (1943) still serves as the basis for estimating microbial mutation rates today, although later developments have resulted in much improved methods. Among the later contributions that enhanced our ability to analyze fluctuation experiments are those made by Lea and Coulson (1949), Armitage (1952), Crump and Hoel (1974), Mandelbrot (1974), Koch (1982), Stewart *et al.* (1990), Ma *et al.* (1992), Jones *et al.* (1994), and many others found in a recent review (Zheng 1999). Rosche and Foster's (2000) critical comparison of the then-existing methods is a useful guide for biologists. One goal of this note is to inform the reader of the latest developments that can further help biologists improve their ability to measure mutation rates. Another goal is to discuss a minor mathematical error and a problematic mathematical assumption in Luria and Delbrück's (1943) article that have caused lingering confusion. Previous attempts to clarify the confusion were scarce and largely failed to resolve some relevant practical issues. As a result, the genetics literature is increasingly fraught with mutation rates that were computed using either incorrect or unreliable methods. It therefore seems worthwhile to explain the minor error and the problematic assumption and to provide remedial measures. I begin with a paradox that has puzzled many.

In a fluctuation experiment, each of $n$ parallel cultures is seeded at time zero with $N_0$ nonmutant cells for incubation. At a later time $T$ each culture has about $N_T$ nonmutant cells, and the contents of each culture are plated to facilitate counting of the mutants existing at time $T$ in the $n$ cultures. This process results in experimental data in the form of $X_1, X_2, \ldots, X_n$, the numbers of mutants existing in the $n$ cultures immediately before plating. If $z$ of the $n$ cultures still remain devoid of mutant cells at time $T$, Luria and Delbrück's (1943) $P_0$ method estimates the mutation rate by

$$\hat{\mu} = -\frac{(\log 2)\log \hat{P}_0}{N_T - N_0}, \tag{1}$$

with $\hat{P}_0 = z/n$. Although Luria and Delbrück did not give the above equation, their numerical example on page 507 clearly indicates that they used (1) to estimate mutation rates defined as "mutations per bacterium per division cycle." This definition of mutation rates has been widely accepted, and throughout this note the term "mutation rate" is used in that sense. In other words, a mutation rate is the probability that a cell undergoes a mutation during the cell's life cycle. Using the same definition, Lederberg (1951, p. 99) argued that mutation rates should be estimated by

$$\hat{\mu}_0 = -\frac{\log \hat{P}_0}{N_T - N_0}. \tag{2}$$

Lederberg's reasoning runs as follows. In each culture $N_T - N_0$ cellular divisions have happened. If $\mu_0$ is the mutation rate, the probability that a culture is devoid of mutants after $N_T - N_0$ cellular divisions is

$$(1 - \mu_0)^{N_T - N_0} \approx e^{-\mu_0 (N_T - N_0)}.$$

Equating $\hat{P}_0$ with $e^{-\mu_0 (N_T - N_0)}$ yields an estimator of $\mu_0$ in the form of (2). Thus Equations 1 and 2, differing by a factor of $\log 2$, aim at estimating the same quantity.
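For concreteness, the two $P_0$ estimators can be computed side by side. The following is a minimal Python sketch, assuming natural logarithms (as in $\log 2$ above); the helper name `p0_estimates` is hypothetical, and the inputs in the usage note are made up for illustration.

```python
import math


def p0_estimates(z: int, n: int, n_t: float, n_0: float = 1.0):
    """Return (Eq. 1, Eq. 2) estimates of the mutation rate from the P0 method.

    z   -- number of cultures with no mutants at plating
    n   -- total number of parallel cultures
    n_t -- final number of nonmutant cells per culture (N_T)
    n_0 -- inoculum size per culture (N_0)
    """
    if not 0 < z < n:
        # z = 0 makes log(P0_hat) undefined; z = n gives an uninformative zero
        raise ValueError("P0 method needs 0 < z < n")
    p0_hat = z / n
    divisions = n_t - n_0                   # N_T - N_0 divisions per culture
    eq2 = -math.log(p0_hat) / divisions     # Lederberg's estimator (2)
    eq1 = math.log(2) * eq2                 # Luria and Delbrück's estimator (1)
    return eq1, eq2
```

For instance, `p0_estimates(11, 30, 1e8)` returns two values whose ratio is $\log 2 \approx 0.693$, whatever the inputs, which is exactly the discrepancy discussed next.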

This paradox has led some to seek justifications of (1) (*e.g.*, Hayes 1968, p. 194, and Drake 1970, p. 49) and others to cast doubt on it (*e.g.*, Lea and Coulson 1949, p. 266, and Kondo 1972). No consensus has emerged. A helpful approach to clarifying this issue is to present an argument that not only reinforces the validity of (2) but also highlights what Luria and Delbrück overlooked in arriving at (1). Luria and Delbrück measured time in units of the average doubling time divided by $\log 2$, which makes their derivation of (1) unnecessarily difficult to follow. Instead, I use clock time and introduce an explicit cell growth parameter $\beta$. Specifically, the nonmutant population size at time $t$, denoted by $N(t)$, is modeled by the exponential function $N(t) = N_0 e^{\beta t}$, and the occurrence of mutation is assumed to be governed by a Poisson process having intensity function $\mu N_0 e^{\beta t}$. Because the probability of a mutation occurring in a small time interval $(t, t + \Delta t)$ is approximately $\mu N_0 e^{\beta t} \Delta t$, the parameter $\mu$ is often called the probability of mutation per cell per unit time. However, $N(t) = N_0 e^{\beta t}$ does not represent the actual population size at most time points: $N(t)$ is a positive integer if and only if $t = 0$ or $t = t_k$ for some $k$, where $t_k = \beta^{-1}\log(k/N_0)$ for $k = 1, 2, \ldots$. A literal interpretation of $\mu$ as "mutation per cell per unit time" out of the intended context can lead to unexpected results, as the following analysis demonstrates.
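The Poisson mutation model just described can be simulated directly. A minimal sketch follows, assuming the time-change (inversion) method, a standard technique and not necessarily how any cited author simulated the model: arrival times of a unit-rate Poisson process are mapped through the inverse of the cumulative intensity $\Lambda(t) = (\mu N_0/\beta)(e^{\beta t} - 1)$. The helper `t_k` computes the epochs at which $N(t)$ is an integer. All function names are hypothetical.

```python
import math
import random


def cumulative_intensity(t, mu, beta, n0):
    """Lambda(t) = integral_0^t mu*n0*exp(beta*s) ds = (mu*n0/beta)*(exp(beta*t) - 1)."""
    return mu * n0 / beta * math.expm1(beta * t)


def mutation_times(mu, beta, n0, horizon, rng):
    """Simulate mutation epochs on [0, horizon] for the Poisson process with
    intensity mu*n0*exp(beta*t), by mapping unit-rate Poisson arrivals s
    through the inverse of Lambda (time-change method)."""
    total = cumulative_intensity(horizon, mu, beta, n0)   # expected count m(horizon)
    times, s = [], 0.0
    while True:
        s += rng.expovariate(1.0)                          # next unit-rate arrival
        if s > total:
            return times
        times.append(math.log1p(beta * s / (mu * n0)) / beta)  # Lambda^{-1}(s)


def t_k(k, beta, n0):
    """Epoch at which the deterministic curve N(t) = n0*exp(beta*t) equals k."""
    return math.log(k / n0) / beta
```

With $\mu = 10^{-3}$, $\beta = 1$, $N_0 = 1$, and horizon 10, the number of simulated mutations averages close to $\Lambda(10) = 10^{-3}(e^{10} - 1) \approx 22$ over many runs, while `t_k` marks the only instants at which $N(t)$ is a whole number of cells.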

Consider a time interval $(t_k, t_{k+1}]$ for an arbitrary positive integer $k$. As hinted above, $t_{k+1} - t_k$ can be viewed as an interdivision time under the assumption $N(t) = N_0 e^{\beta t}$. Extending an argument of Kondo (1972), I find that the probability of one or more mutations occurring in that time interval is

$$1 - \exp\left(-\int_{t_k}^{t_{k+1}} \mu N_0 e^{\beta s}\, ds\right) = 1 - e^{-\mu/\beta} \approx \frac{\mu}{\beta}. \tag{3}$$

Notably, this probability is independent of $k$, which suggests viewing $\mu_\beta = \mu/\beta$ as the probability of one or more mutations occurring between two consecutive cellular divisions. Since $\mu_\beta$ is presumably small, the probability of two or more mutations occurring in $(t_k, t_{k+1}]$ is negligible compared to the probability of exactly one mutation happening in that interval. Therefore, $\mu_\beta$ can be regarded as the probability of a mutation per cell division, that is, as the mutation rate. Note that the expected number of mutations accumulated by time $T$ is

$$m(T) = \int_0^T \mu N_0 e^{\beta s}\, ds = \frac{\mu}{\beta}\left(N_0 e^{\beta T} - N_0\right) = \mu_\beta (N_T - N_0). \tag{4}$$

Let $p_0(T)$ be the probability that no mutation occurs in a culture by time $T$. Under the Poisson mutation model, $p_0(T) = e^{-m(T)}$. Therefore, it follows from (4) that

$$\mu_\beta = \frac{m(T)}{N_T - N_0} = -\frac{\log p_0(T)}{N_T - N_0}. \tag{5}$$

Replacing $p_0(T)$ with $\hat{P}_0$ leads to an estimator identical to that given in (2). On the other hand, Luria and Delbrück's reasoning leading to (1) was based on their interpretation of $\mu$ as "a fixed small chance per unit time for each bacterium to undergo a mutation" (Luria and Delbrück 1943, p. 494). Because a cell's life cycle is considered to be $(\log 2)/\beta$ under the assumption $N(t) = N_0 e^{\beta t}$,

$$\mu \cdot \frac{\log 2}{\beta} = (\log 2)\,\mu_\beta \tag{6}$$

seems to be the probability of mutation per cell division, namely, the mutation rate. However, (6) is merely a literal interpretation of $\mu$ as the probability of mutation per cell per unit time; the precise meaning of $\mu$ can be grasped only within the context of the important assumption that occurrence of mutation is governed by an inhomogeneous Poisson process having an "instantaneous" rate $\mu N_0 e^{\beta t}$ at time $t$. Because (6) deviates from the intended definition of $\mu$, the aforementioned paradox results.
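Equation (3) can also be checked numerically without using its closed form. The sketch below, with arbitrary illustrative parameter values and a hypothetical function name, integrates the intensity over $(t_k, t_{k+1}]$ with the composite trapezoidal rule for several $k$. Each probability comes out close to $\mu_\beta = \mu/\beta$, whereas the literal reading in (6) differs from it by the factor $\log 2$.

```python
import math


def prob_mutation_between_divisions(mu, beta, n0, k, steps=10_000):
    """P(one or more mutations in (t_k, t_{k+1}]) for intensity mu*n0*exp(beta*s).

    The intensity is integrated with the composite trapezoidal rule rather
    than in closed form, so the k-independence in Eq. (3) is checked, not assumed.
    """
    a = math.log(k / n0) / beta            # t_k:     N(t_k) = k
    b = math.log((k + 1) / n0) / beta      # t_{k+1}: N(t_{k+1}) = k + 1
    h = (b - a) / steps
    f = lambda s: mu * n0 * math.exp(beta * s)
    integral = h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, steps)))
    return -math.expm1(-integral)          # 1 - exp(-integral), computed accurately


mu, beta, n0 = 2e-8, 0.5, 1                # illustrative values only
mu_beta = mu / beta                        # Eq. (3): approx. mu / beta for every k
probs = [prob_mutation_between_divisions(mu, beta, n0, k) for k in (1, 10, 1000)]
ld_rate = mu * math.log(2) / beta          # Eq. (6): (log 2) * mu_beta
```

All entries of `probs` agree with `mu_beta` to several significant digits, whereas `ld_rate` is about $0.693\,\mu_\beta$.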

The above discussion indicates that, from a theoretical point of view, one should choose $\mu_\beta$ as the definition of the mutation rate and use (2) instead of (1) as an estimator thereof. However, the discussion does not answer the question of whether $\mu_\beta$ agrees with the definition of a mutation rate from a practical point of view. To address this issue I ran several simulations, one of which I now report. I first simulated the numbers of mutants for 30,000 cultures. To maintain a degree of independence between my simulation and the mathematical model that gives rise to (5), I used a discrete-time cell division model based on the five assumptions given by Angerer (2001, p. 149). Specifically, each culture starts with one nonmutant cell. At each step one cell from the culture is chosen to divide. If the culture already has $x$ mutants and $y$ nonmutants, then the probability that a mutant is chosen to divide is $x/(x + y)$, and the probability that a nonmutant is chosen to divide is $y/(x + y)$. When a mutant divides, it splits into two mutant daughter cells; when a nonmutant divides, it splits into one mutant and one nonmutant with probability $p$, or into two nonmutants with probability $1 - p$. (Clearly, $p$ thus defined agrees with the commonly understood definition of a mutation rate.) For each culture, this procedure is repeated until $x + y = N_T$. In the simulation I chose $p = 5 \times 10^{-8}$ and $N_T = 10^8$. I then considered the first 30 cultures as coming from the first experiment, the next 30 cultures as coming from the second experiment, and so on, giving 1000 simulated experiments. For each simulated experiment, I used SALVADOR (Zheng 2002, 2005) to find the maximum-likelihood estimate of $m(T)$. Finally, I used the relation $\mu_\beta \approx m(T)/N_T$ suggested by (5), since $N_0 = 1$ is negligible relative to $N_T$, to compute mutation rates. The average of these 1000 estimated mutation rates is $5.036 \times 10^{-8}$ and the median is $4.988 \times 10^{-8}$, indicating that $\mu_\beta$, and not $(\log 2)\mu_\beta$, coincides with $p$. The distribution of these 1000 estimates of the mutation rate is summarized in Figure 1.
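The simulation just described can be reproduced in outline. The Python sketch below is hypothetical and is not the code used for this note; in particular, it estimates $m(T)$ by a plain golden-section maximization of the likelihood, computing the Luria-Delbrück probabilities with the recursion of Ma *et al.* (1992), instead of using SALVADOR. It assumes the log-likelihood is unimodal on the search interval. Pure Python is far too slow for $N_T = 10^8$, so any run must use a much smaller $N_T$ and a correspondingly larger $p$.

```python
import math
import random


def simulate_culture(p, n_final, rng):
    """One culture under the discrete division model described above: start with
    one nonmutant; the dividing cell is chosen in proportion to current counts;
    a dividing nonmutant yields a mutant daughter with probability p.
    Returns the number of mutants once the culture reaches n_final cells."""
    x, y = 0, 1                                   # mutants, nonmutants
    while x + y < n_final:
        if rng.random() < x / (x + y):            # a mutant divides: +1 mutant
            x += 1
        elif rng.random() < p:                    # nonmutant -> mutant + nonmutant
            x += 1
        else:                                     # nonmutant -> two nonmutants
            y += 1
    return x


def ld_pmf(m, kmax):
    """Luria-Delbrück probabilities p_0..p_kmax via the Ma-Sandri-Sarkar
    recursion: p_0 = exp(-m), p_k = (m/k) * sum_{i<k} p_i / (k - i + 1)."""
    p = [math.exp(-m)]
    for k in range(1, kmax + 1):
        p.append(m / k * sum(p[i] / (k - i + 1) for i in range(k)))
    return p


def ld_loglik(m, counts):
    """Log-likelihood of m given the observed mutant counts."""
    p = ld_pmf(m, max(counts))
    if any(p[x] <= 0.0 for x in counts):          # guard against underflow
        return -math.inf
    return sum(math.log(p[x]) for x in counts)


def ml_estimate_m(counts, lo=1e-4, hi=None, tol=1e-6):
    """Maximum-likelihood estimate of m by golden-section search on [lo, hi]."""
    if hi is None:
        hi = min(max(10.0, float(max(counts))), 500.0)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = ld_loglik(c, counts), ld_loglik(d, counts)
    while b - a > tol:
        if fc > fd:                               # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = ld_loglik(c, counts)
        else:                                     # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = ld_loglik(d, counts)
    return 0.5 * (a + b)
```

For example, one replicate experiment could be generated as `counts = [simulate_culture(2e-3, 1000, rng) for _ in range(30)]` with `rng = random.Random(1)`, and `ml_estimate_m(counts) / (1000 - 1)` then gives one estimate of $\mu_\beta$ to compare with $p = 2 \times 10^{-3}$; these scaled-down values are illustrative only. The recursion costs $O(K^2)$ in the largest count $K$, so a large jackpot slows the search noticeably.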

Another issue that has also caused lingering confusion is the use of the sample mean in estimating mutation rates. Because the sample mean has too large a variance to be useful in statistical inference, Luria and Delbrück introduced the concept of a "likely average" to alleviate this problem. Theoretically, the expected number of mutants in a culture, $\rho(t)$, can be found by solving Luria and Delbrück's (1943) Equation 6,

$$\frac{d\rho(t)}{dt} = \mu N_0 e^{\beta t} + \beta \rho(t), \tag{7}$$

with the initial condition $\rho(0) = 0$. From standard differential equation theory it follows that

$$\rho(t) = \int_0^t \mu N_0 e^{\beta s} e^{\beta (t - s)}\, ds = \mu t N_0 e^{\beta t}. \tag{8}$$

Thus, the expected number of mutants in a culture at time $T$ is

$$\rho(T) = \mu T N_T. \tag{9}$$

Luria and Delbrück reasoned that prior to a certain time $t_0$ mutation is unlikely to occur in an experiment. Therefore, they changed the lower limit of integration in (8) from zero to some $t_0 > 0$, giving their Equation 6a,

$$\rho(T) = \int_{t_0}^T \mu N_0 e^{\beta s} e^{\beta (T - s)}\, ds = \mu (T - t_0) N_T. \tag{10}$$

Because Luria and Delbrück chose $t_0$ to be the epoch at which one would expect the first mutation to occur among the $n$ cultures, they replaced $T$ in (4) with $t_0$ to yield

$$n \mu_\beta \left(N(t_0) - N_0\right) = 1. \tag{11}$$

Luria and Delbrück further assumed that $N_0$ is negligible compared with $N(t_0)$; as a result, (11) reduces to

$$n \mu_\beta N(t_0) = 1. \tag{12}$$

On the other hand, rearranging $N_T = N(t_0) e^{\beta (T - t_0)}$ gives $\beta(T - t_0) = \log(N_T/N(t_0))$. Because $N(t_0) = 1/(n\mu_\beta)$ from (12), it then follows that

$$\beta (T - t_0) = \log(n \mu_\beta N_T), \tag{13}$$

which is Luria and Delbrück's Equation 7. Combining (13) and (10) yields an expression for the likely average equivalent to Luria and Delbrück's Equation 8:

$$\rho(T) = \mu_\beta N_T \log(n \mu_\beta N_T). \tag{14}$$

By equating the above likely average with the sample mean $\bar{X}$, one can estimate $\mu_\beta$ by numerically solving

$$\bar{X} = \mu_\beta N_T \log(n \mu_\beta N_T). \tag{15}$$

Lederberg (1951, p. 99) observed that this method "offers certain short-term advantages," but the method does not satisfactorily solve the intrinsic problem due to the high variability of $\bar{X}$. Lederberg's caution has recently come to be appreciated, and (15) is no longer in common use.
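To make the likely-average approach concrete (not to endorse it), (15) can be solved for $\mu_\beta$ by bisection in the scaled variable $u = n\mu_\beta N_T$, for which the right-hand side $(u/n)\log u$ is increasing when $u > 1$. Setting $n = 1$ and passing the sample median instead of the mean gives the estimating equation often called Drake's formula. The function name is hypothetical, and the sketch assumes a positive mean or median so that the root satisfies $u > 1$.

```python
import math


def likely_average_rate(xbar, n_t, n=1, tol=1e-12):
    """Solve xbar = mu_b * n_t * log(n * mu_b * n_t) (Eq. 15) for mu_b.

    With n = 1 and xbar replaced by the sample median this is Drake's formula.
    Works in u = n * mu_b * n_t, where (u / n) * log(u) increases on u > 1.
    """
    if xbar <= 0:
        raise ValueError("the mean (or median) must be positive")
    g = lambda u: u / n * math.log(u) - xbar
    lo, hi = 1.0, 2.0                   # g(1) = -xbar < 0
    while g(hi) <= 0:                   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:           # bisection to relative tolerance tol
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi) / (n * n_t)
```

For example, with $N_T = 10^8$ and $n = 30$, a sample mean of $5\log 150 \approx 25.05$ returns $\mu_\beta \approx 5 \times 10^{-8}$, since then $u = 150$.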

But the idea of a likely average has become entrenched in the literature due to a popular modification of (15). Setting $n = 1$ and replacing the sample mean with the sample median $\tilde{X}$ yields an estimating equation

$$\tilde{X} = \mu_\beta N_T \log(\mu_\beta N_T), \tag{16}$$

which is often called Drake's formula. Thus, Drake's formula is based on the concept of a modified likely average: the quantity $t_0$ is increased to the epoch at which one would expect the first mutation to occur in a single culture. As Rosche and Foster's (2000) simulations indicate, Drake's formula is good only when the expected number of mutations per culture, $m(T)$, is about 30, a rare experimental scenario. The inadequate performance of (16) casts doubt on the usefulness of the concept of a likely average. In assessing that usefulness, one might note that (10) is a solution of (7) satisfying the initial condition $\rho(t_0) = 0$. Therefore, the likely average in (14) can be regarded as the mean number of mutants in a culture under the assumption that cells were prevented from mutating before an arbitrarily chosen time $t_0$. The validity of this assumption seems problematic. The concept of a "likely average" has neither a theoretical nor an empirical basis, and hence its usefulness in estimating mutation rates is dubious.
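The claim that (10) solves (7) with $\rho(t_0) = 0$ can be verified directly by writing (10) as a function of a general time $t \ge t_0$ and differentiating:

$$
\begin{aligned}
\rho(t) &= \mu (t - t_0) N_0 e^{\beta t}, \\
\frac{d\rho(t)}{dt} &= \mu N_0 e^{\beta t} + \beta\,\mu (t - t_0) N_0 e^{\beta t} = \mu N_0 e^{\beta t} + \beta \rho(t), \\
\rho(t_0) &= 0.
\end{aligned}
$$

Setting $t = T$ recovers $\rho(T) = \mu (T - t_0) N_T$; setting $t_0 = 0$ recovers (8).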

Now I suggest the following guidelines for mutation rate estimation. First, in the context of fluctuation experiments, the term mutation rate should be reserved for $\mu_\beta$, because this quantity agrees with the accepted definition of a mutation rate as the probability that a cell undergoes a mutation during its life cycle. The parameter $\mu$ is a necessary mathematical device, but the term "mutation rate per cell per unit time" can be avoided in most biological contexts to prevent confusion. Second, published mutation rates computed using (1) should be divided by $\log 2$. The $P_0$ method is still needed when the number of mutants in a culture is difficult to ascertain but the presence or absence of mutants in a culture can be determined; in applying the $P_0$ method one should use (2). Third, the concept of a likely average is obsolete, and so are methods based on that concept, *e.g.*, Equations 15 and 16. If a published mutation rate was computed using (16), one can use the same equation to recover the sample median $\tilde{X}$ (provided $N_T$ is known); for comparison purposes, two additional estimates of the mutation rate can then be obtained by estimating $m(T)$ from $\tilde{X}$, using Lea and Coulson's (1949) method of the median and Equation 6 of Jones *et al.* (1994). The mutation rate $\mu_\beta$ can then be extracted from the relation $\mu_\beta = m(T)/(N_T - N_0)$ as given in (5). Finally, as Rosche and Foster (2000) emphasized, if all $n$ observations $X_1, \ldots, X_n$ from a fluctuation experiment are available, the best approach for estimating a mutation rate is to use the maximum-likelihood method to estimate $m(T)$. Use of the maximum-likelihood method was not common in the past, partly due to a lack of convenient and efficient computer software written specifically for fluctuation analysis. This situation has been ameliorated by the appearance of SALVADOR, which includes most of the existing methods for fluctuation analysis. In particular, SALVADOR provides methods for analyzing experiments where mutants and nonmutants grow at different rates. Moreover, recent theoretical developments (Zheng 2002, 2005) have made it possible to construct asymptotic confidence intervals for mutation rates. These interval estimation methods can be readily applied via SALVADOR, which is available at http://library.wolfram.com/infocenter/MathSource/5556.
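The recovery procedure in the third guideline can be sketched as follows, assuming the published rate $\hat{\mu}$ came from (16), $N_T$ is known, and $\hat{\mu} N_T > 1$. Lea and Coulson's method of the median is implemented here as the relation $\tilde{X}/m - \log m = 1.24$, solved by bisection; Equation 6 of Jones *et al.* (1994) is not reproduced. The function names are hypothetical, and the sketch does not check whether $m$ falls in the range where the median approximation is considered reliable.

```python
import math


def median_from_drake(mu_published, n_t):
    """Invert Drake's formula (Eq. 16) to recover the sample median."""
    u = mu_published * n_t
    return u * math.log(u)


def lea_coulson_median_m(median, tol=1e-12):
    """Solve Lea and Coulson's method-of-the-median relation
    median / m - log(m) = 1.24 for m by bisection.
    The left side decreases in m, so the root is unique."""
    if median <= 0:
        raise ValueError("median must be positive")
    h = lambda m: median / m - math.log(m) - 1.24
    lo, hi = 1e-9, 1.0                  # h(lo) > 0 for any positive median
    while h(hi) > 0:                    # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $\hat{\mu} = 5 \times 10^{-8}$ and $N_T = 10^8$, the recovered median is $5\log 5 \approx 8.05$; the resulting $m$ divided by $N_T - N_0$ then gives a Lea-Coulson-based value of $\mu_\beta$ to compare with the published one.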

## Acknowledgments

I am much indebted to two anonymous reviewers whose detailed comments substantially improved the presentation.

## Footnotes

Communicating editor: J. Wakeley

- Received September 1, 2004.
- Accepted July 12, 2005.

- Copyright © 2005 by the Genetics Society of America