## Abstract

In previous work by M. E. Jones and colleagues, it was shown that mutation rate estimates can be improved and corresponding confidence intervals tightened by following a very easy modification of the standard fluctuation assay: cultures are grown to a larger-than-usual final density, and mutants are screened for in only a fraction of the culture. Surprisingly, this very promising development has received limited attention, perhaps because there has been no efficient way to generate the predicted mutant distribution to obtain non-moment-based estimates of the mutation rate. Here, the improved fluctuation assay discovered by Jones and colleagues is made amenable to quantile-based, likelihood, and other Bayesian methods by a simple recursion formula that efficiently generates the entire mutant distribution after growth and dilution. This formula makes possible a further protocol improvement: grow cultures as large as is experimentally possible and severely dilute before plating to obtain easily countable numbers of mutants. A preliminary look at likelihood surfaces suggests that this easy protocol adjustment gives markedly improved mutation rate estimates and confidence intervals.

A common method for estimating mutation rates is the “fluctuation assay,” in which several bacterial cultures are grown from small, presumably mutant-free, inoculums to high-density cultures and subsequently screened for mutants by plating on selective media (Rosche and Foster 2000; Pope *et al.* 2008). On selective media, only selected mutants are able to form colonies, and the numbers of mutants counted on plates are an indicator of the mutation rate. To make an estimate of the mutation rate from these numbers, *a priori* knowledge of their distribution is required. The fundamental assumption upon which this distribution is derived is that no Lamarckian-like processes exist: mutations are assumed to occur spontaneously during cell division in a way that is completely independent of any detriment or benefit that the mutations might subsequently incur. Such mutations can occur at any time during the growth of a culture. If a mutation happens to occur early in the growth of a culture, it will leave a large number of mutant descendants in the culture at the time of plating. The nonnegligible probability of a mutation occurring early thus translates to nonnegligible probabilities in the right tail of the mutant distribution. The resulting heavy-tailed mutant distribution was first proposed by Luria and Delbrück (1943) and has since been dubbed the “Luria–Delbrück distribution” (LDD) in their honor. Its first rigorous derivation was given by Lea and Coulson (1949), and many subsequent derivations account for different biological and experimental details (Armitage 1952; Bailey 1964; Mandelbrot 1974; Bartlett 1978; Stewart *et al.* 1990; Stewart 1991; Pakes 1993; Zheng 1999; Kepler and Oprea 2001; Oprea and Kepler 2001; Dewanji *et al.* 2005).

Each of these derivations culminates in a version of the LDD expressed in probability-generating function (PGF) form. The great utility of PGFs is that (1) distributions can be easier to derive in PGF form, (2) distributions can be expressed compactly in PGF form, even in cases where there is no way to write down a general (nonderivative) expression for the individual probabilities of the distribution itself, and (3) from a distribution expressed in PGF form, the zero class and the moments of the distribution are immediate. This last feature suggests possible mutation rate estimators. The zero-class estimator is robust to different assumptions about growth rates, but its statistical power is mediocre and its use is restricted to cases in which the zero class exists but is not too big. Moment-based estimators can be very inaccurate as a result of the LDD's heavy tail (Armitage 1952, 1953); in fact, the simplest form of the distribution—due to Lea and Coulson (1949)—has no finite moments.

With limited prospects for the utility of moment-based methods, researchers naturally turned to the quantiles for information about the distribution and for obtaining mutation rate estimates. Quantile-based methods, however, require that distribution probabilities be extracted from the governing PGF. Until relatively recently, algorithms for extracting distribution probabilities have been quite complex, and their utility was therefore chiefly for the construction of tables from which researchers could look up quantile-based mutation rate estimates. The most widely used of these methods was the Lea and Coulson “median method” (Lea and Coulson 1949).

Likelihood and other Bayesian methods of mutation rate estimation were first made accessible to experimentalists by Stewart's discovery of a relatively efficient algorithm for generating the probabilities of the LDD (Stewart *et al.* 1990). The subsequent discovery of an even more efficient algorithm—a simple recursion formula (Ma *et al.* 1991, 1992; Sarkar *et al.* 1992) after Gurland (1958)—made likelihood methods routine, and consequently there has been increasing preference for these superior methods of mutation rate estimation (Rosche and Foster 2000). Stewart (1994) performed a systematic assessment of these superior methods and derived a formula for calculating confidence intervals.

Analysis of mutant-count data can sometimes be complicated by experimental artifacts that make a certain fraction of mutants unable to form observable colonies. Such artifacts can be especially pronounced in assays that use mammalian cells (Kendal and Frost 1988). The fraction of plated mutants that do ultimately form observable colonies was originally called the “plating efficiency,” but it is mathematically equivalent to sampling the culture by any means, such as intentionally plating only a fraction of the final culture by either direct aliquot or dilution. To account for such sampling, Jones (1993, 1994) and Stewart (1991) both model the LDD as an irreducible compound distribution: the total number of mutations that occur during the growth of a culture is a Poisson-distributed random variable, and the total number of mutant colonies that derive from each mutation event is an independent random variable whose distribution is the “clone-size” distribution derived by Lea and Coulson. Jones and Stewart both reworked the clone-size distribution with a sampling step added: they derived the probability that a mutation occurring during the growth of a culture gives rise to *k* visible colonies when only a fraction of mutants form visible colonies. To generate the probabilities of total numbers of mutant colonies on plates after growth and sampling, they both outlined different programming algorithms (Stewart *et al.* 1990; Stewart 1991; Jones 1993; Crane *et al.* 1996) to compose the primary Poisson distribution for numbers of mutations occurring during growth with their reworked clone-size distributions that take preplating sampling into account. Jones' algorithm is as follows.

We let denote the probability that a single new mutation gives rise to mutant colonies in the plated culture after dilution (the Jones clone-size distribution). We then define the conditional probability, to be the probability that or fewer mutant colonies are observed, given that new mutations occurred during the growth of the culture. These probabilities are calculated from iteration of the expression with initial condition . When this is done, then we can compute the desired probability, , that colonies are observed in the plated culture, given that is the mean number of new mutations occurring during growth and is the dilution factor that precedes plating (or plating efficiency), as follows: . One cannot sum numerically to infinity, and this sum must therefore be truncated. Jones truncates the sum at 100, which is adequate for values , but thus limits applicability of his tabulated median method results to protocols for which plating efficiency is greater than ∼5%. A more elegant solution to the problem of infinite summation appears in Ma *et al.* (1992) and Jones *et al.* (1999).

The above numerical procedure for obtaining the probabilities of the mutant distribution, while more efficient than some, was still unable to accommodate likelihood calculations in an efficient way, especially when the plating efficiency was very small. Nevertheless, these methods paved the way for a rather striking discovery: Jones and colleagues (Crane *et al.* 1996) observed that mutation rate estimates and corresponding confidence intervals could actually be improved by adding a sampling (or dilution) step. If cultures were grown to larger-than-usual final densities and then diluted prior to plating, mutation rate estimates could be improved and corresponding confidence intervals tightened. This development (henceforth called the “Jones protocol”) was significant because it had the potential to introduce a new standard for reliability and resolution with which mutation rates could be reported. Despite the promise of better estimates, however, Jones' improvement on the standard fluctuation assay has not become standard practice. The gains made by the Jones protocol are countered by cumbersome numerical calculations and hence its reliance on a somewhat limited table-based median method for practical mutation rate estimation. On the other hand, the inferiority of the standard fluctuation protocol is compensated for by the superiority of likelihood and other Bayesian methods available to it for practical mutation rate estimation. The pros and cons of these two protocols thus achieved a rough balance and, faced with a choice between the two, many researchers have opted for the more fashionable likelihood approaches in conjunction with the standard fluctuation protocol.

It is hoped that the simple recursion formula derived below and plotted in Figure 1 will allow experimentalists to have their cake and eat it too, by allowing them to follow the Jones protocol for improved mutation rate estimates (Crane *et al.* 1996) and use maximum likelihood or other Bayesian methods (Asteris and Sarkar 1996) for practical mutation rate estimation. The result is a rather spectacular narrowing of likelihood peaks and increased accuracy of their modes (Figure 2). While several sample likelihood peaks are shown in Figure 2, the purpose of this short note is not to propose a detailed and evaluated methodology for employing the Jones protocol but simply to present the recursion formula that gives the mutant distribution.

## DERIVATION OF THE FORMULA

The PGF for numbers of mutants after growth of the cultures in a fluctuation assay (the LDD) as derived by Lea and Coulson (1949) iswhere , is the mutation rate, and is the final population size of the cultures. Of the mutants present in the cultures after growth, only a fraction will give rise to observable mutant colonies; *i.e.*, is the plating efficiency. For each mutant in a culture just prior to plating, therefore, the stochastic sampling process due to plating (or, equivalently, due to preplating dilution) is a Bernoulli process, whose PGF isNumbers of mutants after growth and sampling have a distribution whose PGF is

The Lea and Coulson PGF, , has no finite moments, and it is no surprise therefore that also has no finite moments. The Lea and Coulson PGF may be rewritten asand the PGF for growth and sampling is thereforeor, expanding further,Rearranging and collecting coefficients of like powers of givesThe coefficient of can be reduced towhere denotes the hypergeometric function as defined in Abramowitz and Stegun (1965) by their Equation 15.1.1. The PGF can now be rewritten aswhere the are defined above.

With now in the appropriate form, we can appeal to Lemma 2 in Zheng (1999), after Ma *et al.* (1991) and Gurland (1958), to derive the probabilities of the distribution, which reduce to the recursionand(1)where

Unfortunately, subroutines for computing the hypergeometric function can be computationally intensive and they can become unstable for very small values of , *i.e.*, for severe dilutions. To obtain a more useful expression, I expand the hypergeometric function about and obtain an extremely good approximation (Figure 1) for the case of severe dilutions (small ; see Figure 3):(2)The tail probabilities of this distribution are derived by inspection of the composition,where the are the probabilities of the Lea and Coulson distribution. For large , we have (Pakes 1993), and as before; the *n*th term of the composition may thus be expanded as follows:

Writing out the probabilities of the composed distributions in the LDD tail givesRearranging and collecting coefficients of like powers of gives , where

Thus, not surprisingly, tail behavior is not changed by sampling (the relation still holds). Absolute tail probabilities, however, are reduced by the dilution factor, ; this reduction implies a corresponding increase in absolute probabilities elsewhere, *i.e.*, in the peak of the distribution. This observation may provide insight to better understand why mutation rate estimates are better and give tighter confidence intervals when the Jones protocol is used (Crane *et al.* 1996).

## DISCUSSION

The formula derived here makes the Jones protocol amenable to likelihood methods, which have the advantage that they use all the available data in an efficient manner. A disadvantage of likelihood methods, however, is that they are very “parametric” and their usefulness thus depends heavily on the correctness of the underlying model used to generate the distribution. Quantile and zero-class methods (Kendal and Frost 1988; Zheng 1999; Rosche and Foster 2000), on the other hand, are much less parametric and are thus more robust to departures from the underlying model. Put differently, quantile and zero-class methods have some advantage when there is some uncertainty about the underlying model or when there are known simplifying assumptions that can introduce error. The formula presented here derives from the simplest model for the LDD (Lea and Coulson 1949), which makes a number of simplifying assumptions, such as exponentially distributed times between replication events (a Markovian birth process) (Jones *et al.* 1999; Kepler and Oprea 2001), probabilities expressed in infinite (nontruncated) series thereby assigning nonzero (albeit miniscule) probabilities to unrealistically large mutant numbers (Armitage 1952; Bailey 1964; Stewart *et al.* 1990; Ma *et al.* 1992; Zheng 1999), and no difference in growth rates between wild-type and mutant subpopulations (Jones 1994; Jaeger and Sarkar 1995; Zheng 1999). In light of these simplifying assumptions, whose impact can sometimes be significant, it might ultimately be desirable to use the less accurate but more robust quantile or zero-class methods to complement the more accurate but less robust likelihood methods. The zero-class, the median, and other quantiles can be easily computed from the formula presented here. Additionally, Jones and colleagues (Jones *et al.* 1994) derive analytical expressions for zero-class and median estimators that are modified to include dilution. Evaluating the use of quantile and zero-class methods to complement likelihood methods is work in progress.

In the previous work by Jones and colleagues (Crane *et al.* 1996), numerical considerations put limits on the size of the final culture densities and hence the severity of the subsequent preplating dilution; *i.e.*, they put limits on how small can be. The approximation (2) derived above, on the other hand, works best when is small (Figure 3). This fact suggests an improvement on the Jones protocol: grow cultures as large as is experimentally possible and dilute the final cultures accordingly prior to plating to obtain easily countable numbers of mutants on the plates. This can only further reduce the impact of the small- assumption used to obtain the more useful approximate formula derived here, thereby increasing its accuracy, and further tighten corresponding confidence intervals due to the effect described by Jones and colleagues (Crane *et al.* 1996). In Figure 2, this modification is referred to as the “extended Jones protocol.” A systematic evaluation of different experimental protocols and the accuracy of mutation rate estimates to be expected is the subject of this article's sequel.

## Acknowledgments

This work was inspired by a data set sent to me by Dominique Schneider. I thank Paul Sniegowski for helpful discussions, and I thank M. E. Jones and an anonymous reviewer for helpful comments and suggestions. I was supported by National Institutes of Health grant R01 GM079483-01A2 and European Commission grant MEXT-CT-2004-14338.

## Footnotes

Communicating editor: F. W. Stahl

- Received May 20, 2008.
- Accepted September 14, 2008.

- Copyright © 2008 by the Genetics Society of America