## Abstract

Although mutation rates are a key determinant of the rate of evolution they are difficult to measure precisely and global mutations rates (mutations per genome per generation) are often extrapolated from the per-base-pair mutation rate assuming that mutation rate is uniform across the genome. Using budding yeast, we describe an improved method for the accurate calculation of mutation rates based on the fluctuation assay. Our analysis suggests that the per-base-pair mutation rates at two genes differ significantly (3.80 × 10^{−10} at *URA3* and 6.44 × 10^{−10} at *CAN1*) and we propose a definition for the effective target size of genes (the probability that a mutation inactivates the gene) that acknowledges that the mutation rate is nonuniform across the genome.

MUTATION rate is an important parameter in evolution. It limits the speed of adaptation in populations with beneficial mutations; in the absence of beneficial mutations it sets the equilibrium fitness of the population. Despite its importance, there are large uncertainties in estimates of the per-genome per-generation mutation rate. Estimating this parameter is typically a three-step process: determining the mutation rate to a particular phenotype, converting this phenotypic rate into a per-base-pair mutation rate in a particular gene, and extrapolating this local rate to the entire genome. In this article we focus on the technical challenge of determining phenotypic mutation rates accurately and the analytical task of determining the effective target size of a gene, the probability that a mutation somewhere in a defined segment of the genome produces a mutation with a specified phenotypic effect.

Three methods are commonly employed to measure phenotypic mutation rates: mutation-accumulation assays, mutant accumulation assays, and fluctuation assays. The mutation-accumulation assay involves passing a culture through recurrent bottlenecks, ideally of a single cell/individual, such that all mutations are nearly neutral. This is useful for determining the rate of mutations affecting fitness since repeated bottlenecks will reduce the effect of selection (Kibota and Lynch 1996; Zeyl and Devisser 2001). This method works well in multicellular organisms, where the population size can be maintained at the bottleneck; however, in microorganisms, where a visible colony must be allowed to form, selection will still occur between the bottlenecks. Several methods are available for estimating phenotypic mutation rates from mutation-accumulation assays (Garcia-Dorado and Gallego 2003); alternatively, direct sequencing can be used since all mutations occur in the same genome (Denver *et al.* 2004; Haag-Liautard *et al.* 2007).

In the mutant-accumulation assay, the frequency of a phenotype that is neutral in the environment of the experiment but can be selected for in an alternative environment is monitored in an exponentially growing culture by periodically plating an aliquot of the culture onto selective medium. Once the population reaches a size such that the probability of a new mutation occurring in the next generation is approximately one, the frequency of mutants will increase linearly with time. Thus an accurate estimate of phenotypic mutation rate requires a long intervals between frequency measurements and these experiments typically last for hundreds of generations. Because most of the population has not accumulated the neutral mutation, beneficial mutations will predominantly occur in cells lacking the neutral mutation, thus slowing the accumulation of mutants.

In the fluctuation assay, many parallel cultures are inoculated with a small number of cells, grown under nonselective conditions, and plated to select for mutants (Luria and Delbrück 1943). The number of mutations that arise in each culture will follow the Poisson distribution; however, the number of mutant cells per culture will vary greatly since early mutations will lead to “jackpots,” cultures that contain a great many mutant individuals.

The simplest way to estimate the expected number of mutations that occur in each culture (*m*) is from the fraction of cultures with zero mutants, which should be *e ^{−m}*. This method (

*P*

_{0}) was used by Luria and Delbrück (1943) in their original article describing the fluctuation assay. The full distribution of mutants per culture (the Luria–Delbrück distribution) can be described by a set of recursive equations (Ma

*et al.*1992). The most accurate method for estimating

*m*(Ma–Sandri–Sarkar maximum likelihood) finds the

*m*that gives the best fit of the Luria–Delbrück distribution to the data (Sarkar

*et al.*1992; Rosche and Foster 2000). By simulation, Stewart (1994) calculates 95% confidence intervals for

*m*obtained using this method; however, for the confidence intervals to be meaningful, the data must follow the Luria–Delbrück distribution.

One way to estimate the quality of data is to plot the cumulative distribution of mutant frequencies on a log–log plot; Luria–Delbrück-distributed data presented in this way will produce a straight line with slope −1 (Rosche and Foster 2000). Deviations from linearity show that the data do not approximate a Luria–Delbrück distribution. This graphical approach ignores jackpots (since they lie far off the line) and cultures with zero mutants (due to the log transformation). In this article we introduce a simulation-based approach to determining the quality of data from fluctuation assays and show how this method can detect deviations from the Luria–Delbrück distribution.

To convert phenotypic mutation rates to a per-base-pair mutation rate we need to estimate the effective target size for phenotypic mutation. Although the concept of effective target size is important in evolutionary theory—it links the mutation rate to a particular phenotype to the mutation rate per genome per generation—it has not been explicitly defined. Previous work has used “target size” to refer to the size of the gene in which mutations are selected (Drake 1991) but could also describe the number of mutations within a reporter that are detectable. We propose a more general and probabilistic definition of effective target size that encompasses the relationship between phenotypic and genomic mutation rates. Our definition illustrates where uncertainties in estimates of genomic mutation rate arise and shows how this parameter can be calculated from experimental data. In particular, we distinguish between the effective target size based on the average per-genome mutation rate and the locus-specific effective target size conditioned on a mutation event occurring in a specific region of the genome.

We measured mutation rates to resistance to 5-fluoroorotic acid (5-FOA), canavanine, and α-factor. 5-FOA is nontoxic, but can be converted into toxic 5-fluoro-uracil by the uracil biosynthesis pathway. The product of the *URA3* gene catalyzes a key step in this process; therefore, 5-FOA predominantly selects for *ura3* loss-of-function mutants. Canavanine is a toxic arginine analog, whose uptake requires the arginine transporter. Canavanine selects for loss-of-function mutants of this transporter, which is encoded by the *CAN1* gene. α-Factor is a peptide pheromone secreted by mating-type α (*MAT*α) cells. Binding of the pheromone to the Ste2 receptor on a *MAT***a** cell signals through a MAP-kinase cascade to initiate the mating-response genes and a G_{1} arrest (Dohlman 2002). Wild-type *MAT***a** cells secrete a protease, Bar1, which degrades α-factor; deleting *BAR1* prevents growth on medium containing α-factor and allows us to measure the rate of resistance to α-factor using the fluctuation assay. There are at least 10 genes whose loss-of-function results in α-factor resistance; therefore, we expect the mutation rate to α-factor resistance to be an order of magnitude higher than the mutation rate to 5-FOA and canavanine resistance. We find the phenotypic mutation rates to 5-FOA, canavanine, and α-factor resistance to be 5.43 × 10^{−8}, 1.52 × 10^{−7}, and 3.07 × 10^{−6}/genome/generation, respectively. Combining our estimates of phenotypic mutation rates and locus-specific effective target sizes, we conclude that the per-base-pair mutation rates at *URA3* and *CAN1* are 3.80 × 10^{−10} and 6.44 × 10^{−10}/bp/generation, respectively, suggesting that the mutation rate varies across the yeast genome.

## MATERIALS AND METHODS

#### Strains and media:

GIL104 is a haploid yeast strain derived from the W303 background with genotype *URA3*, *leu2*, *trp1*, *CAN1*, *ade2*, *his3*, *bar1*Δ∷*ADE2*, *MAT***a**. Yeast were grown in synthetic complete (SC) medium (Sherman *et al.* 1974), SC medium without uracil (SC −Ura), or SC medium with only 1% glucose (SCLG). The cultures that composed the fluctuation assays were plated onto four types of selective media: 1× canavanine [SC medium without arginine (SC −Arg), 60 mg/liter l-canavanine (Sigma-Aldrich, St. Louis)]; 10× canavanine (SC −Arg, 0.6 g/liter l-canavanine); 5-FOA [SC −Ura, 1 g/liter 5-FOA (Sigma-Aldrich)]; and α-factor [yeast extract, peptone, dextrose (YPD), 10 μg/ml α-factor] (Bio-Synthesis, Lewisville, TX). In preparation for plating several spots of mutant cultures on each plate, the plates were overdried by pressing a Whatman filter paper (grade 3, 90 mm) onto the plates, using a replica-plating block and allowing the filter to remain in place for at least 30 min. The filters remove ∼1 ml of liquid and plates can be used for several days after filters have been removed.

#### Fluctuation assays:

Fluctuation assays were performed on 10 clones of GIL104 to determine the rate at which cells mutated to become resistant to 5-FOA, 10× canavanine, or α-factor. Media and culture volumes were chosen such that a similar number of mutants would be counted for each phenotype: 200 μl of SC, 100 μl of SC, and 10 μl of SCLG for resistance to 5-FOA, 10× canavanine, and α-factor, respectively.

To begin each fluctuation assay, a single clone was grown overnight to saturation in SC −Ura, diluted 1:10,000 into the appropriate medium, dispensed into 96-well plates, and sealed with an aluminum plate seal to prevent evaporation. This represents initial inocula of ∼2000, 1000, and 200 cells for the cultures assayed for mutations to 5-FOA, 10× canavanine, and α-factor resistance. Cultures were grown for 2 days at 30° without shaking (only 1 day for the low-glucose cultures, which saturated after 1 day's growth) and resuspended using a Titramax 1000 orbital shaker (Rose Scientific, Cincinnati) prior to plating. Twenty-four cultures were pooled, diluted, and counted in triplicate using a Beckman Coulter particle counter (Beckman Coulter, Fullerton, CA) to determine the average number of cells per culture. The remaining 72 cultures were spot plated onto overdried plates to select for mutants: 200-μl cultures were spotted onto 12 5-FOA plates (six spots per plate); 100-μl cultures were spotted onto eight 10× canavanine plates (nine spots per plate); and 10-μl cultures were brought up to 100 μl with sterile water and spotted onto eight α-factor plates (nine spots per plate). A Tecan Genesis liquid handler (Tecan U.S., Durham, NC) was used to semi-automate spot plating. We assume that the loss of mutant cells due to evaporation, liquid handling, and incomplete plating efficiency is negligible. Under the conditions used, the average numbers of cells per culture were 2.1 × 10^{7}, 1.3 × 10^{7}, and 3.8 × 10^{5}, for 5-FOA, 10× canavanine, and α-factor, respectively.

Plates were allowed to dry overnight at room temperature and then incubated at 30° for 1, 2, or 5 days for α-factor, 10× canavanine, and 5-FOA, respectively, after which time the number of mutants per spot was counted using a dissection microscope. For 10× canavanine and α-factor plates we used a size threshold: colonies <1 mm at 10× magnification for canavanine or 3 mm at 6× magnification for α-factor were presumed to result from mutations that had occurred after the cells were plated and were not counted. The choice of the size cutoff was based on looking for a natural break in the colony-size distribution. However, the size distribution was not bimodal; therefore, it is reasonable to assume that some leaky mutants were excluded. This is clear when we observe jackpots of mutants smaller than our size threshold, which were excluded from the analysis. For this reason, it is important that the strains we sequenced to determine target size were chosen from the plates from the fluctuation assays so that any leaky mutants, which were excluded from the determination of mutation rates, were also excluded from the calculation of target size. Fluctuation assays for resistance to 1× canavanine were performed similarly to those for 10× canavanine except 1× canavanine plates were counted 3 days postplating.

#### Analysis of fluctuation data:

Fluctuation data were analyzed by the Ma–Sandri–Sarkar maximum-likelihood method in which the data are fitted to a model of the Luria–Delbrück distribution on the basis of a single parameter *m*, the expected number of mutation events per culture (Sarkar *et al.* 1992). Mutation rate is calculated from the equation μ = *m/N*, where *N* is the average number of cells per culture (approximately equal to the number of cell divisions per culture since the initial inoculum is much smaller than *N*). Ninety-five percent confidence intervals on *m* and μ were assigned using Equations 29 and 30 from Foster (2006).

The data were also fitted to a two-parameter model that accounts for postplating growth. This model is a Luria–Delbrück distribution overlaid with a Poisson distribution with a rate *N*μ*d* = *md*, where *d* is the mean number of cell divisions (in which mutants could occur and be detected) in the lineage of cells that were plated on the selective plates; *d* can be related to the number of generations of growth postplating (*g*) by *d* = 2* ^{g}* − 1. The probability distribution for the number of mutants per culture in the two-parameter model is thus the joint distribution of the Luria–Delbrück (parameter

*m*) and the Poisson (parameter

*n*=

*md*); the

*m*'s are the same assuming that the mutation rate is the same for the postplating cell divisions.

We used two tests to assess whether the one- or the two-parameter models best fitted the data. Both relied on calculating the probability of recovering the observed data given the model. This probability, *P _{j}*, where

*j*is the number of parameters in the model, was calculated as , where

*p*is the probability of observing

_{i}*i*mutants and

*N*is the number of independent cultures that produce

*i*mutant colonies. The log-likelihood-ratio test calculates Λ

*=*2(log(

*P*

_{2}) − log(

*P*

_{1})), which is distributed as a χ

^{2}-distribution with 1 d.f. Akaike's information criterion (AIC) calculates the parameter AIC

*=*2

*j −*2(log(

*P*)) and asserts that the best fit is the one with the lowest AIC value. The

_{j}*F*test is not appropriate for these comparisons since it assumes a linear model with errors that are drawn from the same normal distribution at each data point, both assumptions that are violated by the data generated from fluctuation assays.

#### Sequencing of *ura3* and *can1* mutants:

Table 1 lists the primers that were used to amplify and sequence the *ura3* and *can1* alleles from 5-FOA and 10× canavanine-resistant colonies, respectively. Prior to the isolation of genomic DNA, 5-FOA and 10× canavanine-resistant colonies were restreaked on selective medium.

#### Computational analysis:

The Ma–Sandri–Sarkar maximum-likelihood analysis and the two-parameter fitting were performed in Matlab (MathWorks, Natick, MA). Fitting to the two-parameter model was achieved by optimizing *m* (with *d* fixed), optimizing *d* (with *m* fixed), and repeating this process until convergence. AIC was used to determine which model best fits the data (Akaike 1974). Matlab was also used to simulate fluctuation data, calculate the sum-of-squares differences between Luria–Delbrück distributions and data, and bootstrap estimates of effective target sizes to generate 95% confidence intervals. Matlab files for most of these operations are available at http://murraylab.mcb.harvard.edu/fluctuation/programs.htm.

## RESULTS

#### Fluctuation assays and phenotypic mutation rates:

The accuracy of mutation rate estimates from fluctuation assays depends on how the experiment is performed and how the data are analyzed. We have made improvements to both and consider each in turn.

##### Performing fluctuation assays:

One way to increase the accuracy of mutation rate estimates from fluctuation assays is to increase the number of cultures (Stewart 1994). Typically, fluctuation assays are performed in test tubes; however, to increase the throughput, we perform the assays in 96-well plates. We plate 72 of the cultures to selective medium to determine the number of mutants per culture; the remaining 24 are used to determine the average number of cells per culture (see materials and methods). Using the 96-well format we can vary the culture volume from 10 to 200 μl and can measure mutation rates over two orders of magnitude (Table 2).

Rather than spreading cultures onto selective medium, we spot cultures onto overdried plates, where they spread uniformly over an area of 1.3–3 cm^{2}, depending on the volume spotted. This increases efficiency and reduces the number of plates since up to nine cultures can be spotted onto one plate.

The combination of spot plating and the 96-well format allow for automation of the fluctuation assay. To demonstrate this, we semi-automated the process using a liquid handler; this enabled us to perform all fluctuation assays described here—the equivalent of three 720-tube fluctuation assays—in parallel.

##### Analyzing fluctuation data and postplating growth on 1× canavanine:

There are many methods for calculating mutation rates from fluctuation data (Foster 2006), of which the Ma–Sandri–Sarkar maximum-likelihood method is preferred because it is the most accurate, it is valid for any range of the expected number of mutation events per culture (*m*), and 95% confidence intervals can be calculated by an empirically determined set of equations (Stewart 1994; Rosche and Foster 2000). For estimates of mutation rates and 95% confidence intervals generated from this method to be accurate the data must approximate the Luria–Delbrück distribution.

We tested this approximation by using the Ma–Sandri–Sarkar maximum-likelihood method to estimate *m* and then plotting the predicted cumulative frequency distribution of mutants against the experimental data. Fluctuation assays on 5-FOA produced close agreement between predicted and observed distributions (Figure 1). In contrast, assays on 1× canavanine and α-factor produced data that deviate significantly from the Luria–Delbrück distribution. Compared to the expected distribution, cultures with a small number of mutants are underrepresented and cultures with many mutants are overrepresented in the 1× canavanine experiment (Figure 1, one-parameter model). This deviation can be explained as the combination of a Luria–Delbrück distribution and a Poisson distribution.

One possible explanation is that canavanine-sensitive cells can divide and give rise to canavanine-resistant mutations after they have been plated; the number of additional mutant colonies will follow the Poisson distribution. We fitted the distribution of mutant frequencies to a two-parameter model that incorporates postplating growth and mutation. This model is the joint distribution of a Luria–Delbrück distribution (with parameter *m*) and a Poisson distribution (with parameter *n* = *md*). The data from 1× canavanine fitted better to the two-parameter model (Figure 1).

We quantified the improvement of the fit by two methods. The first is to compare the best log likelihood calculated under one- and two-parameter models. As long as a large number of cultures are used, twice the difference between the best one- and two-parameter scores [Λ = 2(log(*P*_{2 Param}) − log(*P*_{1 Param})] is distributed as χ^{2} with 1 d.f. This means that if the best two-parameter score is >1.92 log units better than the best one-parameters score the probability is >0.95 that the two-parameter model is a better fit to the data. The second method was to use the AIC, which uses both the log likelihood and the number of parameters in calculating a score to find the model that best fits the data. Both are described in more detail in materials and methods.

For fluctuation assays on 1× canavanine, both the log-likelihood-ratio test and the AIC indicate that the data are best fitted by the two-parameter model. For fluctuation assays on 5-FOA, there is no improvement of fit using the two-parameter model. To minimize postplating mutation we increased the canavanine concentration 10-fold and counted the plates 1 day earlier. The data from 10× canavanine more closely approximate the Luria–Delbrück distribution (Figure 1). Although the two-parameter model still gives a slightly better fit, according to both the log-likelihood-ratio test and the AIC, the data are best fitted by the one-parameter model. For the fluctuation assay on α-factor, the data are best fitted by the two-parameter model; however, both models fail to capture all features of this distribution (Figure 1).

##### Phenotypic mutation rates:

Fluctuation assays were performed to determine mutation rates to α-factor, 10× canavanine, and 5-FOA resistance for 10 isogenic clones of a strain from the W303 background (GIL104); the data were analyzed using the one-parameter and two-parameter models (Tables 2 and 3, respectively). For each assay, the log-likelihood-ratio test and the AIC were applied to determine which model best fitted the data (Table 3). All fluctuation assays on α-factor are best described by the two-parameter model; whereas, all fluctuation assays on 5-FOA are best described by the Luria–Delbrück distribution (the one-parameter model). For 10× canavanine, five fluctuation assays according to the log-likelihood-ratio test, or six according to the AIC, are best fitted by the two-parameter model.

Using the combined data from the 10 clones (effectively a fluctuation assay with 720 parallel cultures) and the two-parameter model we determine phenotypic mutation rates to α-factor, 10× canavanine, and 5-FOA resistance to be 3.07 × 10^{−6}, 1.52 × 10^{−7}, and 5.43 × 10^{−8}, respectively. For 5-FOA resistance, the data are best described by the one-parameter model (*d* = 0 for the two-parameter model, meaning that postplating growth and mutation does not occur); therefore, we can use Equations 29 and 30 of Foster (2006) to assign a 95% confidence interval to our estimate of mutation rate. This yields a confidence interval of 4.97–5.91 × 10^{−8} per generation (Table 2). For the two-parameter model we determined confidence intervals by simulation. For each combined 720-culture fluctuation assay we determined the most-likely values for *m* and *d*, given the data. To gauge the expected variation in these parameters, we simulated 1000 fluctuation assays by sampling the combined Luria–Delbrück/Poisson distribution using parameters determined from the data. We take the 95% confidence intervals for our estimate of *m* to be the values of *m* that encompass 95% of the simulated experiments. From this we calculate the 95% confidence intervals on the two-parameter model to be 2.65–3.62 × 10^{−6}, 1.34–1.71 × 10^{−7}, and 4.78–5.87 × 10^{−8} for α-factor, 10× canavanine, and 5-FOA resistance, respectively.

#### Mutational spectra and target size:

##### Mutational spectra:

We wanted to convert our phenotypic mutation rates to per-base-pair mutation rates. The material for this conversion is the mutational spectra; from the fluctuation assays we sequenced 237 *ura3* alleles and 227 *can1* alleles from 5-FOA- and 10× canavanine-resistant strains, respectively (the identity of all 464 mutant alleles is available in the supplemental information at http://www.genetics.org/supplemental/). Thirty 5-FOA-resistant mutants contain wild-type *URA3* alleles; 29 of these mutants are uracil prototrophs. It has been reported that mutations in *FUR1* can confer this phenotype; however, we failed to find any mutations within the coding sequence of this gene for any of the 29 5-FOA-resistant uracil prototrophs (data not shown). None of the 207 *ura3* mutants are prototrophic, and each contains a single mutation or two mutations within a few nucleotides. There are 167 bp substitutions (64 nonsense and 103 missense), 22 single-bp deletions, 3 2-bp deletions, 3 single-bp insertions, one 3-bp insertion, two large tandem duplications, and nine double mutations (Figure 2). All 227 10× canavanine-resistant mutants contain a single mutation or closely adjacent mutations at the *CAN1* locus. We find 150 bp substitutions (70 nonsense and 80 missense), 55 single-bp deletions, 8 single-bp insertions, one 2-bp insertion, 10 double mutations, and 3 complex mutations including a *can1* allele containing a 27-bp deletion and a 30-bp tandem duplication (Figure 3).

##### Per-base-pair rate of nonsense mutations:

From the results thus far, we can calculate the per-base-pair mutation rate to nonsense mutations at the *CAN1* and *URA3* genes. First we need to correct the phenotypic mutation rate to 5-FOA resistance to take into account that only 207 of 237 5-FOA mutants are *URA3* mutants. This results in a mutation rate for loss of function of 4.75 × 10^{−8} for *URA3* and 1.52 × 10^{−7} for *CAN1*. If we multiply these rates by the fraction of nonsense mutations in the mutational spectra we find that the rates of nonsense mutations at *URA3* and *CAN1* are 1.47 × 10^{−8} and 4.69 × 10^{−8}, respectively. For *URA3* and *CAN1*, we counted the number of possible nonsense substitutions from the known sequences of these genes. *URA3* is 804 bp; therefore, there are 2412 possible substitutions (804 bp × 3 possible substitutions per base pair). Of these, 123 result in nonsense mutations. By dividing these rates by the number of possible nonsense substitutions and multiplying by 3, since there are 3 possible mutations at each base, we find that the nonsense mutation rate normalized per base pair is 3.58 × 10^{−10} for *URA3* and 6.21 × 10^{−10} for *CAN1*. Repeating the above analysis for all 10 fluctuation assays at *CAN1* and *URA3* from Table 3 we find that the per-base-pair nonsense mutation rates differ significantly at these two loci (Wilcoxon rank sum, *P* < 1.83 × 10^{−4}). These calculations were performed with mutation rates from the two-parameter model, correcting for postplating growth and mutation on canavanine plates. Had we used the one-parameter model, the difference in mutation rates between *URA3* and *CAN1* would have been greater, since the one-parameter model overestimates the phenotypic mutation rates to canavanine resistance.

##### Definition of effective target size:

We define a target size to perform two calculations: determining the genomic mutation rate given experiments that measure the rate of mutation at a particular locus and predicting the rate at which a particular phenotype arises given a genomic mutation rate. This requires that we deal with two problems: variation in the base pair substitution rate across the genome and differences between base pairs in the probability that a substitution at that base produces a mutant phenotype. Allowing the target size to depend on the location of the gene seems counterintuitive, but is essential if we want to consider the effects of rearrangements (man made or evolutionary) that alter the location of the gene. We deal with the different substitutions at the same base pair by defining the target size to be the number of base pairs that would account for the observed mutation rate if *every substitution* at each of these bases produced the mutant phenotype. As an example, consider the target size for nonsense mutations in a gene with 150 bp at which a single substitution could produce a nonsense codon and 24 bp (such as the C:G base pair in a UCA codon) at which two different substitutions both yield nonsense codons. The target size for nonsense mutations would be 150/3 + 24 · (2/3) = 66 bp, meaning the gene behaves as if it had 66 bp at which *any* substitution would produce a nonsense codon and that we could derive the genomic mutation rate by dividing the observed rate of nonsense mutations in this gene by 66.

There is no neat formula for the fraction of substitutions that give missense mutations, and each gene will have a different distribution of base pairs where zero, one, two, or all three possible substitutions produce mutant phenotypes. We deal with this complexity by estimating the number of possible base pair substitutions that give rise to mutant phenotypes and dividing by three, to produce a target size that assumes that for any base pair in the target, all three substitutions produce the mutant phenotype.

We define effective target size as the size of the genome, *G*, multiplied by the probability that introducing a single genomic mutation (this could be a base pair substitution, insertion/deletion, transposition, etc.) will result in the phenotype of interest:Thus, the effective target size to canavanine resistance isWe can specify the effective target size given a particular class of mutation. For instance, the target size for mutation to canavanine resistance by way of a base pair substitution isFurthermore, we can restrict the region of the genome in question to define a locus-specific effective target size. For example, the locus-specific effective target size for canavanine resistance by way of a base pair substitution at the *CAN1* locus iswhere 1773 bp is the size of the *CAN1* locus. Note that there are two relationships that give the genomewide average mutation rate per base pair per generation (): the mutation rate to a given phenotype ( for the mutation rate to canavanine resistance) divided by the target size for such mutations () and the overall, genomewide mutation rate () divided by the length of the genome (*G*),where is the mutation rate to canavanine resistance, is the genomewide average mutation rate per base pair per generation, and is the mutation rate per genome per generation. Similarly, the average mutation rate per base pair per generation at the *CAN1* locus () is the mutation rate to canvanine resistance () divided by the locus-specific, effective target size for mutations to canavanine resistance at *CAN1*, . The locus-specific target size and mutation rate, and , are related to the genomewide target size and mutation rate, and , through the parameter λ* ^{CAN1}*, which is the ratio of the mutation rate at the

*CAN1*locus compared to the genomewide average; λ = 1 identifies loci where the mutation rate equals the genomic average, loci where λ < 1 are coldspots, and those where λ > 1 are hotspots:and

##### Calculation of effective target size and the per-base-pair mutation rate:

The effective target size to canavanine resistance, , is difficult to determine experimentally; however, from mutational spectra we can determine the locus-specific effective target size to canavanine resistance conditioned on a mutation at the *CAN1* locus, . Table 4 provides a summary of the calculations of this target size and the per-base-pair mutation rate at *CAN1* and *URA3*. To calculate , we first rewrite it as
where and are the frequency with which base pair substitutions and insertion, deletion, or other DNA rearrangements (which we collectively refer to as indels) occur. Assuming that all indels in *CAN1* result in loss of function, is 1773 bp. To determine we separate the observed base pair substitutions into nonsense and missense (70 and 80, respectively). *CAN1* contains 226 possible nonsense substitutions, 54 of which we found (as expected, some mutations were identified multiple times). The 80 missense mutations represent 63 unique substitutions. Assuming that we identified the same proportion of detectable missense and nonsense mutations, we can calculate the number of missense mutations conferring canavanine resistance as 63(226/54) = 264. Since we want our effective target size to be the number of base pairs at which any substitution would produce a mutation and there are three possible substitutions at each base, the locus-specific effective target size for canavanine resistance at the *CAN1* locus by way of missense and nonsense mutations is 264/3 = 88 bp and 226/3 = 75 bp, respectively. From the *CAN1* sequence, we know the location of every possible nonsense mutation. For the missense mutations we know that there are 264 possible mutations; however, our method is blind to the locations of the mutations other than those identified in our mutational spectra. A locus-specific effective target size for missense mutations of 88 bp could represent 88 positions where any of the three possible substitutions causes a phenotype, 264 positions where only 1 of 3 substitutions causes a phenotype, or something in between.

Combining locus-specific effective target sizes for nonsense and missense mutations we find thatThis locus-specific effective target size indicates that 163/1773 (9%) of base pair substitutions at the *CAN1* locus result in canavanine resistance. To calculate the mutation rate per base pair per generation by way of base pair substitutions, we need to consider that only 150 of 226 mutations detected at the *CAN1* locus were base pair substitutions; therefore,We can now calculate the mutation rate per base pair per generation for all mutations. The frequency of base pair substitutions and indel mutations in the *CAN1* mutational spectrum is 150/226 (∼66%) and 77/226 (∼34%), respectively, but only 9% of base pair substitutions result in canavanine resistance. Thus the fraction of mutations that are substitutions, , is actually 0.95 and those that are indels, , is only 0.05. Using these values, we estimate the locus-specific effective target size to canavanine resistance at the *CAN1* locus to betherefore,

Similar calculations for *URA3* show thatand

Taking into account that only 207 of the 237 5-FOA-resistant mutants sequenced were *ura3* mutants, the rate of mutation to 5-FOA resistance at *URA3* is 4.75 × 10^{−8}/cell/generation. Thus we calculateand

## DISCUSSION

We have improved the execution and analysis of the fluctuation assay and have developed methods for asking whether observed data are derived from a Luria–Delbrück distribution. Our results suggest that the per-base-pair mutation rate is different in different parts of the genome and that the vast majority of mutations are single-base-pair substitutions.

#### Analyzing fluctuation data:

The Ma–Sandri–Sarkar maximum-likelihood method is the most accurate method for estimating the expected number of mutants per culture (*m*) from fluctuation data; however, this method assumes that the data follow the Luria–Delbrück distribution. We have shown that postplating proliferation and mutation of canavanine-sensitive cells on 1× canavanine plates can be detected since it produces a deviation from the expected Luria–Delbrück distribution. If the data are not corrected, this leads to an overestimation of the mutation rate. One can correct for this by fitting the data to a two-parameter model that accounts for postplating growth or largely eliminate it by increasing the concentration of canavanine. Other processes that introduce error into mutation rate estimates such as differential growth rates between mutants and nonmutants (Zheng 2005) and poor plating efficiency (Stewart *et al.* 1990; Stewart 1991) will also produce deviations from the expected Luria–Delbrück distribution. Therefore, we suggest that fitting fluctuation data to the cumulative distribution and comparing the sum-of-squares differences with simulated data should be used as a general method for assaying the quality of data resulting from fluctuation assays.

We can assign significance to deviations from the Luria–Delbrück distribution by simulation. Data from the 1× canavanine fluctuation assay (Figure 1) give a maximum-likelihood value of *m* = 4.80. We calculated the sum-of-squares differences between the cumulative distribution of the data and the Luria–Delbrück distribution with *m* = 4.80. To determine the expected sum-of-squares differences, we simulated 10,000 72-tube fluctuation assays by sampling from the Luria–Delbrück distribution with *m* = 4.80 and calculated the sum-of-squares differences for each simulated experiment. We find that only 3.5% of the simulated experiments have a poorer fit to the Luria–Delbrück distribution than the observed 1× canavanine data compared to 30 and 41% for 10× canavanine and 5-FOA, respectively.

#### Mutational spectra:

We sequenced 237 5-FOA-resistant *ura3* alleles and 227 10× canavanine-resistant *can1* alleles to determine the locus-specific effective target size for phenotypic mutations. From these data sets we can garner additional information regarding the mutagenic processes leading to loss of function at *URA3* and *CAN1*. Nonsense mutations represent a larger fraction of base pair substitutions in the *can1* data set (47% *vs.* 38%). This indicates that a larger fraction of missense mutations cause loss of function for *URA3* (10.9% *vs.* 6.8% as calculated by dividing the number of possible loss-of-function missense mutations by the number of possible missense mutations). This difference is reflected in our calculation of locus-specific effective target size where, although the coding sequence of *CAN1* is 2.2 times larger, the effective target size for loss of function by way of base pair substitutions is only 1.6 times larger. Loss-of-function mutations in our mutational spectra are overrepresented at conserved residues (*P* = 1.5 × 10^{−5}, Wilcoxon rank sum, A. Singhal and A. Segre, data not shown). In our compiled *URA3* and *CAN1* mutational spectra we identified 88 single-bp insertions/deletions in which deletions were overrepresented by 7:1 (*P* < 0.001, chi square). Of the base pair substitutions, 206 were transversions and 111 were transitions (Table 5). This is consistent (*P* > 0.25, chi square) with the 2:1 ratio of transversions to transitions we expect if all substitutions are equally probable.

There are two ways we can test whether mutations occur randomly within the target sequences. Since we know every position where a nonsense mutation can occur we can ask if mutations fall randomly over these sites. When looking at the distribution of nonsense mutations we assume that all nonsense mutations result in loss of function. For *URA3* this assumption is reasonable since our data set includes a nonsense mutation eight amino acids before the stop codon removing the last 1% of the protein. Dividing the *URA3* and *CAN1* sequences into fifths we find that the observed number of nonsense mutations in each region does not differ significantly from expectation (*URA3*, observed, 10, 10, 19, 10, 15; expected, 13, 12, 12, 11, 16, *P* > 0.05, chi square; *CAN1*, observed, 10, 18, 17, 14, 11; expected, 14, 15, 13, 13, 15, *P* > 0.05, chi square).

In addition, we can test for mutational hotspots/coldspots by asking if the number of times we found a given base pair substitution deviates from what we would expect from binomial sampling. For *URA3* the numbers of mutations we identified zero, one, two, three, or four times are 206, 71, 18, 12, and 6. These deviate significantly from the expectation of binomial sampling (184, 98, 26, 5, and 1; *P* < 0.01, chi square). Similarly, for *CAN1* the numbers of mutations we identified zero, one, two, three, or four times are 373, 91, 20, 5, and 1, which deviate significantly from binomial sampling (360, 111, 17, 2, and 0; *P* < 0.05, chi square). Therefore, although we do not see regional biases in our mutational spectra we do find particular substitutions to be over/underrepresented, possibly reflecting biases due to local sequence context effects. The variation we find in the yeast *URA3* and *CAN1* genes is significantly less than the degree of variation seen across the *LacI* gene in *Escherichia coli* (Miller *et al.* 1977).

We found 20 instances of multiple mutation events occurring in the same strain. One *can1* allele contains a 27-bp deletion and a 30-bp imperfect duplication separated by 312 wild-type bp. The remaining 20 were multiple mutation events occurring within a few nucleotides of each other, 9 in *ura3* and 11 in *can1* (Table 6). In one case the same complex mutation, a double deletion and base pair substitution, was found in two *can1* strains that were adjacent during much of the processing (restreaking, genomic DNA preparation, PCR, and sequencing); therefore, this may represent a single event that was inadvertently sampled twice. Half of the multiple mutation events are interspersed with one or more bases of wild-type sequence; therefore, multiple mutation events must have occurred. These events may represent instances where lesion bypass has occurred and the multiple mutations result from decreased fidelity of translesion polymerases. The translesion polymerase Polζ can efficiently extend unpaired primer termini resulting from incorporation opposite a lesion and it is thought that up to half of all spontaneous mutations occur in a Polζ-dependent manner. (Rattray and Strathern 2003; Prakash *et al.* 2005).

#### Effective target size:

The mutation rate per genome per generation is a fundamental parameter in molecular evolution. Here we introduce the effective target size (τ) to phenotypic mutation as a way to link the mutation rate per genome per generation to the measurable phenotypic mutation rate. We have defined effective target size asWe use a bottom-up approach based upon mutational spectra to calculate the effective target size to phenotypic mutation. For example, for canavanine resistance we first calculate the effective target size to phenotypic mutation by way of a base pair substitution at the *CAN1* locus (). Intuitively this means that if one considers only base pair substitutions, the *CAN1* gene is effectively 163 bp where any base pair substitution will result in canavanine resistance. This value is then used to calculate the locus-specific effective target size to canavanine resistance by way of any mutation (), meaning that *CAN1* is effectively 236 bp where any mutation will result in canavanine resistance. To calculate one needs to weight the effective target sizes for canavanine resistance by way of each particular class of mutation by the frequency with which that mutation occurs.

The effective target sizes that are calculated are valid only as long as the frequencies of particular classes of mutation are conserved and, therefore, are likely to vary between strain backgrounds and growth conditions. Varying the selective medium may alter the fraction of missense mutations; for example, some *ura3* mutants will form colonies at low concentrations of 5-fluoroorotic acid but not at high ones. Therefore, we determined the effective target sizes by sequencing mutant *ura3* and *can1* alleles from the same plates that were used for the fluctuation assays.

Genes of similar lengths may have very different effective target sizes. Mutational hotspots such as microsatellite sequences and polynucleotide runs will increase the effective target size by increasing the local rate of frameshift mutations. Mutator alleles not only increase the mutation rate, but also influence the effective target size by altering the mutational spectra. In addition, since mutation rates are believed to vary across the genome (Ito-Harashima *et al.* 2002; Hawk *et al.* 2005), the effective target size may change if a gene is moved to a different location. In the context of this experiment, if *CAN1* were moved to a location in the genome where the mutation rate is twofold higher than at its endogenous locus, the target size will be twice as large, since, given the definition of effective target size, moving the gene doubles the probability of a mutation resulting in canavanine resistance given a single mutation occurring anywhere in the genome. In our notation, we do not explicitly state that the *CAN1* gene is at an endogenous location. We do, however, indicate that our estimate of locus-specific effective target size is conditioned upon a mutation within the 1773-bp region of the *CAN1* coding sequence. We specify this with the superscript *CAN1* () to distinguish this locus-specific effective target size from the effective target size for a mutation occurring anywhere within the genome (). and are related by the hotspot parameter λ,

#### Target size for mutations conferring resistance to α-factor:

If we take to be the average of the per-base-pair mutation rates at *CAN1* and *URA3*, 5.12 × 10^{−10}, we can estimate the effective target size for mutation to α-factor resistance asTaking the mean ratio of target size to gene size for *CAN1* and *URA3*, 0.14, this suggests that the total length of genes in which a loss-of-function mutation results in α-factor resistance is 41.5 kbp. Summing over the lengths of known targets (*STE2*, *STE4*, *STE5*, *STE7*, *STE11*, *STE12*, *STE20*, *STE50*, *FAR1*, and *FUS1*) accounts for only 18.6 kbp. There are four possible explanations for this inconsistency. There could be unidentified genes whose inactivation results in α-factor resistance; however, given the degree to which the mating pathway has been studied, it is unlikely that enough components remain unidentified to account for this difference. It could be that many more loss-of-function missense mutations are possible for signaling proteins than for enzymes or transporters. The third possibility is that a change of the mating-type locus from *MAT***a** to *MAT*α provides an additional class of mutation to α-factor resistance. Although the strain used in this study is heterothallic, a spontaneous double-strand break at the *MAT* locus can be repaired off of the silent *HML*α cassette, resulting in mating-type switching and α-factor resistance. The rate of mating-type switching in heterothallic yeast is estimated to be between 10^{−7} and 10^{−6} (Klein and Wintersberger 1988). An additional possibility is that *CAN1* and *URA3* are located in regions that are coldspots compared to the genomewide average mutation rate (or one or more of the genes involved in α-factor resistance could be located in a mutational hotspot; however, since the target for α-factor resistance is spread over the genome, it likely averages over local hotspots and coldspots). Asserting that the known genes that can mutate to confer α-factor resistance had an average λ of 1 predicts λ* ^{CAN1}* = 0.55 and λ

*= 0.32.*

^{URA3}#### Mutation rate per base pair per generation:

We measured phenotypic mutation rates and, from the same experiments, the locus-specific effective target sizes to phenotypic mutation. Our results indicate that the per-base-pair mutation rate at *URA3* and *CAN1* is and , respectively. Drake (1991) obtains similar values, but his method differs slightly. He also utilizes fluctuation assays and mutational spectra; however, rather than calculate the effective target size to phenotypic mutation, Drake estimates the number of base pair substitutions that occurred in the coding sequence as 64/3 times the number of nonsense mutations detected, ignoring missense mutations detected in the mutational spectra. He then calculates a correction factor (the inverse of the detection frequency) to scale the mutation rate and then divides the corrected mutation rate by the size of the open reading frame.

In principle Drake's method and ours should yield similar values for the per-base-pair mutation rate. Analyzing our data using the Drake method yields estimates of and , respectively. Drake converts the per-base-pair mutation rate to a per-genome mutation rate by scaling to the size of the genome. Since these estimates of the per-base-pair mutation rate are specific for particular loci, scaling up is accurate only if mutation rate is uniform across the genome. Several experiments suggest that mutation rate varies across the genome by at least an order of magnitude (Ito-Harashima *et al.* 2002; Hawk *et al.* 2005). On a genomic scale, *URA3* and *CAN1* are relatively close (83 kb apart on the left arm of chromosome V) yet our two-point estimates of the per-base-pair mutation rate differ by a factor of 1.7.

To determine if this difference in mutation rate is significant, we need to place confidence limits on our per-base-pair mutation rate estimates. We showed that the 95% confidence intervals for our estimates of phenotypic mutation rate are 1.34–1.71 × 10^{−7}, and 4.78–5.87 × 10^{−8} for resistance to 10× canavanine and 5-FOA, respectively. Since only 207 of 237 5-FOA-resistant mutations are *URA3* mutants, the 95% confidence interval for the rate of loss of function of *URA3* is 4.17–5.13 × 10^{−8}.

We used a bootstrapping method to generate 95% confidence intervals around our estimates of the effective target size by discarding 25% of our mutational spectra data and recalculating the effective target sizes. This process was iterated 10,000 times for both and . The ranges of these distributions are 99.90–162.82 and 190.64–285.60 for and . Excluding the extreme 2.5% at either end of the distribution, 95% of the values lie between 109.17 and 140.81 for and between 207.16 and 257.73 for . To place conservative confidence limits on the per-base-pair mutation rate estimates we took the lower bound for phenotypic mutation rate divided by the upper bound for the effective target size and vice versa. For example, the lower bound on is 4.17 × 10^{−8}/140.81 = 2.96 × 10^{−10}. Performing this for both *URA3* and *CAN1* yields nonoverlapping confidence intervals of 2.96–4.70 × 10^{−10} and 5.21–8.24 × 10^{−10}/bp/generation for and , respectively. It is possible this observed difference reflects a difference in the ability to detect mutations at these two loci and not an underlying difference in the per-base-pair mutation rate. However, differences in the plating efficiency or phenotypic lag, which would alter the ability to detect mutants, are expected to produce deviations from the Luria–Delbrück distribution, which is not observed in our data (Table 3).

In this article we have shown that the per-base-pair mutation rate varies on two length scales: between different positions within the *CAN1* and *URA3* genes and between the genes themselves. It is possible that these two observations are related; however, since we measure forward mutation rates over large targets, we likely average over the local sequence effects. Therefore, the difference in the per-base-pair mutation rate we observe between *CAN1* and *URA3* is most likely due to a mutation rate variation on a larger scale. Ito-Harashima *et al.* (2002) find that the frequencies of ochre suppressor mutations, detected at eight identical tRNA-Tyr alleles, vary by a factor of ∼20. Hawk *et al.* (2005) show that the rate of microsatellite frameshift mutations varies 16-fold across the genome, due in part to variation in the efficiency of mismatch repair. Consistent with this, we show that the per-base-pair per-generation spontaneous mutation rate is nonuniform across the genome and varies ∼2-fold between two reporters, 83 kb apart, on the left arm of chromosome V.

## Acknowledgments

We thank Thomas Kunkel, Thomas Petes, Kevin Verstrepen, and members of the Murray lab for helpful comments and suggestions. This work was supported by grants from the National Institute of General Medical Sciences [Center of Excellence nos. P50 GM068763 (A.W.M.) and GM43987].

## Footnotes

Communicating editor: J. Lawrence

- Received January 30, 2007.
- Accepted November 5, 2007.

- Copyright © 2008 by the Genetics Society of America