Abstract

Rates of spontaneous mutation per genome as measured in the laboratory are remarkably similar within broad groups of organisms but differ strikingly among groups. Mutation rates in RNA viruses, whose genomes contain ca. 10^4 bases, are roughly 1 per genome per replication for lytic viruses and roughly 0.1 per genome per replication for retroviruses and a retrotransposon. Mutation rates in microbes with DNA-based chromosomes are close to 1/300 per genome per replication; in this group, therefore, rates per base pair vary inversely and hugely as genome sizes vary from 6 × 10^3 to 4 × 10^7 bases or base pairs. Mutation rates in higher eukaryotes are roughly 0.1–100 per genome per sexual generation but are currently indistinguishable from 1/300 per cell division per effective genome (which excludes the fraction of the genome in which most mutations are neutral). It is now possible to specify some of the evolutionary forces that shape these diverse mutation rates.

Rates of spontaneous mutation per replication per measured target vary by many orders of magnitude depending on the mutational target size (from 1 to >10^10 b, where b stands for base or base pair as appropriate), the average mutability per b (from 10^−4 to 10^−11 per b per replication), and the specific mutability of a particular b (which can vary by >10^4-fold). A mutation rate comprises all kinds of mutations in a mutational target: base pair substitutions, base additions and deletions (often producing frameshifting in exons), and larger or more complex changes. Attempts to detect order in these mutation rates have revealed certain underlying patterns. We describe these patterns, note some of their consequences, and consider their evolutionary origins.

Among the mutations that affect a typical gene, different kinds produce different impacts. A very few are at least momentarily adaptive on an evolutionary scale. Many are deleterious. Some are neutral, that is, they produce no effect strong enough to permit selection for or against; a mutation that is deleterious or advantageous in a large population may be neutral in a small population, where random drift outweighs selection. The impact of mutation also varies among DNA sequences. It is maximal in a conventional gene or exon, and at least transitorily less in a gene whose function is required rarely or is redundant. If adaptive mutations are rare, as seems to be the case, then rates of DNA sequence evolution are driven mainly by mutation and random drift, as Kimura (1983a) has argued. In this case, the proportion of neutral mutations at a site or locus is the ratio of its rate of evolution to that of a region that can be considered neutral, such as a pseudogene. Most newly arisen mutations in functional genes are deleterious, but the deleterious fraction may approach zero for spacer DNAs such as introns and intergenic regions. Of course, some protein evolution certainly results from favorable mutations, and to this extent the neutral fraction is overestimated.
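The neutral-fraction argument reduces to a single ratio. The following is a minimal Python sketch, assuming (as the text does) that adaptive substitutions are rare and that a pseudogene-like reference region evolves at the full mutation rate. The function name and the two substitution rates are hypothetical, chosen for illustration only.

```python
def neutral_fraction(rate_at_locus: float, rate_neutral_reference: float) -> float:
    """Fraction of new mutations at a locus that are effectively neutral.

    Assumes evolution is driven by mutation and drift alone, so that a
    neutral reference region (e.g. a pseudogene) evolves at the mutation
    rate and a constrained locus evolves at that rate times its neutral
    fraction. Adaptive substitutions would make this an overestimate.
    """
    if rate_neutral_reference <= 0:
        raise ValueError("reference rate must be positive")
    return rate_at_locus / rate_neutral_reference


# Hypothetical substitution rates (per site per unit time), not data.
frac = neutral_fraction(1.0e-9, 4.0e-9)
print(f"neutral fraction = {frac:.2f}; non-neutral fraction = {1 - frac:.2f}")
```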

The existence of strong taxonomic patterns of mutability implies that genomic mutation rates are close to an evolutionary equilibrium whose driving forces we consider here. The evolution of those rates is likely to reflect their average effect over long periods, but this effect is likely to have been insignificant for much of the spacer DNA.

THE MAGNITUDES OF MUTATION RATES

Terminology: Table 1 describes the abbreviations and parameters we will use to describe the mutation process. Note that the effective genome size Ge is similar to the total genome size G in microbes, whereas Ge < G in higher eukaryotes. The most reliable estimates of mutational parameters come from microbes whose genes are encoded by DNA (“DNA-based microbes”); as we discuss, there are considerable uncertainties attached to estimates from RNA viruses and higher eukaryotes.

Mutation rates in lytic RNA viruses: Few investigators of the genetics of RNA viruses have focused specifically on mutation rates, although mutant frequencies are often noted to be high compared with those observed in microbes with DNA chromosomes. In a recent survey (Drake 1993a), most of the mutation rates that could be calculated were necessarily based on results obtained with very small and thus potentially unrepresentative mutational targets, and contained other experimental and calculational uncertainties. These uncertainties included lack of information about what proportion of lytic virus replication is linear (repeated copying of the same template) and what is binary (as in most DNA replication), as well as lack of information about the relative contributions of transcription and reverse transcription to retroviral mutation rates. (J.W.D. wishes to correct a typographical error in his 1993a report: the minus signs do not belong in Equations 2 and 3.)

TABLE 1

Parameters used in describing the mutation process

Many (but not all) of the mutation-rate calculations for these viruses were performed by transforming a mutant frequency f into a mutation rate μ, where f was measured in large populations that had accumulated mutants in the putative absence of selection. For linear replication, μlin = f regardless of the extent of growth. For binary replication, μbin = (f − f0)/ln(N/N0), where f0 is the initial mutant frequency, N0 is the initial and N is the final population size. [This holds for N0 > 1/μbin; for N0 < 1/μbin, μbin = f/ln(Nμbin).] Because the relative numbers of binary and linear replications are unknown, Drake (1993a) simply averaged μlin and μbin to obtain μm. μlin was at most about an order of magnitude greater than μbin, so that μm was a little more than μlin/2 and about sixfold larger than μbin (range 2.2- to 9.6-fold). Beyond these uncertainties, the relative fidelities of binary and linear RNA replication are yet to be determined, and differences of a few-fold would not be surprising. Moreover, the calculational uncertainties were roughly similar in magnitude to the experimental uncertainties. The results from Drake (1993a) are summarized in Table 2. For the lytic RNA viruses, μg ≈ 1 but with considerable scatter. Values of μg > 2 are likely to be overestimates because such values would tend to extinguish the species.
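The frequency-to-rate conversions above can be sketched in Python. This minimal illustration covers the two large-inoculum formulas and the simple averaging used to obtain μm. The function names and example numbers are hypothetical, and the small-inoculum case (N0 < 1/μbin) would need an iterative solution not shown here.

```python
import math


def mu_lin(f: float) -> float:
    """Linear replication: every mutant is copied directly from the founding
    template, so the rate per replication equals the mutant frequency."""
    return f


def mu_bin(f: float, n_final: float, n_initial: float, f0: float = 0.0) -> float:
    """Binary replication, valid when n_initial > 1/mu_bin: mutants accumulate
    as f - f0 = mu_bin * ln(N / N0)."""
    return (f - f0) / math.log(n_final / n_initial)


def mu_m(f: float, n_final: float, n_initial: float, f0: float = 0.0) -> float:
    """Simple average of the linear and binary estimates, used because the
    relative numbers of linear and binary replications are unknown."""
    return 0.5 * (mu_lin(f) + mu_bin(f, n_final, n_initial, f0))


# Hypothetical example: f = 1e-3 after growth from 1e3 to 1e9 particles.
f, n0, n = 1e-3, 1e3, 1e9
print(f"mu_lin = {mu_lin(f):.2e}, mu_bin = {mu_bin(f, n, n0):.2e}, mu_m = {mu_m(f, n, n0):.2e}")
```

Because ln(N/N0) is typically 10 to several hundred, μm is dominated by μlin whenever the two bounds are averaged.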

Because a lytic virus replicates repeatedly in each infective cycle, an infected cell yields virus carrying several new mutations per particle. Most of these will be deleterious. The high mutation rate in these viruses may contribute strongly to their characteristic low specific infectivities (infectious particles per physical particle). (Another contributor to low specific infectivity is the inherent lability of the RNA backbone.) Such viral populations are extremely vulnerable to increased mutation rates, with even a three-fold increase leading to extinction (Holland et al. 1990).

TABLE 2

Mutation rates per genome per replication in lytic RNA viruses

In addition to the entries in Table 2, two reports have appeared in which a mutational target of foreign origin was inserted into a lytic RNA virus that was then passaged extensively and eventually screened for accumulated mutations. In the first case (Kearney et al. 1993), the target resided in tobacco mosaic virus, which was serially passed through plants ten times, each passage expanding an inoculum of ~10^3 infective units (iu) to ~10^15 iu. After the final passage, the target sequence was reverse-transcribed and amplified by the polymerase chain reaction (PCR) from a number of isolates and then sequenced. The mutant frequencies were 26/16158 b sequenced (16.1 × 10^−4) after passage and 8/8208 = 9.7 × 10^−4 before passage, the latter reasonably attributable to reverse transcription (RT) and PCR errors. While this difference is not significant, if taken at face value it yields a net f = μlin ≈ 6.3 × 10^−4 per b. If the population is considered to have expanded from 10^3 to 10^(15×10) iu, μbin ≈ 2 × 10^−6. Then μm ≈ 3 × 10^−4 per b and, for G = 6395 b (Goelet et al. 1982), μg ≈ 2. This value is typical of lytic RNA viruses. However, because μlin/μbin > 300, this calculation is not robust; μg could approach 0.01 if binary replications predominated. Another confounding factor is the possibility of bottlenecks. A target size of about 200 b, an inoculum of about 1000 and a momentary f of 10^−4 would ensure the transmission of about 20 pre-existing mutants at passage. However, if a small fraction of the inoculum contributed heavily to the whole-plant yield, bottlenecks could still occur and the mutation rate would be underestimated.
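The tobacco mosaic virus arithmetic can be reproduced by plugging the quoted counts and population sizes into μlin = f and μbin = f/ln(N/N0). Treating the ten passages as one binary expansion from 10^3 to 10^(15×10) iu follows the text. The variable names are our own.

```python
import math

# Mutant frequencies per base sequenced (Kearney et al. 1993).
f_after = 26 / 16158    # after ten passages
f_before = 8 / 8208     # before passage; attributed to RT and PCR errors
f_net = f_after - f_before          # net f, taken as mu_lin

# Binary interpretation: ten passages, each ~1e3 -> ~1e15 iu.
n0, n = 1e3, 1e15 ** 10             # 1e150 still fits in a float
mu_lin = f_net
mu_bin = f_net / math.log(n / n0)
mu_m = 0.5 * (mu_lin + mu_bin)

G = 6395                            # TMV genome size in bases
print(f"mu_lin = {mu_lin:.2e}, mu_bin = {mu_bin:.2e}, mu_m = {mu_m:.2e}")
print(f"mu_g (averaged) = {mu_m * G:.1f}")
print(f"mu_lin/mu_bin = {mu_lin / mu_bin:.0f}; mu_g if all binary = {mu_bin * G:.3f}")

# Expected pre-existing mutants carried over: target size x inoculum x f.
print(f"mutants per inoculum ~ {200 * 1000 * 1e-4:.0f}")
```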

In the second case (Schnell et al. 1996), the target resided in vesicular stomatitis virus (VSV), which was serially passaged through cultured cells 15 times. Each inoculum of about 10^5 plaque-forming units (pfu) (applied to about 10^7 cells) was expanded to about 10^11 pfu. The target from six isolates was then reverse-transcribed, PCR-amplified, and sequenced. The mutant frequency was 2/2400 b with no estimate of the contribution of RT and PCR errors. Here f = μlin ≈ 8.3 × 10^−4 and, with N0 = 10^6 and N = 10^(15×6), μbin ≈ 4.3 × 10^−6; then μm ≈ 4.2 × 10^−4 per b and μg ≈ 4.7, a result indistinguishable from the values of 2.8 and 4.3 listed in Drake (1993a). However, because μlin/μbin ≈ 200, the calculation is again not robust and μg could approach 0.05 if binary replications predominated. Here, a target size of 400 b, an inoculum of about 10^5 and an f of as little as 10^−4 would ensure the transmission of about 4000 pre-existing mutants at passage, thus preventing bottlenecks. A more arcane possibility is that the target sequence, a bacterial gene encoding chloramphenicol acetyltransferase, provided an unexpected selective advantage when functional; this could be easily measured. In the end, however, a deeper understanding of these numbers will require much more analysis of the relative number and order of linear and binary replication events, including the supra-binary component arising from multiple cell cycles per passage.
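The VSV numbers can be checked the same way. The genome size is not given in the text. This sketch assumes G = 11,161 b, the commonly cited VSV genome length, and ignores the inserted target. With that assumption the calculation recovers μg ≈ 4.7.

```python
import math

f = 2 / 2400                 # mutant frequency per base (Schnell et al. 1996)
n0, n = 1e6, 1e6 ** 15       # N0 = 10^6 and N = 10^(15x6), as used in the text
mu_lin = f
mu_bin = f / math.log(n / n0)
mu_m = 0.5 * (mu_lin + mu_bin)

G = 11_161                   # ASSUMED VSV genome length (b); not given in the text
print(f"mu_bin = {mu_bin:.1e}, mu_m = {mu_m:.1e}, mu_g = {mu_m * G:.1f}")
print(f"mu_lin/mu_bin = {mu_lin / mu_bin:.0f}; mu_g if all binary = {mu_bin * G:.2f}")

# Pre-existing mutants carried over: target size x inoculum x frequency.
carried = 400 * 1e5 * 1e-4
print(f"mutants per inoculum ~ {carried:.0f}")
```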

Mutation rates in retro-elements: In contrast to the lytic RNA viruses, a retrovirus or retrotransposon chromosome replicates precisely three times per infective cycle. Transcription by the host RNA polymerase produces an RNA genome. Reverse transcriptase then catalyzes two replications to generate a DNA-based chromosome that integrates into the host chromosome (of a different cell in the case of a packaged retrovirus, or of the same cell in the case of a retrotransposon) and thereafter assumes a far lower mutation rate. The resulting mutant frequency is the sum of the mutation rates of the three steps, whose magnitudes have not yet been factored. Table 3 lists those rates described in Drake (1993a) which were based on large mutational targets, together with several measurements reported since 1993, including one for a long-terminal-repeat retrotransposon. These retro-element rates are roughly an order of magnitude lower than the RNA-virus rates listed in Table 2. Because of the large mutational target sizes employed, the rate differences among these viruses may be real. (The rate does not correlate with the retroviral or artificial origin of the mutational target sequence.) Compared to the lytic viruses, the retroviral mutation rates may not appreciably reduce specific infectivity. Spleen necrosis virus is slightly more resistant to increased mutation rates than are lytic RNA viruses, being obliterated only after a roughly 13-fold increase (Pathak and Temin 1992).

Mutation rates in DNA-based microbes: Rates of spontaneous mutation in this class of organisms were last surveyed in Drake (1991) and are summarized in Table 4 using a few updated values for genome sizes. Unlike the experimental and theoretical limits to the accuracy of the RNA-virus values, the DNA-microbe values were determined in well-studied systems using robust calculations, and the individual values are likely to be accurate to within two-fold. Table 4 shows that μb and G vary inversely and smoothly over nearly four orders of magnitude while μg remains constant. Given the paucity of general, constant values in evolutionary processes, this particular constant is strikingly robust.
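Because μg is roughly constant at about 1/300 in this group, μb follows directly as μg/G. This minimal sketch uses only the genome-size extremes quoted in the Abstract (6 × 10^3 and 4 × 10^7 b). It does not use the individual Table 4 entries, which are not reproduced here.

```python
import math

MU_G = 1 / 300   # approximate mutation rate per genome per replication


def mu_b(genome_size: float, mu_g: float = MU_G) -> float:
    """Rate per base (pair) per replication implied by a constant mu_g."""
    return mu_g / genome_size


for G in (6e3, 4e7):
    print(f"G = {G:.0e} b -> mu_b = {mu_b(G):.1e} per b per replication")

# Span of genome sizes (and hence of mu_b), in orders of magnitude.
print(f"span = {math.log10(4e7 / 6e3):.1f} orders of magnitude")
```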

Heat promotes a variety of base-loss and base-modification reactions and can be strongly mutagenic. The archaeon Sulfolobus acidocaldarius growing at 75°C produces pyrE and pyrF mutations at 2.8 ± 0.7 and 1.5 ± 0.6 per 10^7 cell divisions, respectively (Jacobs and Grogan 1997). Although genome size, mutational target size and efficiency of mutation detection are not yet measured in this system, pyrE genes contain 600–720 b and pyrF genes contain 700–1200 b in several bacteria and eukaryotes, and G = (2–3) × 10^6 for related bacteria (D. W. Grogan, personal communication). Using a typical C value of 3.12 to correct for the efficiency of mutation detection (Drake 1991), μg = 0.0005–0.005; using the mid-range values for the above parameters, μg ≈ 0.002. Thus, although careful measurements remain to be performed in this system, the magnitude of μg seems likely to be conserved even in a potentially hypermutagenic environment.
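The rough Sulfolobus estimate can be reproduced with mid-range values. In this sketch, pooling the two loci (summing rates and target sizes) is our assumption about how the mid-range figure was combined. Averaging per-locus values instead gives a similar answer. The target sizes and G are borrowed from other organisms, as in the text.

```python
# Forward mutation rates per cell division (Jacobs and Grogan 1997).
rates = {"pyrE": 2.8e-7, "pyrF": 1.5e-7}
# Mid-range target sizes (b) from other organisms; ASSUMED for Sulfolobus.
targets = {"pyrE": (600 + 720) / 2, "pyrF": (700 + 1200) / 2}
C = 3.12          # typical detection-efficiency correction (Drake 1991)
G = 2.5e6         # mid-range of (2-3) x 10^6 b; ASSUMED from related organisms

# Pool both loci: rate per base = C x (total rate) / (total target size).
mu_b = C * sum(rates.values()) / sum(targets.values())
mu_g = mu_b * G
print(f"mu_b = {mu_b:.1e} per b, mu_g = {mu_g:.4f} per genome per division")
```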

TABLE 3

Mutation rates per genome per replication in retroelements

As noted previously, RNA-virus and retrovirus populations are likely to be extinguished when their mutation rates are increased to a few-fold over 1. To be similarly jeopardized, the microbes in Table 4 would have to experience mutation-rate increases on the order of 10^3-fold. However, they are to some extent buffered against immediate extinction in two ways. First, a substantial fraction of their genes are only infrequently required for growth, particularly under laboratory conditions. Second, diploidy, when it occurs, will protect for a while against the effects of recessive mutations. As described in Drake (1991), Escherichia coli can survive μg ≈ 10 for at least 10 generations (although such cultures contain many dead cells), and Saccharomyces cerevisiae can survive μg ≈ 60 for at least nine generations while diploid, although the haploid segregants are inviable; haploids can survive μg ≈ 2, although most cells grow poorly.

TABLE 4

Mutation rates per genome per replication in microbes with DNA chromosomes

The E. coli F plasmid ordinarily replicates in step with the host chromosome, uses most of the same enzymes, and has the same μb as the host (Willetts and Skurray 1987; Drake 1991). Kunz and Glickman (1983) and Christensen et al. (1985) reported that the F mutation rate increases during conjugation, and Taddei et al. (1995) suggested that this might be an example of the μg rule implied in Table 4. In this respect F would resemble phage λ, which displays the host mutation rate as a prophage but the appropriately higher rate when replicating lytically. Unfortunately, while the data of Kunz and Glickman (1983) suggest that the spectrum changes markedly during conjugation, neither F study provides the mutational spectrum needed to calculate μb. Using either the calculations of the authors or calculations based on the methods described in Drake (1991), and noting that base pair substitutions seem to occur preferentially during conjugation (Kunz and Glickman 1983), the conjugational F mutation rate appears to be roughly five-fold to perhaps 20-fold higher than the standard rate. The E. coli genome is about 47-fold larger than the F genome (Willetts and Skurray 1987), so a strict μg rule would predict an increase of roughly 47-fold. Thus, additional measurements are needed to determine whether F follows this μg rule.
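A back-of-envelope comparison shows why the F data are inconclusive. A conserved μg predicts a rise of roughly the genome-size ratio, and the observed range falls below it. The uncertainty in μb, however, leaves the question open. The sketch only restates the numbers in the paragraph.

```python
genome_ratio = 47                # E. coli genome size / F plasmid genome size
observed_folds = (5, 20)         # rough range of the conjugational rate increase

# If mu_g were conserved, F's per-base rate during conjugation would rise
# by about genome_ratio; express each observed increase as a fraction of that.
for fold in observed_folds:
    print(f"{fold}-fold observed vs {genome_ratio}-fold predicted "
          f"-> {fold / genome_ratio:.2f} of the predicted increase")
```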

Two predictions: A plot of log μb versus log G (Drake 1991) reveals a gap between the viral and the cellular entries. The 578-kb genome of Mycoplasma genitalium (Peterson et al. 1995) falls at the midpoint of this gap. An interpolation from Table 4 predicts μb = 5.9 × 10^−9 for this organism.
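On a log-log plot with slope −1, interpolation amounts to dividing a constant μg by G. This sketch takes μg = 1/300 as an assumption; the interpolation in the text presumably used the Table 4 values directly. The result, about 5.8 × 10^−9, is close to the interpolated value.

```python
MU_G = 1 / 300          # ASSUMED constant mu_g for DNA-based microbes
G_MGEN = 578_000        # Mycoplasma genitalium genome size, b

mu_b = MU_G / G_MGEN
print(f"predicted mu_b = {mu_b:.1e} per b per replication")
```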

Mutation rates in bacteriophage T4 and herpes simplex virus type 1 (HSV) display an intriguing relationship which suggests that at least one strain of HSV may harbor a mutator mutation. These two viruses have similar genome sizes and modes of DNA replication. In HSV stocks grown from a small inoculum to N = 10^8–10^9 iu, the frequency of herpesvirus tk mutants is about 6.2 × 10^−4 (Hall et al. 1984). The tk gene has about 1150 b (McKnight 1980) and G = 152,260 b (McGeoch et al. 1988). No suitable mutational spectrum is available, so C must be guessed by taking the average of the values listed in Drake (1991); then C = 3.12, μb ≈ 1.7 × 10^−7 and μg ≈ 0.026. This μg is roughly eight-fold higher than the values in Table 4. Consider next the antimutator mutations that arise in the DNA polymerase gene of phage T4. These reduce the rates of only certain pathways while increasing the rates of others; overall, they do not reduce μb (Drake 1993b). A strong, general antimutator is probably difficult or impossible to obtain in one or a few mutational steps (Drake 1993b), the exception being the reversal of a mutator mutation that itself arose by a single mutation. HSV DNA polymerase mutants selected for resistance to phosphonoacetic acid (PPA) are sometimes antimutators. These reduce the frequency of tk mutants of presumably many kinds (Hall et al. 1984), and therefore presumably do reduce μb; the reduction is roughly 45-fold, giving μg ≈ 0.0006 (about five-fold lower than the values in Table 4). In contrast, wild-type phage T4 is resistant to PPA; however, T4 DNA polymerase mutator mutants are sensitive to PPA. When selection is then applied for PPA resistance in these mutator strains, the result is polymerase antimutator mutations that negate the mutator phenotypes (Reha-Krantz et al. 1993). These results suggest that this HSV isolate may carry a naturally occurring mutator mutation. Results described next reveal that this is a reasonable conjecture.
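The HSV estimate uses the small-inoculum form of the binary formula, μ = f/ln(Nμ), which must be solved iteratively. This is a minimal sketch by fixed-point iteration. The midpoint N = 10^8.5 is our assumption. The conversion μb = C × μ/T (target rate corrected for detection efficiency and divided by target size) is inferred from the quoted numbers rather than stated in the text.

```python
import math


def mu_small_inoculum(f: float, n_final: float, iterations: int = 50) -> float:
    """Solve mu = f / ln(N * mu) by fixed-point iteration (small-inoculum case)."""
    mu = f                       # starting guess; requires N * f > 1
    for _ in range(iterations):
        mu = f / math.log(n_final * mu)
    return mu


f = 6.2e-4                 # tk mutant frequency (Hall et al. 1984)
N = 10 ** 8.5              # ASSUMED geometric midpoint of 1e8-1e9 iu
T = 1150                   # tk target size, b
C = 3.12                   # average detection-efficiency correction (Drake 1991)
G = 152_260                # HSV-1 genome size, b

mu_target = mu_small_inoculum(f, N)   # rate per tk gene per replication
mu_b = C * mu_target / T
mu_g = mu_b * G
print(f"mu_b = {mu_b:.1e}, mu_g = {mu_g:.3f}")
print(f"with a 45-fold antimutator: mu_g = {mu_g / 45:.4f}")
print(f"vs 1/300: {mu_g * 300:.1f}-fold higher, {(1 / 300) / (mu_g / 45):.1f}-fold lower")
```

The iteration converges quickly because the contraction factor near the solution is roughly 1/ln(Nμ), about 0.1 here.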

Microbial hypermutation: Microbial mutation rates can increase over short periods for physiological or regulatory reasons, or more permanently by the action of mutator mutations. In addition, particular portions of the genome can be maintained inherently hypervariable through specific, local mechanisms such as the cassette switching that mediates phase variation in bacterial and other pathogens and mating type in yeasts and fungi (Moxon et al. 1994; Sasaki 1994).

Microbial mutation rates can increase physiologically in several ways. Ninio (1991) suggested that errors of transcription, translation, and molecular segregation will create transient mutators which, on his estimate, would contribute modestly to single mutations but strongly to multiple mutations per genome per replication. In Neurospora crassa (Auerbach 1959) and phage T4 (Drake 1966; Drake and Ripley 1994) (and probably in all organisms at various times), resting genomes mutate in a time-dependent, replication-independent fashion because they accumulate spontaneous DNA damage that engenders mutations when DNA replication resumes, and that may even alter transcription to produce a mutant phenotype before replication. Starving bacteria also mutate in a time-dependent manner, one that probably involves immediate DNA synthesis (e.g., Foster 1997; Torkelson et al. 1997). In addition, DNA damage sometimes elicits translesion bypass, in which DNA primer extension proceeds past an unrepaired lesion. Among microbes, this process ranges from fully constitutive, as in phage T4 (Drake and Ripley 1994), to strongly inducible, as in the E. coli SOS response (Walker 1984). The SOS response increases mutation rates for roughly one cell generation, even in undamaged parts of the genome (Walker 1984). Because DNA damage arises continually from ordinary endogenous processes, such as base depurination and deamination and base damage by by-products of oxygen and methyl metabolism, a few cells in any population are SOS-induced at any time. The fraction of evolutionary change driven by such transient hypermutability remains unknown.

For bacteria, chemostats or daily serial transfers constitute alien environments within which rapid and complex adaptation occurs by mutation and selection. Because at least 10 genes can generate mutator mutations, E. coli populations generate roughly 10^−6–10^−5 mutator mutants per replication. However, strong mutators are deleterious (Quiñones and Piechocki 1985) and do not accumulate; an ordinary E. coli culture accumulates <10^−5 mutator mutants (Mao et al. 1997). On the other hand, mutators can be strongly selected when their frequency and strength are high enough that they generate more beneficial mutations than do the non-mutators in the same population (Chao and Cox 1983; Mao et al. 1997); the deleterious mutator alleles are then selected indirectly, hitchhiking along with the adaptive mutations they produce.

In contrast to freshly grown laboratory cultures, mutator mutants are found at frequencies that can exceed 10^−2 among hospital isolates of E. coli and Salmonella enterica (Jyssum 1960; Gross and Siegel 1981; LeClerc et al. 1996), or after extensive serial passage in the laboratory (Sniegowski et al. 1997). Mutator mutations are common among commensal as well as pathogenic strains, and may increase mutation rates either strongly or weakly (Matic et al. 1997). Thus, the continual adaptations occurring during bacterial invasions of new hosts or culture conditions suffice to enhance the frequency of mutators by at least 1000-fold, although the subsequent deleterious effects of the mutator mutations may prevent total replacement among hosts or serially transferred lines. In order to understand the roles of mutator mutations in both transitory adaptations and long-term evolution, it will be important to determine the frequencies of mutators in natural populations in both stable and strongly fluctuating environments. Theory describing conditions under which mutators can speed adaptation in asexual microbes (Leigh 1970; Taddei et al. 1997) encourages such investigations.

Mutation rates in higher eukaryotes based on specific loci: Plants and animals contrast with the organisms in Tables 2, 3 and 4 in several ways. One striking difference is in the amount of DNA. G is one to several orders of magnitude greater in plants and animals than in microbial eukaryotes. Most of the increase is not in functional genes but rather in introns and intergenic regions, so that Ge << G. A second difference is that higher eukaryotes may display important age and sex effects. As we discuss below, in mammals (and especially in humans), the rate of gene mutation per generation is much higher in males and particularly older males, mainly because of the much larger number of germ-line cell divisions ancestral to a sperm than to an egg. A third difference is that mutation rates in animals (and plants) are often equated with the mutant frequency per gamete (or, occasionally, per diploid). Sometimes, however, mutants appear in clusters that reflect the premeiotic expansion of a single event (Muller 1952; Woodruff et al. 1997). Unrecognized clusters are not a problem, because a cluster increases proportionally the probability of finding the mutation. When a cluster is observed, each mutant individual in the cluster should be counted as a mutation when calculating the mutation rate per sexual generation; more complex calculations of within-generation mutation accumulation are difficult because of uncertainties about the topology of normal and mutant cell expansion. A fourth difference is that evolutionary mechanisms for adjusting mutation rates may be quite different in sexual eukaryotes than in rarely sexual microbes because, at least in outbreeding sexual species, the process of meiosis uncouples rate-adjusting mutations from the mutations they engender (Leigh 1970, 1973). Also, sexual reproduction permits the population to rid itself of deleterious mutations more efficiently than is possible in asexual systems (Kimura and Maruyama 1966; Kondrashov 1984, 1988).

In the species we discuss, the data for mutation rates in males are often more extensive and reliable than those for rates in females, so that our calculations frequently must focus on data from males. In all of these species, mutations with small effects tend to go uncounted. Unlike the situation with microbes, where mutational spectra predict the efficiency of detection, the present values are all minimum estimates uncorrected for poorly detected kinds of mutations.

Zea mays: Plants have yielded remarkably few estimates of mutation rates. In plants such as maize where genetic methods are well established, mutation rates are relatively easily estimated by crosses to strains homozygous for mutations causing visible phenotypes. There is wide variation from locus to locus, with a mean of 7.7 × 10^−5 and a range from <0.1 × 10^−5 (waxy) to 49.2 × 10^−5 (R) mutations per gamete for eight maize loci (Stadler 1930). There are few comparable data from other plants apart from evidence for lower rates in polyploids (Stadler 1929). Lack of further information impedes attempts to extrapolate to the entire genome.

Caenorhabditis elegans: There are about 8.2 cell divisions ancestral to sperm and about 10.0 ancestral to eggs (Kimble and Ward 1988), so we will use the average, 9.1. Mutation rates and numbers of codons have been determined for five loci. Taking the gene sizes as three times the number of codons plus 100 b for regulatory and splicing sequences, using an average C = 3.12 to correct for the efficiency of mutation detection (Drake 1991) and dividing by the 9.1 cell divisions per sexual generation, we obtain μb(unc-22) = 1.17 × 10^−11, μb(unc-54) = 1.68 × 10^−11, μb(unc-93) = 52.4 × 10^−11, μb(unc-105) = 4.54 × 10^−11, and μb(sup-10) = 52.6 × 10^−11 (Greenwald and Horvitz 1980; Karn et al. 1983; Eide and Anderson 1985; Benian et al. 1989; Levin and Horvitz 1992; Liu et al. 1996; C. White and P. Anderson, GenBank accession no. U43891); the mean μb = 2.25 × 10^−10. The total genome size G = 8 × 10^7 (Sulston and Brenner 1974). There are about 1.78 × 10^4 genes (Bird 1995); assuming an average of 10^3 b per gene gives Ge = 1.78 × 10^7. Then μg = 8 × 10^7 × 2.25 × 10^−10 = 0.018 and μeg = 1.78 × 10^7 × 2.25 × 10^−10 = 0.0040. These values are listed in Table 5.
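The C. elegans summary values follow from an arithmetic mean and two multiplications. This sketch uses the five per-locus μb values listed above. Note that two loci (unc-93 and sup-10) dominate the mean.

```python
# Per-base rates per cell division for five loci (units of 1e-11), from the text.
mu_b_loci = {"unc-22": 1.17, "unc-54": 1.68, "unc-93": 52.4,
             "unc-105": 4.54, "sup-10": 52.6}
mu_b = sum(mu_b_loci.values()) / len(mu_b_loci) * 1e-11   # arithmetic mean

G = 8e7                  # total genome size, b
n_genes = 1.78e4
Ge = n_genes * 1e3       # effective genome: ~1e3 b per gene, as assumed in the text

print(f"mean mu_b = {mu_b:.3e}")
print(f"mu_g = {G * mu_b:.3f}, mu_eg = {Ge * mu_b:.4f}")
```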

Drosophila melanogaster: Schalet (1960) detected 51 visible mutations in 490,118 X chromosomes at 13 specific loci, yielding a rate of 8.0 × 10^−6 per locus per generation. The fraction of these due to base pair substitutions is unknown; given the evidence that a large fraction of visible mutations in Drosophila are caused by insertions of transposable elements (Finnegan and Fawcett 1986), it is probable that at least half of Schalet's mutations were of this nature. Mukai and Cockerham (1977) enriched the mutation frequency by accumulating isozyme mutations in 1000 chromosomes sheltered by heterozygosity in a balanced-lethal system for almost 175 generations. In 1,658,308 locus-generations they found three electrophoretic-mobility (band-shift) mutations and 17 null (band-loss) mutations. However, these strains exhibited a high rate of chromosome breakage, probably because of an active transposon (Yamaguchi and Mukai 1974); it is therefore probably more realistic to ignore the nulls, a procedure also justified by the high average ratio of base pair substitutions to other mutations in microbes (Drake 1991). Mukai and Cockerham (1977) estimated that about 0.3 of all amino acid changes were detectable as band shifts. In addition, only about 2/3 of base pair substitutions change an amino acid. Thus, the mutation rate per locus per generation is (3/1,658,308)/[(0.3)(2/3)] = 9.0 × 10^−6. Averaging the two studies, we take 8.5 × 10^−6 as a representative rate. The proteins studied by Mukai and Cockerham (1977) were encoded by an average of 973 b and some regulatory sequences must also have been present, so dividing by 10^3 gives 8.5 × 10^−9 mutations per b per generation. The number of cell divisions ancestral to a sperm in Drosophila is about 25 for the young males typically used in laboratory experiments (Lindsley and Tokuyasu 1980; Drost and Lee 1995; J. M. Mason, personal communication), so dividing by 25 gives μb = 3.4 × 10^−10.
In Drosophila, G ≈ 1.7 × 10^8 b (Ashburner 1989). We will take as Ge the amount of DNA in 1.6 × 10^4 genes, each of length 10^3 b (Bird 1995); this gives Ge = 1.6 × 10^7 b. These and derivative values are given in Table 5.
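The Drosophila chain of corrections can be written out step by step. This sketch reproduces the text's arithmetic; the text rounds the mean to 8.5 × 10^−6 before continuing, whereas the sketch carries full precision. The μg and μeg lines are our own multiplications of the resulting μb by the stated G and Ge.

```python
# Schalet (1960): visible mutations at 13 specific loci.
rate_schalet = 51 / 490_118 / 13

# Mukai and Cockerham (1977): band shifts only, corrected for detectability.
detectable = 0.3 * (2 / 3)     # band-shift detectability x fraction changing an amino acid
rate_mukai = (3 / 1_658_308) / detectable

rate_locus = (rate_schalet + rate_mukai) / 2    # per locus per generation
rate_b_gen = rate_locus / 1e3                   # ~1e3 b per locus
mu_b = rate_b_gen / 25                          # ~25 cell divisions per sperm

G, Ge = 1.7e8, 1.6e4 * 1e3
print(f"per locus: {rate_schalet:.1e} and {rate_mukai:.1e}; mean {rate_locus:.2e}")
print(f"mu_b = {mu_b:.1e}, mu_g = {G * mu_b:.3f}, mu_eg = {Ge * mu_b:.4f}")
```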

TABLE 5

Mutation rates estimated from specific loci in higher eukaryotes

Mus musculus: The mouse data come from the controls for the extensive radiation experiments performed at Oak Ridge, Harwell and Neuherberg, and summarized by Russell and Russell (1996). A total of 1,485,036 progeny harbored 69 visible mutations at seven loci for a rate of 6.6 × 10^−6 per locus per generation. In addition to the complete mutations, about 4.8 × 10^−5 mosaic mutations were detected at five loci; these mutations tended to produce about 50% germ-line mosaicism, so that the adjusted mosaic rate is (4.8 × 10^−5)(0.5)/5 = 4.8 × 10^−6. Thus, the total mutation rate was about 1.1 × 10^−5 per locus per generation. Assuming 10^3 b per locus, we obtain 1.1 × 10^−8 mutations per b per generation. Finally, dividing by 62, the estimated number of cell divisions prior to a sperm (Drost and Lee 1995), gives μb = 1.8 × 10^−10 mutations per b per cell division. Taking G = 2.7 × 10^9 b (Laird 1971) and Ge as the amount of DNA required for 8 × 10^4 genes (Bird 1995) of length 10^3 b generates the additional values listed in Table 5.
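The mouse per-locus and per-base rates can be reproduced directly from the quoted counts. This sketch follows the text's adjustments (50% mosaicism spread over five loci, 10^3 b per locus, 62 cell divisions). The variable names are our own.

```python
complete = 69 / 1_485_036 / 7          # per locus per generation, seven loci
mosaic = 4.8e-5 * 0.5 / 5              # ~50% germ-line mosaicism, five loci
total = complete + mosaic              # per locus per generation

per_b = total / 1e3                    # ~1e3 b per locus
mu_b = per_b / 62                      # ~62 cell divisions before a sperm
print(f"complete = {complete:.1e}, mosaic = {mosaic:.1e}, total = {total:.1e}")
print(f"mu_b = {mu_b:.1e} per b per cell division")
```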

It is surprising that germ-line mosaics were responsible for about 40% of the total rate. These appear to arise either as mutations that occur in DNA replications directly before or after meiosis in the male parent (“after” denoting the first zygotic replication) or in a single strand of nonreplicating DNA (which might include mutations arising during DNA recombination or repair). Thus, almost as many mutations may occur in or between one or two special DNA replications as occur in the other 60. This possibility presents a major challenge to both experimentalists and theoreticians (Woodruff et al. 1997).

The mutation rate of evolutionary importance is of course the average over the two sexes. The estimated rates per generation in male and female mice are not very different, but the female value is based on very small numbers and is complicated by a large cluster. Adding the mosaic rate of 4.8 × 10^−6 to the female rate of 1.6 × 10^−6 gives a rate of 6.4 × 10^−6, about half the male rate. Alternatively, we note that the estimated number of cell divisions prior to the gamete is 25 in females and 62 in males, so by this measure the female rate is 25/62 = 0.40 of the male rate. Averaging these two estimates, the female rate is 0.45 of the male rate. The murine male μegs is 0.55, so the average of the two sexes is about 0.4. A similar result obtains in humans (see below).
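The sex-averaging step amounts to averaging two female/male ratios and then averaging the sexes. This sketch follows the text in taking the count-based ratio as "about half" (0.5).

```python
ratio_counts = 0.5            # female/male rate from mutation counts ("about half")
ratio_divisions = 25 / 62     # female/male ratio of germ-line cell divisions
female_to_male = (ratio_counts + ratio_divisions) / 2

male_mu_egs = 0.55
sex_average = male_mu_egs * (1 + female_to_male) / 2
print(f"female/male = {female_to_male:.2f}; sex-averaged mu_egs = {sex_average:.2f}")
```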

Homo sapiens: The human data are less reliable than the C. elegans, Drosophila and mouse data. A number of dominant-mutation rates have been inferred from the frequency of affected children of normal parents, and sometimes confirmed by equilibrium estimates for those dominants with severe effects. These values range from 10^−4 to 10^−6, with a rough average of 10^−5 (Vogel and Motulsky 1997). For genes of size 10^3 b, this corresponds to a rate of 10^−8 per b per generation. An estimate based on specific changes in the hemoglobin molecule gave 0.74 × 10^−8 per b per generation (Vogel and Motulsky 1997), but this is clearly an underestimate because other kinds of changes are not included. A third, quite independent estimate is based on rates of evolution of pseudogenes in human ancestry, which are likely to be identical to mutation rates (Kimura 1983a). This gives about 2 × 10^−8 per b per generation (Crow 1993, 1995). We shall take 10^−8 as a representative value. However, because the overwhelming majority of human mutations occur in males (see below), the sex-averaged rate is about half the male rate, so the male rate must be about 2 × 10^−8. The number of cell divisions prior to sperm formation in a male of age 30 is about 400 (Drost and Lee 1995; Vogel and Motulsky 1997). Thus, μb ≈ 2 × 10^−8/400 = 5 × 10^−11. For 8 × 10^4 genes (Bird 1995) of average size 10^3 b, μeg ≈ 0.004 and μegs ≈ 1.6.
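The human figures can be checked with a few lines. This sketch assumes Ge = 8 × 10^4 genes × 10^3 b (the same gene count used for the mouse), which is the value that reproduces μeg ≈ 0.004 and μegs ≈ 1.6.

```python
avg_per_b_gen = 1e-8                 # representative rate per b per generation, both sexes
male_per_b_gen = 2 * avg_per_b_gen   # if female rates are negligible, average = male / 2
divisions = 400                      # cell divisions before sperm, male aged ~30

mu_b = male_per_b_gen / divisions
Ge = 8e4 * 1e3                # ASSUMED: 8 x 10^4 genes of ~1e3 b each
mu_eg = mu_b * Ge             # per effective genome per cell division
mu_egs = male_per_b_gen * Ge  # per effective genome per sexual generation (male)
print(f"mu_b = {mu_b:.0e}, mu_eg = {mu_eg:.3f}, mu_egs = {mu_egs:.1f}")
```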

An alternative method for estimating μegs has been proposed by Kondrashov and Crow (1993) based on the idea that purely neutral sequences such as pseudogenes can be used as a benchmark to identify sites which show clear evidence of selective constraints. If the abundance of such sites can be determined in this way, the effective genome size and its mutation rate can be estimated purely from rates of DNA sequence evolution. This method has yet to be applied to large quantities of sequence data. For hemoglobin genes, about 15% of bases seem to be under the effective control of selection, which may be about average for genes encoding proteins; for a more sophisticated treatment, see Kimura (1983b).

With 6.4 × 10⁹ base pairs in the diploid genome, a mutation rate of 10⁻⁸ per b per generation means that each zygote carries 64 new mutations. It is hard to imagine that so many new deleterious mutations each generation could be compatible with life, even with an efficient mechanism for removing them. Thus, the great majority of mutations in the noncoding DNA must be neutral.

Effects of sex and age in humans: Data on female mutation rates are fewer and less reliable than data on male rates. For human base pair substitutions, the mutation rate is enormously greater in males than in females, and there is a strong paternal age effect. Older males have a higher rate than younger males, and the increase is greater than expected if mutation were simply proportional to the number of cell divisions (Crow 1993, 1997), but the component of replication fidelity that declines in older men remains unknown. Mutations also accumulate with cell divisions in somatic tissue (Akiyama et al. 1995), although whether the rate is proportional to the number of divisions is not known.

The enormous difference between human male and female mutation rates is well documented for the few loci with adequate data. For three conditions, Apert's Syndrome and multiple endocrine neoplasia types 2B (MEN2B) and 2A (MEN2A), a total of 92 new mutations have been reported for which linkage analysis identified the parent of origin. Strikingly, all 92 were paternal. All of these are base-substitution mutations. Apert's Syndrome has also been studied for a paternal age effect and, as expected, shows a large increase with paternal age (Crow 1997). The fact that so many of these mutations are at CpG sites offers some support to those who argue that something associated with methylation is responsible for the high male rate (Sapienza 1994; McVean and Hurst 1997).
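To see how tightly 92 paternal origins out of 92 constrain the maternal share, one can compute an exact one-sided binomial upper bound. This is our illustration, not an analysis from the cited studies; it assumes that parental origins are independent draws and pools the three conditions, and the helper name is hypothetical. The result is close to the familiar "rule of three" value 3/92.

```python
def upper_bound_zero_events(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence limit on a proportion when 0 events occur in n trials.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

n_new_mutations = 92                       # parent of origin known; all were paternal
p_maternal_max = upper_bound_zero_events(n_new_mutations)
# If per-gamete rates are mu_m and mu_f, the paternal share is mu_m / (mu_m + mu_f).
min_male_to_female = (1.0 - p_maternal_max) / p_maternal_max
print(f"maternal share < {p_maternal_max:.3f}; male:female rate ratio > {min_male_to_female:.0f}")
# prints: maternal share < 0.032; male:female rate ratio > 30
```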

In contrast, some mutations are not strongly associated with paternal age. Two examples are neurofibromatosis and Duchenne muscular dystrophy. In both of these diseases, most of the mutations are small deletions and other cytogenetic changes in enormous genes. Thus, the generalization seems to be that base substitutions are replication-dependent but cytogenetic changes are not. Various human diseases show a continuum from very strong paternal age effect to very little (Risch et al. 1987), presumably reflecting the relative proportions of base substitutions and cytogenetic changes.

Effect of hemizygosity: Deleterious mutations at sex-linked loci are strongly expressed in the hemizygous state in the heterogametic sex and are thus subject to stronger counter-selection than deleterious mutations in autosomal genes (except strongly dominant ones) (Haldane 1927). As a result, there is stronger selection pressure to reduce mutation rates at X-linked loci than at autosomal loci (McVean and Hurst 1997). Data on mammalian DNA sequence evolution suggest that the X chromosome may indeed experience a lower mutation rate than the autosomes (McVean and Hurst 1997), although this may be confounded with a higher mutation rate in males than in females, especially in long-lived mammals: an X chromosome spends only one-third of its generations in males, compared with one-half for an autosome, so male-biased mutation alone would lower the X-linked rate.

Mutational hot spotting: Some of the best understood human mutations arise in the gene for achondroplasia, which would seem to be a good source for a mutation rate estimate. The average mutation rate for the phenotype, determined directly in several studies and substantiated by indirect calculations, is 10⁻⁵ (Vogel and Motulsky 1997). However, molecular analysis (Shiang et al. 1994) revealed that 15 of 16 mutations were GGG → AGG and the other was GGG → CGG at the same codon, both replacing glycine with arginine. Thus, the entire observed mutation rate appears to come from one codon. Similar CpG hot spots were responsible for all the mutations causing Apert's Syndrome. Although the data are scanty, these two examples suggest that a major fraction of human gene mutations is due to mutational hot spots, as is also typical in microbes (e.g., Benzer 1961). We badly need more data on per-locus mutation rates accompanied by molecular analyses showing the mutant sites and the parent of origin.
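A rough calculation shows how concentrated the achondroplasia rate is. As a simplifying assumption of ours, the phenotypic rate of 10⁻⁵ is attributed entirely to the one mutated nucleotide (both GGG → AGG and GGG → CGG change the first base of the codon) and compared with the representative human average of 10⁻⁸ per b per generation given earlier; the variable names are illustrative.

```python
phenotype_rate = 1e-5        # achondroplasia, new mutations per gamete per generation
share_at_site = 16 / 16      # all 16 characterized mutations altered the same nucleotide
average_per_b = 1e-8         # representative human rate per b per generation

site_rate = phenotype_rate * share_at_site
fold_elevation = site_rate / average_per_b
print(f"site rate ~{site_rate:.0e} per generation, ~{fold_elevation:.0f}-fold above average")
```

Under these assumptions the single site mutates roughly a thousand times faster than an average base.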

Somatic versus germinal mutation rates: The (mammalian male) germ-line rate may be lower than the somatic rate. Using mice bearing a chromosomal mutation-reporter target, the typical somatic-cell mutant frequency was found to be about 1.7 × 10⁻⁵ in a variety of tissues, but about 0.6 × 10⁻⁵ in sperm preparations (Kohler et al. 1991). (This threefold difference is probably an underestimate, because the germ-cell preparations used in these experiments were probably contaminated with somatic cells.) Because the number of cell divisions antecedent to these tissues was not notably lower in germ cells than in somatic cells, this result suggests a mutation rate per cell division (or unit of time) at least three times lower in germ cells than in somatic cells. In turn, this conclusion suggests that evolutionary pressures on mutation rates focus primarily upon the germ line, the soma being subject to less pressure, perhaps because of diploidy (Orr 1995).

Mutation rates for deleterious alleles from mutation-accumulation experiments: Measurements of mutation rates based on specific loci offer the potential of scoring all mutational events in a molecularly well defined target. This has been achieved in several microbial systems, but as yet only imperfectly in higher eukaryotes. A disadvantage of the specific-locus method is that only one or a few, possibly unrepresentative, genes may have been examined. An alternative approach is to accumulate mutations with deleterious fitness effects over many generations. While this method ignores mutations without effects on fitness, it can screen rather large fractions of the genome. Mutations with very small fitness effects (of the order of 10⁻³ or less) are not likely to contribute to mutation-rate estimates obtained in this way, unless they are improbably numerous; we therefore distinguish these estimates conceptually from the mutation rates per effective genome discussed above. Given that the specific-locus results for higher organisms depend largely on the detection of mutations with major phenotypic effects, the above estimates of the effective genome mutation rates should in practice be quite similar to the deleterious mutation rate estimates.

Mutation-accumulation methods: This procedure combines Muller's (1928) use of marked crossover-suppressing chromosomes to detect mutations anywhere along a chromosome that has been sequestered for several generations with Bateman's (1959) proposal to use the variance between replicates to estimate the mutation rate and average effect. The approach was refined by Mukai and co-workers (Mukai 1964; Mukai et al. 1972), and we shall refer to it as the Bateman-Muller-Mukai method.

The basic idea is as follows. A set of initially genetically identical lines is established from an isogenic base. The lines are maintained independently, and mutations are allowed to accumulate. Selection is minimized because the mating system ensures that the mutation-accumulating chromosomes reside only in heterozygous males, and a single male is used as the parent in each generation. Because the number of mutations carried by a line after a given number of generations is a random variable, different lines accumulate different numbers of mutations, so the variance among lines in a quantitative fitness trait such as viability increases over time. Given that most mutations are deleterious, the mean value of the fitness trait is expected to decline with time. Let the mean number of deleterious mutations that arise per generation be U (Uc for the rate for a particular chromosome, Uh for the haploid genomic rate, Ud for the diploid genomic rate), and let the mean reduction in trait value caused by a single mutation when homozygous (relative to a value of 1 for wild type) be s̄. (s̄ is a weighted mean, in which the effects of mutations at individual loci are weighted by the mutation rates at those loci.) If mutational effects are additive across loci, the per-generation rate of decline in overall mean fitness (ΔM) and rate of increase in the variance among lines (ΔV) are given by

$$\Delta M = U_c\,\bar{s} \qquad \text{(1a)}$$

$$\Delta V = U_c\left(\bar{s}^2 + V_s\right) \qquad \text{(1b)}$$

where Vs is the variance among sites in the effects of mutations (again weighted by the mutation rates). These yield the expressions

$$U_c \ge \frac{(\Delta M)^2}{\Delta V} \qquad \text{(2a)}$$

$$\bar{s} \le \frac{\Delta V}{\Delta M} \qquad \text{(2b)}$$
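Equations 1a to 2b can be checked numerically. The sketch below, with illustrative parameter values and function names of our own, computes the expected ΔM and ΔV from assumed values of Uc, s̄ and Vs, then applies Expressions 2a and 2b; the bounds are exact only when Vs = 0.

```python
def expected_changes(U_c: float, s_bar: float, V_s: float) -> tuple[float, float]:
    """Per-generation decline in mean (Eq. 1a) and increase in among-line variance (Eq. 1b)."""
    dM = U_c * s_bar
    dV = U_c * (s_bar ** 2 + V_s)
    return dM, dV

def bateman_mukai(dM: float, dV: float) -> tuple[float, float]:
    """Lower bound on U_c (Expression 2a) and upper bound on s_bar (Expression 2b)."""
    return dM ** 2 / dV, dV / dM

# Equal effects (V_s = 0): the bounds recover the true U_c = 0.2 and s_bar = 0.02.
print(bateman_mukai(*expected_changes(U_c=0.2, s_bar=0.02, V_s=0.0)))

# Variable effects with V_s = s_bar**2: U_c is underestimated and s_bar overestimated by 2x.
print(bateman_mukai(*expected_changes(U_c=0.2, s_bar=0.02, V_s=0.02 ** 2)))
```

Because ΔM²/ΔV = Uc·s̄²/(s̄² + Vs), any variation in effects pushes the estimate below the true Uc, which is why Expression 2a is a bound rather than an estimate.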

These are equalities only if all mutations have equal effects on the trait. However, given reliable estimates of the decline in mean and the increase in variance, Expression 2a provides a useful lower bound on the deleterious mutation rate per genome. The true values of the parameters may differ greatly from these bounds. For example, if mutational effects have an exponential distribution, the true Uc is twice the estimate from Expression 2a and the true s̄ is half the estimate from Expression 2b (Crow and Simmons 1983). Under specific assumptions about the shape of the distribution of mutational effects, such as a gamma distribution, maximum likelihood methods can be used to estimate the parameters of the distribution and the value of Uc (Keightley 1994, 1996). In principle, these methods should provide more accurate estimates of U, s̄ and Vs than the simpler methods of Mukai et al. (1972), provided that the assumptions of the statistical model are met.
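The factor of two for exponentially distributed effects can be seen in a small Monte Carlo sketch. It assumes Poisson-distributed numbers of mutations per line, exponential homozygous effects with mean s̄, additivity, and no selection or line loss; the parameter values and helper names are arbitrary choices for illustration. Because an exponential distribution has Vs = s̄², the bound from Expression 2a should come out near half the true Uc, and Expression 2b near twice the true s̄.

```python
import math
import random
import statistics

def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson variate by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product >= threshold:
        k += 1
        product *= rng.random()
    return k

def simulate_lines(U_c: float, s_bar: float, generations: int, n_lines: int, seed: int = 1):
    """Return per-generation (dM, dV) from lines that accumulate mutations without selection."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_lines):
        n_mut = poisson(U_c * generations, rng)
        # Homozygous effects drawn from an exponential distribution with mean s_bar.
        total_effect = sum(rng.expovariate(1.0 / s_bar) for _ in range(n_mut))
        values.append(1.0 - total_effect)        # additive effects, wild type = 1
    dM = (1.0 - statistics.fmean(values)) / generations
    dV = statistics.pvariance(values) / generations
    return dM, dV

U_c, s_bar = 0.2, 0.02
dM, dV = simulate_lines(U_c, s_bar, generations=50, n_lines=20_000)
U_min, s_max = dM ** 2 / dV, dV / dM              # Expressions 2a and 2b
print(f"U_c: true {U_c}, estimate {U_min:.3f}; s_bar: true {s_bar}, estimate {s_max:.4f}")
```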

There are two difficulties in applying this method of estimating U in species other than Drosophila. The first is the problem of preventing the operation of selection, which obviously opposes the accumulation of deleterious mutations. This can be fairly easily achieved by maintaining each line with minimal effective population sizes, because a mutation is effectively neutral when the product of effective population size and selection coefficient is less than one (Fisher 1930). In clonal or selfing organisms, propagation of each line through a single individual each generation ensures that even highly deleterious mutations will behave like neutral alleles (Keightley and Caballero 1997). Even for diploids with separate sexes, only strongly deleterious alleles will be efficiently eliminated by selection, because an effective population size of two for each line is possible with full-sib mating. Such experiments therefore provide estimates of the rate of mutation to most detrimental alleles, i.e., mutations that reduce fitness by less than about 50% when homozygous.

TABLE 6

Mutation rates to detrimental alleles from mutation-accumulation experiments

The main difficulty in preventing selection is that lines accumulating large numbers of mutations become increasingly vulnerable to loss due to low fitness, introducing a downward bias in both ΔM and ΔV. In D. melanogaster, the use of marked balancer chromosomes means that mutations can accumulate on a single autosome that is propagated through a single heterozygous male in each line (Mukai 1964); the effective population size is then one-half. Given the considerable recessivity of most strongly deleterious mutations (Crow and Simmons 1983), this procedure greatly reduces the risk of losing a line; in fact, in Mukai's experiments only a very small fraction of the singly mated males were sterile, and those cultures were replaced by sibs so that no lines were lost (Mukai et al. 1972). The chromosomes accumulating mutations can be made homozygous when desired, and their effects on fitness components assayed.

The mutation rate for recessive lethals can also be estimated by the balancer chromosome technique, and is about 0.01 per haploid genome per generation in D. melanogaster (Crow and Simmons 1983). This seems to be only a small fraction of the total mutation rate to deleterious alleles (see Table 6). More limited information on C. elegans suggests a slightly lower lethal mutation rate of about 0.007 per haploid genome per generation (Rosenbluth et al. 1983; Clark et al. 1988); however, the 95% confidence interval for this estimate includes the Drosophila value. In plants, direct estimates can be obtained for nuclear mutations to chlorophyll deficiency (albino and yellow seedlings). These are only a component of all lethal mutations, and so provide an underestimate of the mutation rate to lethal alleles (Klekowski 1988). There is also some information from ferns, in which newly arisen lethals affecting the haploid stage can be detected by testing for inviable spores from individual diploid sporophytes (which must necessarily derive from haploids free of lethals in the previous generation). Klekowski (1973) found rates of about 0.01 to 0.015 per haploid genome for three fern species, values remarkably similar to the C. elegans and Drosophila estimates.

A more serious problem concerns the proper control for the estimation of ΔM, because fitness components are notoriously sensitive to environmental effects (Houle 1992). Thus, mutation-accumulation lines should ideally be measured at the same time as a control that has not had the opportunity to accumulate mutations. An isogenic stock that initially has the same genetic composition as the mutation-accumulation lines does not provide a suitable control, even if it is maintained as a random-bred stock with large effective population size as in some experiments (Fernandez and Lopez-Fanjul 1996). This is because most deleterious mutations have only small effects on fitness when heterozygous, and thus can persist in the population for many generations before elimination (Crow and Simmons 1983). The rate of decline in mean fitness of an initially isogenic population as new mutations appear will at first be nearly the same as for the accumulation lines, and will only approach zero after several tens of generations.

This problem of a suitable control can be overcome with organisms that recover well from freezing, such as E. coli and C. elegans; suitable methods have become available for Drosophila only recently (Steponkus and Caldwell 1993). Mukai and co-workers (Mukai 1964; Mukai et al. 1972) and Ohnishi (1977) either used no control or used values from lines presumed to be mutation-free because they retained maximal fitness. These procedures have led to criticism of the values from these experiments (Keightley 1996).

The magnitude of U: Estimates of minimum (Bateman-Muller-Mukai) detrimental mutation rates from mutation-accumulation experiments in several species are summarized in Table 6. As expected from Table 4, the lower bound for U in E. coli is extremely small, about 6% of the total genomic mutation rate of 0.0034. If the estimated number of cell divisions (36) in the male germ line of Drosophila is multiplied by the E. coli U of 0.0002, the resulting Drosophila Uh is 0.007, much less than the measured estimate of about 0.3. This ratio (0.007:0.3) is roughly the same as the ratio of the E. coli and Drosophila genome sizes (Kibota and Lynch 1996). In Drosophila, however, the ratio of the effective genome size to the total genome size (Ge:G) is much smaller than in E. coli. If only the ratio of effective genome sizes is considered, a large discrepancy remains. In contrast to the Mukai estimates of Uh for Drosophila of 0.15 (Ohnishi 1977) to 0.42 (Mukai et al. 1972), Uh for C. elegans was estimated to be 0.003 using the fit to a gamma distribution of mutational effects, or 0.0006 using the Mukai method (Keightley and Caballero 1997). Because the genome sizes of C. elegans and Drosophila are similar, an estimated difference of two orders of magnitude in Uh is disturbing. On the other hand, the estimate for Arabidopsis thaliana (0.1) appears to be only slightly lower than that for Drosophila, but its confidence intervals are very wide (S. Schultz and J. H. Willis, personal communication).
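The E. coli comparison is simple arithmetic; the snippet below only reproduces the two ratios quoted above from the numbers in the text (the variable names are ours).

```python
mu_g_ecoli = 0.0034          # total genomic mutation rate per replication, E. coli
U_ecoli = 0.0002             # Bateman-Muller-Mukai lower bound on U, E. coli
share_detected = U_ecoli / mu_g_ecoli            # fraction of all mutations scored as deleterious

germline_divisions = 36      # cell divisions in the Drosophila male germ line
Uh_extrapolated = U_ecoli * germline_divisions   # E. coli U scaled by Drosophila divisions
Uh_measured = 0.3            # approximate Drosophila estimate
print(f"{share_detected:.0%} of E. coli mutations; extrapolated Uh = {Uh_extrapolated:.4f}, "
      f"or {Uh_extrapolated / Uh_measured:.1%} of the measured value")
```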

Several possible explanations can be imagined for these discrepancies. One is that the Drosophila estimates are based on measurements of egg-to-adult viability under competitive conditions, whereas the C. elegans and Arabidopsis results were for net reproductive output under noncompetitive conditions; differences in fitness are likely to be more easily detected under harsher conditions (Kondrashov and Houle 1994). Another possibility is that transposable element insertions played a much larger role in causing deleterious mutations in the Drosophila experiments than in the C. elegans experiments, where the line involved lacked transposable element activity (Keightley and Caballero 1997). A less palatable explanation is that the Drosophila values for ΔM are gross overestimates caused by adaptive changes in the balancer chromosome against which the mutation-accumulation chromosomes were competed; this would cause an artifactual decline in the mean viability of the mutation-accumulation chromosomes relative to the balancer (Keightley 1996). However, this artifact seems unlikely because the balancer chromosomes were long-established laboratory strains that were likely to be at equilibrium, and in the longest accumulation experiments (Mukai 1969), the viability of the balancer chromosomes would have had to double to account for the observed results. The fact that the (probably minimal) effective genomic mutation rate in Drosophila in Table 5 is far closer to the U estimates in Table 6 for Drosophila than to those for C. elegans further suggests that this putative artifact is improbable.

To avoid the problems of the control for the Drosophila experiments, Garcia-Dorado (1997) used a statistical method for estimating mutational parameters which does not require knowledge of ΔM, but simply fits the observed distribution of mutation-accumulation line values to an assumed form of continuous distribution of mutational effects. This leads to a much smaller estimate of U for the Mukai and Ohnishi experiments (Uh ≈ 0.025) than does the Bateman-Muller-Mukai method, although still substantially larger than the C. elegans value. The difficulty with this result is that there is no a priori justification for the assumed distribution of mutational effects; for example, there could be one class of mutations with similar but fairly large effects, and another class with much smaller but highly variable effects, as suggested by Keightley (1996). This could have substantial effects on the estimate of U. Further research is clearly needed to resolve these uncertainties.

A method that offers a partial solution to the inadequate Drosophila controls has been devised by S. A. Shabalina, L. Y. Yampolsky and A. Kondrashov (personal communication). A large, randomly bred stock is maintained so as to minimize the opportunity for selection on viability and fertility. If this succeeds, the mean value of a fitness component should decline at the rate given by Equation 1a, with Uc replaced by Ud and with selection coefficients for heterozygotes rather than homozygotes. If a comparable randomly bred stock is maintained under selective conditions that have prevailed for a long time, so that it is at mutation-selection equilibrium, its mean should remain constant except for environmental fluctuations, and ΔM can be estimated by adjusting for changes in the control. To avoid adaptive change in the control stock, the number of generations for which it is propagated can be minimized, either by keeping it at low temperature or by using recently developed methods for freezing Drosophila. Their measurements of net fitness under competitive conditions suggested that ΔM ≈ 0.02. For Mukai et al. (1972) the value was 0.004 per chromosome, which translates into 0.02 per diploid genome. These values are in good agreement, but this may be due to opposing errors. The data of Shabalina et al. are for fertility, and there is evidence that fertility contributes more than viability to the genetic load for total fitness (Knight and Robertson 1957; Sved 1975; Simmons et al. 1978), which would tend to make their ΔM larger than Mukai's viability-based value. On the other hand, they measured heterozygous rather than homozygous effects, which would tend to make it smaller. In any case, the experiments of Shabalina et al. show a substantial decline in fitness relative to a control strain that had been kept frozen, arguing against improvement of the reference population as the explanation for the decline in the Mukai experiments.

Indirect estimates of U: Several indirect methods have been proposed for estimating the genomic deleterious mutation rate. For lethal mutations in outcrossing plants, the classical formula for the equilibrium frequency q of a recessive-lethal allele under mutation-selection balance, $q \approx (\mu/s)^{1/2}$ (Haldane 1927), can be applied to the frequencies of nuclear gene-controlled chlorophyll-deficient variants to obtain the mutation rate μ per haploid genome, assuming that the selection coefficient s = 1 (Crumpacker 1967; Ohnishi 1982; Klekowski 1988). If lethal mutations are not completely recessive, as is suggested by the Drosophila data (Crow and Simmons 1983), this procedure underestimates the mutation rate because a higher value of μ is needed to compensate for the elimination of heterozygous lethals by selection.
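As a sketch of why q ≈ (μ/s)^1/2 holds, the code below iterates a standard one-locus recursion (selection against recessive homozygotes with fitness 1 − s, then one-way mutation at rate μ) to equilibrium, and then inverts the formula to recover μ from the frequency of affected homozygotes. The recursion, function names and value μ = 10⁻⁴ are our illustrative assumptions. Summed over loci, the frequency of deficient seedlings is approximately Σq² ≈ μ/s, which is how a per-genome rate follows.

```python
import math

def next_generation(q: float, mu: float, s: float) -> float:
    """One generation: selection against aa homozygotes (fitness 1 - s), then A -> a mutation."""
    q_sel = q * (1.0 - s * q) / (1.0 - s * q * q)
    return q_sel + mu * (1.0 - q_sel)

def equilibrium_q(mu: float, s: float, generations: int = 5000) -> float:
    """Iterate from q = 0 long enough to reach mutation-selection balance."""
    q = 0.0
    for _ in range(generations):
        q = next_generation(q, mu, s)
    return q

def mu_from_homozygote_frequency(freq_aa: float, s: float = 1.0) -> float:
    """Invert q**2 = mu / s: mutation rate = s times the frequency of affected homozygotes."""
    return s * freq_aa

mu_true, s = 1e-4, 1.0
q_eq = equilibrium_q(mu_true, s)
print(q_eq, math.sqrt(mu_true / s))                 # both close to 0.01
print(mu_from_homozygote_frequency(q_eq ** 2, s))   # recovers about 1e-4
```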

The results for total mutation rates to chlorophyll deficiency were reviewed by Klekowski (1992) for ten species of annual plants from five families. He concluded that rates are surprisingly constant, ranging from (0.16–0.45) × 10⁻³. There is no apparent relation to the species' DNA content, even though this differed more than 20-fold. The differences in Ge are presumably much smaller. It is difficult to extrapolate these mutation rates to the whole genome, because the proportion of vital loci contributed by chlorophyll genes is presently unknown.

Estimates can also be obtained for populations whose selfing rates are known, using the equilibrium formulae of Ohta and Cockerham (1974). Estimates for such populations of long-lived mangrove species are (2.1–5.8) × 10⁻³ (Klekowski and Godfrey 1989; Klekowski et al. 1994), about 13-fold higher than those from annuals. This difference would be expected from an increase in mutation rates with age caused by greater numbers of cell divisions before meiosis in old plants; note that in plants, reproductive structures are formed from vegetative tissues, and are not derived from a germ line (Klekowski 1988). This value needs confirmation in other species. Assuming that mutations affecting sporophyte viability also lower male gametophyte viability, an age effect should also decrease pollen viability in old plants. No such effect was found in the one study of which we are aware (Connor and Lanner 1991). However, by analogy with human studies, mutations that accumulate with age are likely to be mainly base substitutions, which are more likely than deletions to pass through the haploid gametophyte (Khush and Rick 1967).

The use of formulae based on the assumption of mutation-selection equilibrium has been extended to estimating detrimental mutation rates from the effect of inbreeding on fitness components (Charlesworth et al. 1990; Charlesworth and Hughes 1998; Deng and Lynch 1996). In highly selfing species, recessive lethals will be quickly purged from the population (Lande and Schemske 1985), and will thus contribute little to the different effects of inbreeding versus outbreeding on fitness components. In addition, such populations will have few polymorphic loci with allelic variability maintained by overdominance (Kimura and Ohta 1971; Charlesworth and Charlesworth 1995). Thus, we can reasonably neglect lethal and segregational loads and assume that heterosis in highly inbred populations is due solely to detrimental mutational load. The rate of independently acting detrimental mutations per haploid genome can then be estimated from the formula

$$U_h = \frac{-\ln(1-\delta)}{1-2h} \qquad \text{(3)}$$

where δ is the reduction in mean fitness of highly selfed individuals compared with randomly mated individuals, and h is the dominance coefficient associated with a typical detrimental mutation. This method provides underestimates of total mutation rates for reasonable values of h (Charlesworth et al. 1990).
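Equation 3, Uh = −ln(1 − δ)/(1 − 2h), is straightforward to evaluate. The sketch below (function name ours) rejects h ≥ 0.5, where the denominator vanishes, and uses an illustrative δ of 0.165, chosen here rather than taken from Table 7, because with h = 0.2 it gives Uh ≈ 0.3.

```python
import math

def uh_from_inbreeding(delta: float, h: float) -> float:
    """Equation 3: haploid detrimental mutation rate from the fitness reduction delta of
    selfed relative to outcrossed individuals and the dominance coefficient h."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    if not 0.0 <= h < 0.5:
        raise ValueError("h must lie in [0, 0.5); the estimate diverges as h -> 0.5")
    return -math.log(1.0 - delta) / (1.0 - 2.0 * h)

print(round(uh_from_inbreeding(0.165, 0.2), 2))   # 0.3

# Sensitivity of the estimate to the assumed dominance coefficient.
for h in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(h, round(uh_from_inbreeding(0.165, h), 3))
```

The loop shows why the assumed value of h matters: the same δ yields an estimate that grows rapidly as h approaches 0.5.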

The values of Uh obtained in this way from data on heterosis in net fitness for several highly selfing plant species from three angiosperm families are shown in the top part of Table 7; where no estimate of h was available, it was assumed to be 0.2, as suggested by the Drosophila data (Crow and Simmons 1983; Charlesworth and Hughes 1998). Because each individual estimate is subject to considerable sampling error, it is probably wise to treat only the median value of approximately 0.3 as meaningful. This estimate is ostensibly independent of the strength of selection against the deleterious mutations, and so may capture a larger fraction of them than the mutation-accumulation method. On the other hand, h seems to be much closer to 0.5 for mutations with small effects than for mutations with drastic effects (Crow and Simmons 1983), so very weakly selected mutations are likely to contribute relatively little to heterosis and would not be detected by this method.

Hughes (1995) and Deng and Lynch (1996, 1997) have suggested an extension of this approach to use the genetic variances of inbred and outbred populations to estimate the decrease in fitness components with inbreeding. By simultaneously estimating the degree of inbreeding depression, U and s̄ can also be estimated (Deng and Lynch 1996, 1997). Estimates of Uh applying this method to published data on Drosophila fitness components (B. Charlesworth, unpublished results) and from data on Daphnia (Deng and Lynch 1997) are given in the lower part of Table 7. There is again substantial spread in the individual estimates; the median is approximately 0.34, surprisingly close to the plant value.

Although attractive, this method depends heavily on the assumption that mutation-selection balance is the sole force maintaining genetic variation in fitness components. There are good reasons to question this assumption for outbreeding species. In Drosophila, both the genetic variance and the inbreeding depression for components of fitness seem to be too high to be explained solely by mutation-selection balance, using the parameter estimates shown in Table 6 (Mukai et al. 1974; Charlesworth and Hughes 1998). This implies a substantial contribution from variation actively maintained by selection, which biases estimates of U by the Deng-Lynch method upward to an extent that is difficult to gauge.

The impact of increased rates of mutation: Does the high rate of spontaneous deleterious mutation per sexual generation in multicellular organisms render them sensitive to small rate increases, as seen with RNA viruses, retroelements, and mutator strains of E. coli and S. cerevisiae? We would expect both diploidy, and the infrequent demand for the functions of some genes that mediate responses to environmental challenges, to provide considerable protection from recessive lethal mutations, at least until mutations accumulate to an intolerable frequency. But detrimental mutations seem to have higher levels of dominance than lethals (Crow and Simmons 1983), so that a high genomic mutation rate to detrimentals could imperil an outbred population.

TABLE 7

Indirect estimates of the detrimental mutation rate

In the 1950s, Wallace (1952, 1956) exposed caged Drosophila populations to continuous radiation accumulating to 250,000 r. The population accumulated a large number of recessive lethals, but its size was not reduced. This is perhaps not surprising in a species with a high reproductive potential. Nevertheless, although they showed no overt signs of genetic deterioration, heavily mutagenized Drosophila populations became weak competitors against nonmutagenized strains (Wijsman 1984). Thus, Wallace's flies were indeed paying a price, but one that only a more rigorous environment would have revealed.

Recent results with mammals are instructive. Several strains of mice have been rendered homozygous for defects in the mismatch repair genes Msh2 (De Wind et al. 1995; Reitmair et al. 1995), Pms2 (Baker et al. 1995) or Mlh1 (Baker et al. 1996; Edelmann et al. 1996). These mice display a mutator phenotype in somatic cells, the mutability of microsatellite sequences being increased roughly 100-fold. In the case of Pms2, the somatic mutability of an artificial mutational target in a shuttle vector increased about 100-fold throughout the gene (Narayanan et al. 1997); most of the mutations were ±1 b, that is, frameshifts that are likely to be null mutations. Most of these mutator mice are superficially healthy, although cancer-prone. Most are sterile, apparently for reasons more mechanical than mutational, but Pms2−/− females are fertile. In humans, samples of normal tissue from several cancer patients were defective in hPMS2 or hMLH1 (the human homologs of murine Pms2 and Mlh1, respectively; see Parsons et al. 1995), suggesting that mutator humans occur naturally and have properties similar to those of mutator mice. These results suggest, as mentioned above, weaker selection against somatic than against germinal mutations.

Thus, substantially increased somatic mutation rates appear to be compatible with mammalian development. However, a persistently high germline mutation rate would be expected to extinguish the population within a few generations; one can easily imagine mouse breeding schemes that could explore the effects of mutation accumulation under highly mutagenic conditions.

Aging is an important aspect of mammalian development. Aging has long been conjectured to reflect the accumulation of somatic mutations (see Finch 1990). The lack of obviously accelerated aging in mice with 100-fold increased somatic mutation rates considerably weakens this hypothesis.

THE EVOLUTION OF MUTATION RATES

Discernible patterns: Rates of spontaneous mutation display several distinct patterns across taxa. RNA-based organisms have the highest genomic rate per genome replication, μg ≈ 1–2 for lytic RNA viruses and μg ≈ 0.1–0.2 for retroviruses and a retrotransposon exclusive of their tenure as parts of host chromosomes. For DNA-based microbes, μg = 0.0034 ≈ 1/300. For higher eukaryotes, the mutation rate is properly expressed per effective genome, which includes only those parts of the genome in which most mutations produce deleterious effects upon which selection can act effectively; based on rather incomplete information, μeg ≈ 0.006 (range 0.004–0.014), these values presently being indistinguishable from 0.003. The corresponding mutation rate per effective genome per sexual generation (μegs) varies by at least 40-fold (0.036–1.6), and the mutation rate per total higher eukaryotic genome must considerably exceed 1 in some cases. The lower bound estimates of rates of deleterious mutation per genome per sexual generation (Uh) are about 0.1–0.35 for Arabidopsis and Drosophila, and Ud therefore approaches or somewhat exceeds 1 for these organisms; it may be lower for C. elegans. As we have discussed, these estimates need to be viewed with caution, and may be substantially revised in the future.

Miscellaneous puzzles and rejoinders: Why do the lytic RNA viruses put up with rates of spontaneous mutation so high as to genetically degrade a substantial fraction of their progeny? One speculation has been that these high rates facilitate escape from immune surveillance and other host defenses. This speculation is faulted by the observation that the RNA-based bacteriophage Qβ and its DNA-based counterpart bacteriophage M13, both infecting E. coli and displaying similar life histories, nevertheless retain the mutation rates characteristic of their classes. Another speculation has been that the replication of RNA viruses cannot be more accurate because their replicases lack a proofreading function. However, the retroviruses also lack such a function, but nevertheless achieve a replication accuracy that results in far fewer defective progeny. One general, simple way to increase accuracy would be to decrease the ability of the polymerase to extend from a mismatch, thus aborting mutant progeny. Another, more radical way would be to appropriate a DNA proofreading activity and adapt it to RNA substrates, although this would involve enlarging the genome and thus increasing its chemical lability. Yet another speculation is that the benefits from replicating as rapidly as possible outweigh the costs of a high error rate. However, this problem could be solved, as it already has been in both prokaryotes and eukaryotes, by employing multiple sites of replication initiation. A final speculation is thus far unfaulted: the retroviral mutation rate is determined primarily by the error rate of transcription, on which the virus cannot improve.

Another characteristic of the RNA viruses, and to a lesser extent the retroviruses, is their extraordinary fecundity. Yields of 10³ to 10⁴ infectious units (iu) per infected cell are common, and there are perhaps tenfold more physical particles. There is thus a substantial probability that an infected cell will release numerous particles free of deleterious mutations. A general property of these viruses that may also bear on their mutation rate is the inherent chemical lability of the RNA backbone. This lability appears to limit RNA virus genomes to <40,000 b. Because larger genomes do not persist, RNA viruses cannot experience the more intense selection for reduced μb that a larger genome would impose.
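The following simplification illustrates why high fecundity matters. It is not made in the text. Assume the number of new deleterious mutations per progeny genome is Poisson-distributed with mean u, so a fraction e^(−u) of progeny carries none. The Python sketch below treats every mutation as deleterious (a worst case) and uses the yields quoted above to estimate how many mutation-free infectious particles one cell releases. It ignores the several rounds of replication that occur within an infected cell, which would lower the mutation-free fraction. The function names are hypothetical.

```python
import math

def mutation_free_fraction(u: float) -> float:
    """Fraction of progeny with zero new mutations if counts are Poisson with mean u (assumption)."""
    return math.exp(-u)

def expected_mutation_free(burst_size: int, u: float) -> float:
    """Expected number of mutation-free infectious particles released by one cell."""
    return burst_size * mutation_free_fraction(u)

for u in (1.0, 2.0):              # lytic-virus range for mu_g, treated here as all deleterious
    for burst in (10**3, 10**4):  # yields quoted in the text, in infectious units
        n_free = expected_mutation_free(burst, u)
        print(f"u = {u}, burst = {burst}: ~{n_free:.0f} mutation-free particles")
```

Even at u = 2, roughly 13% of progeny are mutation-free under these assumptions, which is still hundreds to thousands of particles per cell.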

Recombination provides a mechanism that may allow a heavily mutagenized population to generate fitter genomes by chance (Pressing and Reanney 1984; Nee 1987). The influenza virus genome is divided into separate segments and recombines freely by reassortment, whereas recombination in poliovirus is rare. However, the two viruses have indistinguishable mutation rates, so recombination rate appears not to determine mutation rate in this case. Rates of recombination among DNA-based microbes also vary greatly. They therefore do not seem to be important in maintaining the nearly invariant μg observed in these organisms.

Two deep differences within and across taxa deserve emphasis here. First, selection on mutation rates must operate differently in organisms with rare or no sex (or with complete selfing) than in sexual organisms with frequent outcrossing. In the former, the products of mutation remain coupled to the rate determinants; in the latter, they are rapidly decoupled from them. Second, two quite different mutation rates operate in multicellular organisms: germline rates and somatic rates. We noted that the murine male germline rate appears to be lower than the somatic rate, suggesting that the two rates evolved somewhat differently. In addition, mutator mutations in mice and humans have remarkably few immediate phenotypic consequences, apart from higher cancer rates. This suggests that in the soma, the buffering provided by diploidy protects strongly against the consequences of mutations (Orr 1995).

Evolutionary forces shaping mutation rates: The different patterns of mutation rates among taxa clearly indicate that the mutation rate is subject to evolutionary change. The fidelity of DNA replication depends on elaborate enzymatic machinery, and mutational inactivation of any of its components can greatly elevate the mutation rate. Selection therefore acts primarily to reduce the standard mutation rate, although higher rates may be allowed in specific circumstances. Such selection pressure was first posited by Sturtevant (1937), who observed that the vast majority of spontaneous mutations decrease rather than increase fitness. He suggested that the pressure of deleterious mutations would favor genetic modifiers that reduce the mutation rate. Such modifiers would thereby reduce the genetic load of deleterious alleles maintained in the population by mutation-selection balance. This led him to ask why the mutation rate does not fall to zero. Some 30 years later, Kimura (1967) suggested that the advantage of continually reducing mutation rates would eventually be balanced by what he called the “physiological cost” of doing so.

Selection for modifiers of mutation rates: More generally, we may now ask what evolutionary factors determine the mutation rate of a species. Although this question cannot yet be fully answered, the main ingredients of an answer are now reasonably clear. A well-developed body of theory predicts the effect of selection on a modifier gene that causes a small reduction in the genomic mutation rate to deleterious alleles (Uh or Ud in the terminology introduced above). The theory covers both freely recombining sexual populations and completely asexual populations (Kondrashov 1988, 1995). Such a modifier is favored because an allele that lowers the mutation rate becomes associated in the population with genomes carrying a lower-than-average number of deleterious mutations. In a freely recombining diploid species, the selection coefficient on a modifier that reduces the diploid mutation rate by δU is approximately s δU. Here s is the mean selection coefficient against a heterozygous deleterious mutation. In a diploid asexual species, the selection coefficient is approximately δU. In a completely selfing population, it is approximately 0.5 δU, provided that deleterious mutations are not completely recessive. These results assume that the populations in question are close to deterministic equilibrium under mutation and selection. They therefore do not account for the presumably numerous class of deleterious mutations whose effects on fitness are of the order of the reciprocal of the effective population size. Further studies of the dynamics of selection on mutation-rate modifiers are needed to take this class of mutations into account.
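The three approximations just given can be collected into one function. The Python sketch below simply restates them and is valid only near deterministic mutation-selection equilibrium. The function name and the example value δU = 0.01 are hypothetical. The s values are taken from the range 0.01–0.02 reported for Drosophila (Crow and Simmons 1983; Charlesworth and Hughes 1998).

```python
def modifier_selection_coefficient(breeding_system: str, delta_u: float, s: float = 0.0) -> float:
    """Approximate selection coefficient on a modifier that lowers the diploid
    genomic deleterious mutation rate by delta_u (near mutation-selection equilibrium).

    breeding_system: "sexual" (free recombination), "asexual", or "selfing".
    s: mean selection coefficient against a heterozygous deleterious mutation
       (used only in the sexual case).
    """
    if breeding_system == "sexual":
        return s * delta_u
    if breeding_system == "asexual":
        return delta_u
    if breeding_system == "selfing":
        # Valid only if deleterious mutations are not completely recessive.
        return 0.5 * delta_u
    raise ValueError(f"unknown breeding system: {breeding_system!r}")

delta_u = 0.01  # hypothetical reduction in the diploid genomic rate U_d
for system in ("asexual", "selfing", "sexual"):
    print(system, modifier_selection_coefficient(system, delta_u, s=0.02))

# The asexual/sexual ratio is 1/s: 50- to 100-fold for s = 0.01-0.02.
for s in (0.01, 0.02):
    ratio = (modifier_selection_coefficient("asexual", delta_u)
             / modifier_selection_coefficient("sexual", delta_u, s))
    print(f"s = {s}: asexual selection is ~{ratio:.0f}x stronger")
```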

The intuitive reason for this effect of breeding system is as follows. In an asexual or selfing population, a mutation-rate modifier is completely linked to its targets. The selection pressure on the modifier is therefore determined by the difference between the equilibrium mean fitness of clones carrying the modifier allele and that of clones carrying its rival allele (Leigh 1970, 1973). The equilibrium mean fitness of a diploid asexual population subject to deleterious mutation at rate Ud is $e^{-U_d}$ (Haldane 1937; Kimura and Maruyama 1966). The relative difference in mean fitness between populations carrying the modifier and rival populations is therefore expected to equilibrate at $[e^{-(U_d - \delta U_d)} - e^{-U_d}]/e^{-U_d} = e^{\delta U_d} - 1 \approx \delta U_d$. With free recombination, on the other hand, a neutral allele remains associated with a mutation that arose in the same gamete for an average of only two generations. The cumulative fitness reduction the allele suffers from its initial association with the mutation is therefore 2s (Kimura 1967). Suppose a mutation-rate modifier reduces the diploid genomic mutation rate by δUd. Only half of the avoided mutations would have occurred in the same haploid genome as the modifier allele. The modifier therefore gains an advantage over the rest of the population of approximately (δUd/2) × 2s = s δUd (Kimura 1967; Kondrashov 1995).
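Both steps of this argument can be checked numerically. The Python sketch below computes the exact relative advantage of an asexual clone from the equilibrium mean fitness e^(−U). It models free recombination by assuming that the modifier's association with a newly arisen mutation is broken with probability 1/2 each generation. It ignores loss of the mutation by selection, which would slightly shorten the association. Function names and parameter values are illustrative assumptions.

```python
import math

def asexual_relative_advantage(u_d: float, delta_u: float) -> float:
    """Exact relative fitness advantage of a clone with rate U_d - delta_u over one with U_d."""
    w_modifier = math.exp(-(u_d - delta_u))  # equilibrium mean fitness with the modifier
    w_rival = math.exp(-u_d)                 # equilibrium mean fitness of the rival clone
    return (w_modifier - w_rival) / w_rival  # equals exp(delta_u) - 1, close to delta_u

def expected_generations_linked(p_break: float = 0.5, max_gen: int = 200) -> float:
    """Expected generations an unlinked allele shares a genome with a new mutation
    (counting the generation of origin), if the association breaks with p_break per generation."""
    return sum((1.0 - p_break) ** t for t in range(max_gen))

def sexual_advantage(s: float, delta_u: float) -> float:
    """Half of the avoided mutations would have arisen in the modifier's own haploid
    genome; each would have cost it about 2s in total."""
    total_cost_per_mutation = s * expected_generations_linked()  # about 2s
    return (delta_u / 2.0) * total_cost_per_mutation              # about s * delta_u

print(asexual_relative_advantage(u_d=1.0, delta_u=0.01))  # close to delta_u = 0.01
print(expected_generations_linked())                       # close to 2
print(sexual_advantage(s=0.02, delta_u=0.01))              # close to s * delta_u = 2e-4
```

The asexual advantage does not depend on U_d itself, because the e^(−U_d) factor cancels in the ratio.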

It is relatively easy to extend these arguments to include a direct fitness cost of reducing the mutation rate. Such a cost prevents the mutation rate from being reduced to zero (Kimura 1967; Kondrashov 1988, 1995). It might arise, for instance, from the energy required to divert cellular resources to proofreading mechanisms. If such fidelity costs are included in the equations, one can determine the evolutionary equilibrium mutation rate. At this rate, alleles modifying the mutation rate in either direction are neutral or disadvantageous (Kondrashov 1995; McVean and Hurst 1997). There is experimental evidence for the existence of such costs (Kirkwood et al. 1986), but little direct evidence that U is in fact determined in this way. In microbes, mutant alleles with clear-cut antimutator effects are rare. This paucity suggests that mutation rates are in fact near the physico-chemical minimum achievable at an acceptable cost. One of the few pieces of evidence that might argue to the contrary comes from Drosophila laboratory populations exposed to radiation for a long time, which became more resistant to radiation damage (Nöthel 1987). Even here, however, the spontaneous mutation rate did not seem to be affected. Drosophila strains exist whose rates of recessive-lethal mutation differ by 14-fold (Woodruff et al. 1984). This shows that not all strains mutate at the minimum average rate, although the higher Drosophila rates may merely reflect the transitory impact of a transposon infection.
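The following toy model shows how an equilibrium rate emerges once a cost function is chosen. Purely for illustration, the Python sketch below makes two assumptions. First, fidelity imposes a fitness cost k/U that grows as the mutation rate falls. Second, lowering U by δU brings a benefit b δU, with b ≈ 1 for an asexual population and b ≈ s for a sexual one. Neither the functional form nor the value of k comes from the cited studies; both are hypothetical. Setting the marginal benefit b equal to the marginal cost k/U² gives U* = √(k/b).

```python
import math

def net_modifier_advantage(u: float, delta_u: float, b: float, k: float) -> float:
    """Net selection coefficient on a modifier that changes U to U - delta_u.

    b: benefit per unit reduction in U (about 1 if asexual, about s if sexual).
    k: scale of a HYPOTHETICAL fidelity cost k / U (first-order toy model).
    Positive delta_u lowers the mutation rate; negative delta_u raises it.
    """
    benefit = b * delta_u
    extra_cost = k / (u - delta_u) - k / u
    return benefit - extra_cost

def equilibrium_rate(b: float, k: float) -> float:
    """U* at which marginal benefit equals marginal cost: b = k / U**2."""
    return math.sqrt(k / b)

K = 1e-4  # hypothetical cost scale
u_asexual = equilibrium_rate(b=1.0, k=K)
u_sexual = equilibrium_rate(b=0.02, k=K)  # b = s = 0.02, illustrative
print(f"U* asexual = {u_asexual:.4f}, U* sexual = {u_sexual:.4f}")

# At U*, modifiers in either direction are (slightly) disfavored.
print(net_modifier_advantage(u_asexual, +1e-3, b=1.0, k=K))
print(net_modifier_advantage(u_asexual, -1e-3, b=1.0, k=K))
```

Under this assumed cost, weaker selection (smaller b) moves the equilibrium to a higher mutation rate, by a factor of 1/√s.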

The effect of selection pressure is thus primarily to reduce the genomic mutation rate. Consider a modifier with a given effect on the mutation rate per base. The strength of selection on it is likely to be proportional to the size of the portion of the genome that produces deleterious mutations. This is because the same percentage reduction in the mutation rate per b produces a bigger change in U in a larger genome. δU is thus likely to be proportional to U, if the rate of mutation per base is the appropriate scale for measuring the effects of mutator genes. Suppose the cost of a given reduction in the mutation rate per b is independent of genome size. Evolution should then result in rough equality of deleterious mutation rates per genome across species with comparable breeding systems. Such equality is not well supported by the data on RNA viruses in Tables 2 and 3, where the retroviruses have tenfold lower mutation rates per genome replication than the lytic viruses. It is, however, reasonably consistent with the data in Table 4 on DNA-based microbes. The data in Table 5 on higher eukaryotes show fair agreement in the mutation rate per effective genome per replication. They do not agree in the rate per sexual generation, which is the quantity the theory predicts should be equal. C. elegans diverges especially strongly from the others (see also Table 6).
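The proportionality argument can be made concrete. Following the text, this Python sketch assumes an antimutator lowers the per-base rate by a fixed fraction, and writes U = μb × Ge for the effective genome. The per-base rate, the effective genome sizes, and the 10% reduction are hypothetical values chosen only for illustration.

```python
def antimutator_effect(mu_b: float, effective_genome_b: float,
                       fractional_reduction: float) -> tuple[float, float]:
    """Return (U, delta_U) for an antimutator that lowers the per-base rate mu_b by a
    fixed fraction, where U = mu_b * G_e counts mutations in the effective genome."""
    u = mu_b * effective_genome_b
    return u, fractional_reduction * u

MU_B = 1e-10      # hypothetical per-base rate per generation
REDUCTION = 0.1   # hypothetical 10% antimutator effect
for g_e in (1e7, 1e8, 1e9):  # hypothetical effective genome sizes, in b
    u, delta_u = antimutator_effect(MU_B, g_e, REDUCTION)
    print(f"G_e = {g_e:.0e} b: U = {u:.3g}, delta_U = {delta_u:.3g}")
```

δU rises tenfold with each tenfold increase in Ge. Under free recombination, the modifier's advantage, about s δU, scales in the same way.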

A possible explanation for the different U values in different taxa is that the cost of fidelity may vary with life history or genome size. The total energetic cost of a given change in fidelity per base per generation is likely to be greater in species with larger effective genomes or more germline cell divisions per generation. Complete equality of deleterious mutation rates per genome is therefore unlikely to be achieved. Humans appear to have the highest rate of mutation per effective genome per generation (μegs in Table 5). This could be explained by either or both of these effects, among other possibilities.

Sexual versus asexual species: Selection in favor of reducing the mutation rate is clearly much stronger in asexual or selfing organisms than in sexual species. Data from Drosophila suggest that s is of the order of 0.01–0.02 (Crow and Simmons 1983; Charlesworth and Hughes 1998), so the difference can easily be as large as two orders of magnitude. Suppose the only factors controlling the fate of modifiers that reduce mutation rates were the fitness advantage of a reduced mutational load and the cost of increased replication fidelity. One might then expect a much higher mutation rate per genome in sexual than in asexual or self-fertilizing taxa, after correcting for any differences in effective genome size. But a variety of factors give asexual or selfing populations short evolutionary persistence times (Kondrashov 1993), so they may not have enough time to evolve lower mutation rates. The low mutation rate per effective genome per generation in C. elegans (Table 5) may reflect the high degree of self-fertilization in this species. Effective mutation rates in different species of bacteria and lower eukaryotes are similar (Table 4), despite wide variation in genome size and mode of reproduction. This suggests one of two things. Either differences in breeding system do not in fact matter very much, or ostensibly asexual organisms such as bacteria recombine enough (Maynard Smith 1991) that the differences in breeding system are more apparent than real. If the latter is the case, the much higher effective mutation rates in higher eukaryotes (Table 5) can be explained in the present model only by a greatly increased cost of replication fidelity per generation. Unfortunately, quantitative data on this cost are completely lacking, so this conclusion remains speculative.

In an asexual species, deleterious mutations are eliminated from the population in the same genotypes in which they occur. If mutations occur independently, they are eliminated independently. In a sexual species, deleterious mutations are regrouped every generation, so in principle they can be eliminated in groups. Is there a Maxwell's demon that ensures that each “genetic death” removes several mutations? Directional epistasis or quasi-truncation selection can have such an effect (Crow 1997), but how effective this is in natural populations is an open question. It is possible that a sexual species could tolerate a mutation rate that would drive an asexual species to extinction.
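The following one-generation calculation illustrates how quasi-truncation selection can make each genetic death remove several mutations. The Python sketch below makes two assumptions. First, the number of mutations per zygote is Poisson-distributed, a common approximation for freely recombining populations. Second, selection is idealized truncation: every zygote carrying more than a threshold number of mutations dies. The sketch compares the resulting deaths with the standard equilibrium load 1 − e^(−U) incurred when mutations act independently and U mutations must be removed per generation (Haldane 1937). It is a snapshot rather than an equilibrium analysis. The mean load, the threshold, and the function names are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k mutations (log space for numerical stability)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def truncation_selection(lam: float, threshold: int, k_max: int = 200) -> tuple[float, float]:
    """Idealized truncation: every zygote with more than `threshold` mutations dies.
    Returns (fraction of zygotes that die, mutations removed per zygote)."""
    deaths = 0.0
    removed = 0.0
    for k in range(threshold + 1, k_max):
        p = poisson_pmf(k, lam)
        deaths += p
        removed += k * p
    return deaths, removed

def independent_deaths(mutations_removed: float) -> float:
    """Equilibrium load 1 - exp(-U) when mutations act independently and
    U = mutations removed per generation."""
    return 1.0 - math.exp(-mutations_removed)

lam, threshold = 10.0, 12  # hypothetical mean load per zygote and truncation point
deaths, removed = truncation_selection(lam, threshold)
print(f"truncation: {deaths:.3f} deaths remove {removed:.2f} mutations "
      f"({removed / deaths:.1f} per death)")
print(f"independent action: {independent_deaths(removed):.3f} deaths for the same removal")
```

Under these assumptions, truncation removes the same number of mutations with far fewer deaths, because each death eliminates a genome with an unusually large number of mutations.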

Adaptive mutations: It has often been suggested that higher mutation rates would be adaptive in populations undergoing strong directional selection, because mutational variability would speed the response to such selection. Sturtevant in fact raised this possibility in 1937, only to dismiss it with the phrase, “While this effect may occur, it is difficult to imagine its operation.” Undeterred, Kimura (1960, 1967) developed a theory of selection on the mutation rate. In this theory, selection on the mutation rate is assumed to minimize the genetic load a population experiences from two sources: deleterious mutations, and allele substitutions driven by selection in a changing environment. He showed that, by this criterion, the optimal effective genomic mutation rate equals the rate of substitution of favorable alleles in the genome as a whole (Kimura 1967). The weakness of this theory for sexually reproducing organisms is that it implicitly assumes group rather than individual selection, with all the attendant difficulties (Leigh 1970, 1973).

The analysis of models of selection on genes that modify the mutation rate has allowed progress on this problem. As we have seen for mutations with purely deleterious effects, recombination in a sexual species greatly weakens selection on a mutation-rate modifier. The problem is more acute for favorable mutations, because these are likely to be much rarer than deleterious ones. A modifier allele that increases the mutation rate may receive a short-lived boost in frequency from its association with a favorable allele it has induced. It soon loses this advantage through recombination, however (Kimura 1967; Leigh 1970, 1973). Selection for an increased mutation rate through the induction of favorable mutations is thus extremely weak in a sexually reproducing organism. It is likely to be overwhelmed by the disadvantage of deleterious mutations.

The situation is somewhat different in species with low levels of recombination, such as many bacteria, because a modifier can remain associated with a mutation that it has induced. In the absence of recombination, Kimura's results apply to the process of selection on favorable mutations (Leigh 1970, 1973). Populations that have experienced a severe challenge from a novel environment might therefore be expected to show an increased frequency of mutator alleles (Taddei et al. 1997). Several experiments with bacterial populations support this conclusion, and the genetic basis of the increased mutation rate has been identified in some cases (Sniegowski et al. 1997). This is consistent with the observation cited earlier that some natural isolates of bacteria harbor mutator genes. A mutator allele still faces a long-term problem because it causes a higher load of deleterious alleles, so that once adaptation to the new environment has occurred, selection for a reduction in the mutation rate will be renewed. A mutator strain of bacteria has been observed to evolve a lower mutation rate when grown in a chemostat for over 2,000 generations (Tröbner and Piechocki 1984), presumably as a result of selection of this kind.

One special circumstance in which a high mutation rate is favored is a rapidly cycling or otherwise continually changing environment (Gillespie 1981a; Ishii et al. 1989). There, it pays to be able to respond by producing novel genotypes at loci that are responsible for adaptation to the new state of the environment. Polymorphisms maintained by selection in a constant environment can also induce selection for increased mutation rates (Gillespie 1981b; Kondrashov 1995). If increased mutational load is to be avoided, hypermutability should be targeted to specific loci or should be transient. Antigen switching by several pathogens in response to the host immune system is an excellent example of this. It is, however, achieved by special genetic devices rather than by conventional mutagenesis (Sasaki 1994).

On the mechanisms of mutation and mutation prevention: Organisms limit their mutation rates by diverse mechanisms. These include metabolic controls over concentrations of endogenous and exogenous mutagens, pre-replication DNA repair systems, the insertion accuracy of polymerases, 3′-exonucleolytic proofreading, and several post-replication systems for repairing mismatches. Different organisms apply different sets of these mechanisms, and the efficiency of a particular mechanism varies among organisms. Sometimes an organism's mutation rate is considered to be “determined” by the particular set of mechanisms it applies. It is more accurate, however, to view that organism's mutation rate as “determined” by deep evolutionary forces, by the life history it has adopted, and by accidents of its evolutionary history. The particular mechanisms employed and their efficiencies are merely devices to carry out the underlying necessity.

Acknowledgments

We thank Phil Anderson, Aurora Garcia-Dorado, Alexey Kondrashov, Brad Preston and John Willis for providing advice and unpublished results. We thank Pat Foster, Chuck Langley, Norm Kaplan, Peter Keightley, Jim Mason and Paul Sniegowski for critical readings of the manuscript.

Footnotes

  • Note added in proof: Howe and Ares (1997) report that yeast introns employ stem-loop structures to bring together the splice junctions at the opposite ends of an intron, thus increasing the probability of correct splicing. If this were a general effect, it might add 10² to 10³ b to the mutational target size of a locus. Such an addition would have little or no effect on the values of μeg or μegs in Table 5. It would, however, slightly increase the values of Ge and slightly reduce the values of μb and μg. These changes would be small relative to the present uncertainties of these values in higher eukaryotes.

LITERATURE CITED
