Abstract
The properties and limitations of maximum likelihood (ML) inference of genomewide mutation rates (U) and parameters of distributions of mutation effects are investigated. Mutation parameters are estimated from simulated experiments in which mutations randomly accumulate in inbred lines. ML produces more accurate estimates than the procedure of Bateman and Mukai and is more robust if the data do not conform to the model assumed. Unbiased ML estimates of the mutation effects distribution parameters can be obtained if a value for U can be assumed, but if U is estimated simultaneously with the distribution parameters, likelihood may increase monotonically as a function of U. If the distribution of mutation effects is leptokurtic, the number of mutation events per line is large, or if genotypic values are poorly estimated, only a lower limit for U, an upper limit for the mean mutation effect, and a lower limit for the kurtosis of the distribution can be given. It is argued that such lower (upper) limits are appropriate minima (maxima). Estimates of the mean mutational effect are unbiased but may convey little about the properties of the distribution if it is leptokurtic.
MUTATIONS that affect fitness are usually deleterious and rarely become fixed in large populations. However, deleterious mutations may occur at a sufficiently high rate to play an important role in several key evolutionary phenomena, such as the evolution and maintenance of sex. Some evolutionary theories require estimates of the genomic rate of deleterious mutations, U, but not necessarily of the distribution of their selective effects [e.g., to test the “deterministic mutation” theory for the evolution of sex (Kondrashov 1993)], while others require information both on U and the distribution of selective values [e.g., to infer the probability of “mutational meltdown” (Lande 1995; Lynchet al. 1995)].
There are several ways to obtain information on U and distributions of selective values. One approach, proposed by Kondrashov and Crow (1993), is to compare betweenspecies nucleotide substitution rates at a sample of regions in the genome to rates in regions evolving unconstrained by natural selection. If selection has operated to remove new mutations in the sampled regions, the rate of nucleotide substitution will be lower than in the unconstrained regions. By combining the relative substitution rates with estimates of betweenspecies divergence times and generation intervals, it is possible, in principle, to estimate U. For example, the rate of nucleotide substitution in human pseudogenes implies that about three point mutations occur in amino acid coding regions per zygote per generation (Drakeet al. 1998). Protein coding sequences are under moderate constraint, so this rate is similar to the deleterious mutation rate specific to such sequences (A. EyreWalker and P. D. Keightley, unpublished results). The estimate refers to fitnessaltering mutations occurring on an evolutionary time scale, but does not tell us about the magnitudes of the effects upon which natural selection acted (except that fitness effects were greater than the reciprocal of the effective population size), or about the effects of largescale molecular changes such as deletions or transposable element (TE) insertions.
A second general approach to indirectly infer U and mutation parameters is from a comparison of the distributions of fitnesses of outbred (or inbred) individuals sampled from a natural population to their inbred (or outbred) progeny. A method developed by Deng and Lynch (1996) for outbreeding populations allows estimation of U and the mean mutant effect and degree of dominance. However, with this method deleterious mutationselection balance is assumed to be the only mechanism maintaining variation for fitness, and biased estimates will result if any other mechanisms, such as balancing selection or migration, lead to the maintenance of variation (Drakeet al. 1998). For example, negative genetic correlations between major components of fitness can lead to the maintenance of additive and nonadditive genetic variation for the major components and fitness, respectively (Rose and Charlesworth 1980; Rose 1982; Falconer and Mackay 1996, chapter 20). In plant species that normally reproduce by selfing, a comparison of the fitnesses of selfed and outcrossed progeny can provide an estimate of U (Charlesworthet al. 1990). As with the method applied to outbreeders, it is assumed that genetic variation is maintained solely by a balance between mutation and selection. Charlesworth et al.’s (1990) method provides estimates for U, but not of mutation distribution parameters such as the mean mutation effect.
A third direct way to study effects of deleterious mutations is the mutation accumulation (MA) approach in the laboratory. Mutations are allowed to randomly accumulate in benign conditions in sublines derived from an inbred base population. The sublines are maintained by close inbreeding or as replicated chromosomes protected by a balancer chromosome, so drift will tend to dominate selection. After many generations of mutation accumulation, fitnesses of the MA lines or chromosomes are compared to controls. The approach was pioneered by Mukai (1964) who used a Cy balancer chromosome as a control to study the viability effects of mutation accumulation in Drosophila melanogaster second chromosomes. In contrast to the indirect approach outlined above, mutation should be the only source of evolutionary divergence between the lines under test, but the method is restricted to laboratory populations. Inference of U and properties of the distribution of mutation effects are made by comparing the distribution of phenotypes of MA and control lines. The method of Bateman (1959), which was taken up by Mukai [1964; the BatemanMukai (BM) method], compares the rate of increase in amongMA line variance per generation, V_{m}, with the rate of change of fitness or fitness component, ΔM. Under the assumption that mutations have equal deleterious effects, an estimate of U is obtained from
To infer U and distributions of mutation effects from MA experiments, an alternative approach is to assume that the true distribution of mutation effects follows some family of distributions. Theoretically, the distribution of effects of all mutations that occurred in an experiment could be estimated, but in practice there are insufficient degrees of freedom, so a family of distributions must be assumed. The family is chosen so that changes in its parameters produce distributions with a wide range of properties. Numerical techniques involving maximum likelihood (ML; Keightley 1994) or minimum distance (GarciaDorado 1997) have been applied to find values for U and the parameters of the assumed distribution that best fit phenotypic data from the MA experiment. Fit is measured either in terms of R log likelihood of data or distance between the fitted and empirical distributions of line means. Following the nearly neutral model (Kimura 1983; Ohta 1992), gamma distributions of mutation effects have mostly been assumed. The gamma distribution has two parameters, α and β, specifying scale and shape, respectively. With small values of β, the distribution is leptokurtic: most mutations have effects close to zero; larger effects occur with diminishing frequency and contribute most of the betweenline variance. Special cases are β = 1 (the exponential distribution) and β = ^{1}/_{2} (the chisquare distribution with 1 d.f.). Gamma distributions with higher values of β approach symmetry about the mean (β/α). The case of β → ∞ corresponds to equal mutation effects, the model that is generally used to obtain BM estimates.
ML or minimum distance methods have been applied to data from several MA experiments (Keightley 1994, 1996; GarciaDorado 1997; Keightley and Caballero 1997; Keightley and Ohnishi 1998). The algorithms to fit the mutation parameters have previously relied on highly computerintensive Monte Carlo methods, and only limited investigations of the performance of the procedures have been possible (Keightley 1994; GarciaDorado 1997). Recently, an improved version of the ML procedure has been developed with algorithms ∼2 orders of magnitude less demanding of computer time based on numerical integration (Keightley and Ohnishi 1998). An extensive study of the properties of ML estimation of mutation parameters is now feasible.
The purpose of this article is to explore the properties and limitations of ML inference of mutation parameters by simulation. Simulated rather than real MA data are analyzed, because the true parameter values are known, the model assumptions are not violated, and replication to detect significant deviations from expectation is possible.
MATERIALS AND METHODS
Model and simulation of data: The data available for analysis are assumed to be phenotypic means from a set of control lines (generation 0 of the MA experiment) and a set of MA lines (from generation t). In principle, data from several time points could be analyzed simultaneously, and an algorithm has been proposed (Keightley 1994). The power of the procedure can also be improved by making use of withinMA line replicate information (J. D. Fry, P. D. Keightley, S. L. Heinsohn and S. V. Nuzhdin, unpublished results). Control line phenotypic values were random normal deviates with mean μ and variance σ^{2}_{e}. Phenotypic values of MA lines were sums of independent random normal deviates, mean μ, variance σ^{2}_{e} as above, plus mutational effects generated by summing n random deviates from a gamma distribution, parameters α and β. Epistasis between mutations was not modeled. n was sampled from a Poisson distribution with parameter U (or n = 1 as a special case, where the absolute number of mutation events was assumed to be known). Here, U is the mean number of mutations accumulated per MA line, and would be divided by t to estimate the mutation rate per generation for a real experiment. Following GarciaDorado (1997), to compare simulations with different β and U, the variance of genotypic values σ^{2}_{g} = Uβ(β + 1)/α^{2} was set at a fixed multiplier of the amongline error variance (σ^{2}_{e}), often 5 or 20. Precision levels (expressed as σ^{2}_{g}/σ^{2}_{e}) achieved in some previous largescale MA experiments are, for example, as follows: Mukai et al. (1972) ∼7 (one replicate); Ohnishi (1974) ∼8; Keightley and Ohnishi (1998) mean 12 for nine traits, range 1.056.
ML analysis: The numerical integration procedure to estimate mutation parameters is fully described elsewhere (Keightley and Ohnishi 1998). The same model was assumed as was used to generate the simulated data. The parameters estimated in the model were μ, σ^{2}_{e}, U, α, and β. Distributions reflecting about zero, with a parameter P, the proportion of positive mutant effects, have been investigated previously (Keightley 1994; Keightley and Ohnishi 1998), but are not investigated here. The α and β parameters fully specify the properties of the distribution of mutation effects. Traditionally the mean mutational effect has been estimated from MA experiments. If the distribution is leptokurtic, it is most logical to estimate α and β, but the mean mutant effect is likely to be of continued interest, so β and E(a) = β/α are given here. Parameter estimates are based on “profile likelihoods,” computed by maximizing the likelihood for a series of fixed values of one parameter. Likelihood surfaces often become very flat, so maximization could fail if all parameters are fitted simultaneously. ML estimates were obtained from profile likelihoods, typically involving 1020 points, by fitting a quadratic curve to the highest likelihood point and the nearest points on either side. Tests in which additional points were added about the ML did not significantly change the results. C computer code to carry out the likelihood calculations is available on request.
RESULTS
Comparison to the BatemanMukai method with shape of the distribution of mutation effects assumed: In principle, the shape of the distribution of mutation effects will always need to be estimated. However, it is useful to compare the fit of models with different fixed values of β to explore the behavior of the ML procedure. By assuming equal mutation effects in the ML analysis (β → ∞), the performance of the ML and BM procedures can also be compared. Tables 1 and 2 show means and standard deviations (SDs) for estimates of U and E(a) from 30 replicate simulations (1000 replicates for β → ∞). Two values of U were simulated, with equal mutation effects, or a gamma distribution with β = 0.5 (corresponding to a strongly leptokurtic distribution). The genetic variance was 20σ^{2}_{e} (Table 1) or 5σ^{2}_{e} (Table 2), and 200 control and MA lines were simulated. The data for each replicate were analyzed by ML as described above, and by the BM method (Equations 1 and 2), with V_{m} estimated as the difference between the amongMA line and control line variances. Tables 1 and 2 illustrate a number of interesting results: (1) When the data conform to the model assumed (i.e., simulated and assumed β are the same), the ML and BM procedures give mean estimates very close to the simulated values. ML provides good mean parameter estimates if the model conforms with the data irrespective of the shape of the distribution, i.e., the mean log L is highest for the β value corresponding to the simulated distribution. (2) If a model corresponding exactly to the data is assumed, ML provides more accurate estimates than BM (i.e., coefficients of variation for the estimates are lower). This effect is particularly apparent for the case of few mutations with equal effects measured with low error (Table 1), presumably because the MA line data tend to fall into discrete classes. (3) If the model does not correspond to the date (e.g., β = 0.5 simulated, but β → ∞ assumed), ML provides mean parameter estimates closer to the values simulated than BM. ML is therefore more robust to deviations from the true distribution than BM. (4) For the U values simulated, ML can distinguish better between distributions if there are few mutations per MA line. (5) ML can distinguish better between different distributions if the true distribution is platykurtic (i.e., with β ∞ simulated, the average change in log L between fitted → distributions is very large). If the true distribution is leptokurtic, little information can be obtained on β, beyond inferring that the model of equal effects gives a poor fit.
Number of mutation events known: In certain experimental situations the number of mutation events per genome is known. In Drosophila, mobilization of P elements has been used to generate lines with single independent insertions (Lymanet al. 1996), and in Escherichia coli, strains with fixed numbers of Tn10 insertions have been generated (Elena and Lenski 1997; Elenaet al. 1998). Effects of TE insertion on the distribution of quantitative trait phenotypes can then be measured relative to nonmutagenized control lines. Analysis of simulated data with exactly one mutation per MA line showed that mean parameter estimates are very close to the values simulated, with no directional bias, and that the mean mutation effect is estimated with considerably higher accuracy than the distribution shape parameter (data not shown).
There are certain experimental situations where the expected rather than absolute number of mutation events can be estimated. For example, rates of accumulation of spontaneous TE insertions and base pair substitutions can be used to indirectly estimate the per genome mutation rate in Drosophila, albeit imprecisely (Keightley 1994; Drakeet al. 1998). Table 3 compares estimates of β and E(a) for simulations with “low” (U = 0.5) and “high” (U = 5) numbers of mutations. The number of mutation events per line was Poisson distributed. In the analysis the expected number was fitted as a known parameter. Mean estimates of β are close to the simulated values, but there is a slight but consistent upward bias. The bias, which is explored more fully in the next sections, implies that the parameters in the model are confounded with one another. There does not appear to be any directional bias for estimates of E(a), which is again estimated much more precisely than β.
Unknown numbers of mutation events, “favorable” data: Highly precise measurement of genotypic values is unrealistic in an experimental setting, but should be favorable for disentangling the parameters. To model such a situation, 30 data sets were independently generated with the amongline variance of genotypic values,
The wide range for the mean lower and upper support limits forÛ and
Behavior of the moments of the distribution of genotypic values: Previous investigations of the ML procedure have shown that estimates of β and U tend to be strongly confounded with one another (Keightley 1994; Keightley and Ohnishi 1998). Likelihood often becomes flat if the fitted value of U is increased while, for constant α, β is simultaneously decreased. Since E(a) = β/α the estimated mean mutant effect also decreases as the fitted value of U increases. The correlation between the parameters generates characteristically shaped profile likelihoods. Profile likelihoods for one data set investigated in the previous section are shown in Figure 2. In Figure 2a (the likelihood profile for U), the ML is close to the U value of 2.5 simulated, but log likelihood quickly becomes flat as a function of increasing U. This behavior can be explained as follows: The moments of the distribution of genotypic values, X, are given by
Under ML the moments of the fitted distribution will be close to those of the empirical distribution. The mean of the fitted distribution, E(X), is the most important constraint determining the fit. With a fixed E(X) and a large value of U, the higher order moments can be held constant by increasing U while adjusting β downward in a compensatory manner. The important terms are 1/Uβ, 2/U ^{2}β^{2}, 6/U ^{3}β^{3}, etc., in which U and B are of the same order in the denominator, while the terms where U is of higher order than β become small (i.e., 1/U, 1/U ^{2}, 3/U ^{2}β, etc.). The behavior of the moments of the genotypic distribution implies that increasing U can eventually always lead to an essentially unchanging fitted distribution, and to flat profile likelihoods.
With the favorable simulated data sets, there were always maxima in the likelihood profiles. However, less favorable situations, such as a strongly leptokurtic distribution of mutation effects, a large number of mutations per MA line, or poorly estimated genotypic values, often give no likelihood maxima, so only lower or upper support limits for the parameters can be given.
U unknown, more realistic data: To investigate ML with more realistic data, simulations in which σ^{2}_{g} = 20σ^{2}_{e} were analyzed. Mean ML estimates of U (Table 5) are of the correct order, and individually nonsignificantly different from the values simulated, but there appears to be a general upward bias in the estimated values. The bias is more serious than noted for simulations with U fixed. Individual simulation runs usually allow upper support limits for U to be obtained, but each set included at least one with an upper support limit → ∞. Lower support limits for U tend to be closer to the simulated values for platykurtic distributions. Table 5 shows that information on the shape of the distribution of mutation effects is difficult to obtain. In some cases, likelihood increased monotonically as a function of β, hence the mean ML estimate → ∞. In contrast to the other parameters, estimates of E(a) seem to be unbiased, but individual runs give lower support limits for E(a) of zero. Figures 3 and 4 compare results for mutation rates of 0.5 and 5, respectively, under a platykurtic distribution of mutation effects (β = 2). The lower U gives steeper likelihood profiles, so more information on the mutation parameters can be obtained. As U increases, the distribution of genotypic values will become increasingly normal, presumably increasing the difficulty in disentangling the parameters.
Tradeoff between number of lines and precision for individual lines? To investigate the effect of varying the number of lines (N), while maintaining the same total “effort” in the experiment, simulation runs were compared for constant N × σ^{2}_{g} (Table 6). Two levels of experimental precision were investigated: 50 or 200 lines with σ^{2}_{g} set to 80 or 20, respectively (the upper two sets of parameters in Table 6), or the same numbers of lines with σ^{2}_{g} set to 20 or 5, respectively (the lower two sets of parameters in Table 6). An exponential distribution of mutation effects with U = 2.5 was simulated. The results suggest that a higher number of lines with reduced effort per line leads to proportionally narrower bounds for E(a), and fewer runs in which minimum, ML, or maximum parameter estimates were 0 or ∞. Table 6 also illustrates that it is generally possible only to obtain lower support limits for U and β, and an upper limit for E(a) (cf. Table 5).
DISCUSSION
The development of methods to infer distributions of mutation effects has been motivated by the crucial importance of these distributions for evolutionary models. This article has investigated ML as an approach to obtain estimates of U and mutation distribution parameters. One key parameter estimated by ML, but not estimated by the BM procedure, is a distribution shape parameter. The ML procedure behaves well if the number of mutation events per MA line is known, but if U has to be estimated simultaneously with the distribution parameters, the mutation parameters become strongly confounded with one another, and profile likelihoods tend to be flat and asymmetrical about their maxima. Often, ML estimates for U → ∞ and β 0. The flatness of the profile likelihoods can be explained by the behavior of the moments of the distribution of genotypic values. The moments contain terms in the product β × U that can be held constant by making compensatory changes in β and U. By assuming asymptotic properties of likelihood (i.e., large sample size), lower support limits for U can be obtained (and sometimes upper limits as well), but the tendency for profile likelihoods to be asymmetrical suggests that the asymptotic properties are not being met. Further investigation of this aspect is needed.
By assuming a specific distribution of mutation effects with a fixed shape, much of the difficulty in disentangling the remaining parameters in the model disappears, and point estimates for U and E(a) can be obtained (Tables 1 and 2). An alternative to ML, the BM method, generally assumes that mutation effects are equal, and produces point estimates from the rate of change of mean and variance between MA and control lines. If the true distribution of mutation effects is gamma, the BM method underestimates U by a factor 1 + ^{1}/_{β}, and overestimates E(a) by the same factor:
In principle, parameters that can be reliably estimated by the MA approach with inbred sublines are the rates of change of mean and variance from fixation of mutations (Crow and Simmons 1983). Two experiments have estimated rates of change of mean fitness in a recently caught outbred population of Drosophila maintained in the laboratory (Gilliganet al. 1997; Shabalinaet al. 1997), and apparently produced conflicting results, but it has been argued that effects of genetic adaptation, inbreeding depression, and an accumulation of deleterious mutations cannot be distinguished in outbred populations (Keightleyet al. 1998). The meaning of the point (or minimum) estimates of U by the BM or ML methods must also be born in mind. They are conditional on the form of the distribution of mutation effects assumed, equal for BM, and gamma for ML in the present case. Mutations with tiny, but evolutionarily important effects, could make up a substantial fraction of the genomic deleterious mutation rate, but a laboratory experiment cannot hope to detect these by measuring their “pressure” on the population mean phenotype or their contribution to the mutational variance. The genomic mutation rate estimate, if it is a point estimate, must for this reason also be a minimum estimate.
The strong correlation between the parameter estimates implies that global maximization of likelihood is often problematical when all parameters are fitted simultaneously. This problem can be overcome by fixing one parameter and generating profile likelihoods. The properties of the multidimensional likelihood surface can be explored in this way, and it is recommended that this is done for analysis of real data. A further potential problem with very small simulated U values and high
The confounded nature of the mutation distribution parameters can be largely overcome if the number of mutation events is known. Artificial induction of independent random TE insertions is one way to generate MA lines with known numbers of mutation events. Fitness or quantitative trait assays have allowed estimates to be obtained for mean fitness effects of insertion in Drosophila (Eaneset al. 1988; Mackayet al. 1992) and in E. coli (Elenaet al. 1998). The latter study has shown that the distribution of fitness effects of single elements is discontinuous, with a preponderance of roughly equivalent effects of ∼3%. A discontinuous distribution of mutation effects was also inferred by reanalysis of mutation accumulation experiments involving balancers in Drosophila, and it was argued that TE mutations could be implicated (Keightley 1996).
Recently, minimum distance (MD) methods to infer U and mutation distribution parameters have been proposed in which data from a base population (generation 0) are not included in the analysis (GarciaDorado 1997). Instead, all the information comes from the MA lines themselves, with an estimate of the betweenMA line variance coming from an ANOVA. Reanalysis of single generations of data of Mukai et al. (1972) and Ohnishi (1977), with the assumption of a gamma distribution of mutation effects, produced point estimates of U. These estimates, which did not take into account the large apparent changes of mean viability of the MA lines, were about one order of magnitude smaller than the original BM estimates, which used information from the change of mean viability estimated by regression from several generations of data. However, an ML analysis of Ohnishi’s (1977) data with the same model as described by GarciaDorado (1997) does not provide a point estimate; instead, the ML estimate of U ∞ (data not shown). This difference between ML and→MD is probably due to the fact that ML takes into account variation in the true population mean, while MD does not. An ML analysis of data from Mukai et al. (1972) produces a similar point estimate of U as inferred by GarciaDorado (1997) if the same model is assumed [about 10 times smaller than by Mukai et al.’s (1972) original analysis], but again, the ML analysis gives an estimate of U → ∞ if a mixed distribution of mutation effects (e.g., gamma + normal) is assumed (data not shown). GarciaDorado’s (1997) analysis of data from the MA experiment of Fernandez and LopezFanjul (1996) included information on the ancestral population mean viability, so the estimates are, presumably, less sensitive to the model assumed.
The analysis presented here suggests that very large MA experiments can give some insight into genomic mutation rates and distributions of mutation effects. Smallscale experiments will have difficulty in detecting significant mutationinduced changes in mean or variance, and estimates of the underlying mutation parameters will therefore have little meaning. Analysis of the results of MA experiments by ML allows the fit of different distributions of effects to be compared, and some kinds of distributions may be rejected. ML lower bounds (or support limits) are appropriate minimum estimates of U. Estimates of distribution parameters are often unbounded. The critical problem is that the estimates depend on the model assumed for the distribution of mutation effects and may not account for mutations with small, but biologically important, effects. Analysis of DNA sequence data in which estimates of U are obtained by comparing the level of constraint in different parts of the genome (Kondrashov and Crow 1993) may ultimately be more informative, although a quantum leap in the understanding of the functional significance of noncoding DNA will be needed.
Acknowledgments
I thank Thomas Bataillon, Brian Charlesworth, Esther Davies, Jim Fry, Mike Lynch, Andy Peters, Stuart West, Alexey Kondrashov, and an anonymous reviewer for helpful comments, and the Royal Society for support.
Footnotes

Communicating editor: A. G. Clark
 Received April 15, 1998.
 Accepted July 20, 1998.
 Copyright © 1998 by the Genetics Society of America