Genetics, Vol. 150, 1283-1293, November 1998, Copyright © 1998

Inference of Genome-Wide Mutation Rates and Distributions of Mutation Effects for Fitness Traits: A Simulation Study

Peter D. Keightleya
a Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, Scotland, UK

Corresponding author: Peter D. Keightley, Institute of Cell, Animal and Population Biology, University of Edinburgh, W. Mains Rd., Edinburgh EH9 3JT, Scotland., p.keightley{at}ed.ac.uk (E-mail).

Communicating editor: A. G. CLARK


*  ABSTRACT
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

The properties and limitations of maximum likelihood (ML) inference of genome-wide mutation rates (U) and parameters of distributions of mutation effects are investigated. Mutation parameters are estimated from simulated experiments in which mutations randomly accumulate in inbred lines. ML produces more accurate estimates than the procedure of Bateman and Mukai and is more robust if the data do not conform to the model assumed. Unbiased ML estimates of the mutation effects distribution parameters can be obtained if a value for U can be assumed, but if U is estimated simultaneously with the distribution parameters, likelihood may increase monotonically as a function of U. If the distribution of mutation effects is leptokurtic, the number of mutation events per line is large, or if genotypic values are poorly estimated, only a lower limit for U, an upper limit for the mean mutation effect, and a lower limit for the kurtosis of the distribution can be given. It is argued that such lower (upper) limits are appropriate minima (maxima). Estimates of the mean mutational effect are unbiased but may convey little about the properties of the distribution if it is leptokurtic.


MUTATIONS that affect fitness are usually deleterious and rarely become fixed in large populations. However, deleterious mutations may occur at a sufficiently high rate to play an important role in several key evolutionary phenomena, such as the evolution and maintenance of sex. Some evolutionary theories require estimates of the genomic rate of deleterious mutations, U, but not necessarily of the distribution of their selective effects [e.g., to test the "deterministic mutation" theory for the evolution of sex (KONDRASHOV 1993 Down)], while others require information both on U and the distribution of selective values [e.g., to infer the probability of "mutational meltdown" (LANDE 1995 Down; LYNCH et al. 1995 Down)].

There are several ways to obtain information on U and distributions of selective values. One approach, proposed by KONDRASHOV and CROW 1993 Down, is to compare between-species nucleotide substitution rates at a sample of regions in the genome to rates in regions evolving unconstrained by natural selection. If selection has operated to remove new mutations in the sampled regions, the rate of nucleotide substitution will be lower than in the unconstrained regions. By combining the relative substitution rates with estimates of between-species divergence times and generation intervals, it is possible, in principle, to estimate U. For example, the rate of nucleotide substitution in human pseudogenes implies that about three point mutations occur in amino acid coding regions per zygote per generation (DRAKE et al. 1998 Down). Protein coding sequences are under moderate constraint, so this rate is similar to the deleterious mutation rate specific to such sequences (A. EYRE-WALKER and P. D. KEIGHTLEY, unpublished results). The estimate refers to fitness-altering mutations occurring on an evolutionary time scale, but does not tell us about the magnitudes of the effects upon which natural selection acted (except that fitness effects were greater than the reciprocal of the effective population size), or about the effects of large-scale molecular changes such as deletions or transposable element (TE) insertions.

A second general approach to indirectly infer U and mutation parameters is from a comparison of the distributions of fitnesses of outbred (or inbred) individuals sampled from a natural population to their inbred (or outbred) progeny. A method developed by DENG and LYNCH 1996 Down for outbreeding populations allows estimation of U and the mean mutant effect and degree of dominance. However, with this method deleterious mutation-selection balance is assumed to be the only mechanism maintaining variation for fitness, and biased estimates will result if any other mechanisms, such as balancing selection or migration, lead to the maintenance of variation (DRAKE et al. 1998 Down). For example, negative genetic correlations between major components of fitness can lead to the maintenance of additive and nonadditive genetic variation for the major components and fitness, respectively (ROSE and CHARLESWORTH 1980 Down; ROSE 1982 Down; FALCONER and MACKAY 1996 Down, chapter 20). In plant species that normally reproduce by selfing, a comparison of the fitnesses of selfed and outcrossed progeny can provide an estimate of U (CHARLESWORTH et al. 1990 Down). As with the method applied to outbreeders, it is assumed that genetic variation is maintained solely by a balance between mutation and selection. CHARLESWORTH et al. 1990 Down method provides estimates for U, but not of mutation distribution parameters such as the mean mutation effect.

A third direct way to study effects of deleterious mutations is the mutation accumulation (MA) approach in the laboratory. Mutations are allowed to randomly accumulate in benign conditions in sublines derived from an inbred base population. The sublines are maintained by close inbreeding or as replicated chromosomes protected by a balancer chromosome, so drift will tend to dominate selection. After many generations of mutation accumulation, fitnesses of the MA lines or chromosomes are compared to controls. The approach was pioneered by MUKAI 1964 Down who used a Cy balancer chromosome as a control to study the viability effects of mutation accumulation in Drosophila melanogaster second chromosomes. In contrast to the indirect approach outlined above, mutation should be the only source of evolutionary divergence between the lines under test, but the method is restricted to laboratory populations. Inference of U and properties of the distribution of mutation effects are made by comparing the distribution of phenotypes of MA and control lines. The method of BATEMAN 1959 Down, which was taken up by MUKAI [1964; the Bateman-Mukai (BM) method], compares the rate of increase in among-MA line variance per generation, Vm, with the rate of change of fitness or fitness component, {Delta}M. Under the assumption that mutations have equal deleterious effects, an estimate of U is obtained from

(1)
and an estimate of the mean mutation effect comes from

(2)
If mutation effects vary, Û is underestimated and Ê(a) is overestimated. This is analogous to the estimation bias for the effective number of factors influencing a quantitative trait (FALCONER and MACKAY 1996 Down, chapter 12). Furthermore, if the distribution of mutation effects is strongly leptokurtic, the mean mutation effect conveys little about the properties of the distribution. Genomes contain sites that vary greatly in functional significance, so the distribution of selective values of new mutations is of central interest.

To infer U and distributions of mutation effects from MA experiments, an alternative approach is to assume that the true distribution of mutation effects follows some family of distributions. Theoretically, the distribution of effects of all mutations that occurred in an experiment could be estimated, but in practice there are insufficient degrees of freedom, so a family of distributions must be assumed. The family is chosen so that changes in its parameters produce distributions with a wide range of properties. Numerical techniques involving maximum likelihood (ML; KEIGHTLEY 1994 Down) or minimum distance (GARCIA-DORADO 1997 Down) have been applied to find values for U and the parameters of the assumed distribution that best fit phenotypic data from the MA experiment. Fit is measured either in terms of {Sigma} log likelihood of data or distance between the fitted and empirical distributions of line means. Following the nearly neutral model (KIMURA 1983 Down; OHTA 1992 Down), gamma distributions of mutation effects have mostly been assumed. The gamma distribution has two parameters, {alpha} and ß, specifying scale and shape, respectively. With small values of ß, the distribution is leptokurtic: most mutations have effects close to zero; larger effects occur with diminishing frequency and contribute most of the between-line variance. Special cases are ß = 1 (the exponential distribution) and ß = (the chi-square distribution with 1 d.f.). Gamma distributions with higher values of ß approach symmetry about the mean (ß/{alpha}). The case of ß -> {infty} corresponds to equal mutation effects, the model that is generally used to obtain BM estimates.

ML or minimum distance methods have been applied to data from several MA experiments (KEIGHTLEY 1994 Down, KEIGHTLEY 1996 Down; GARCIA-DORADO 1997 Down; KEIGHTLEY and CABALLERO 1997 Down; KEIGHTLEY and OHNISHI 1998 Down). The algorithms to fit the mutation parameters have previously relied on highly computer-intensive Monte Carlo methods, and only limited investigations of the performance of the procedures have been possible (KEIGHTLEY 1994 Down; GARCIA-DORADO 1997 Down). Recently, an improved version of the ML procedure has been developed with algorithms ~2 orders of magnitude less demanding of computer time based on numerical integration (KEIGHTLEY and OHNISHI 1998 Down). An extensive study of the properties of ML estimation of mutation parameters is now feasible.

The purpose of this article is to explore the properties and limitations of ML inference of mutation parameters by simulation. Simulated rather than real MA data are analyzed, because the true parameter values are known, the model assumptions are not violated, and replication to detect significant deviations from expectation is possible.


*  MATERIALS AND METHODS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Model and simulation of data:
The data available for analysis are assumed to be phenotypic means from a set of control lines (generation 0 of the MA experiment) and a set of MA lines (from generation t). In principle, data from several time points could be analyzed simultaneously, and an algorithm has been proposed (KEIGHTLEY 1994 Down). The power of the procedure can also be improved by making use of within-MA line replicate information (J. D. FRY, P. D. KEIGHTLEY, S. L. HEINSOHN and S. V. NUZHDIN, unpublished results). Control line phenotypic values were random normal deviates with mean µ and variance {sigma}2e . Phenotypic values of MA lines were sums of independent random normal deviates, mean µ, variance {sigma}2e as above, plus mutational effects generated by summing n random deviates from a gamma distribution, parameters {alpha} and ß. Epistasis between mutations was not modeled. n was sampled from a Poisson distribution with parameter U (or n = 1 as a special case, where the absolute number of mutation events was assumed to be known). Here, U is the mean number of mutations accumulated per MA line, and would be divided by t to estimate the mutation rate per generation for a real experiment. Following GARCIA-DORADO 1997 Down, to compare simulations with different ß and U, the variance of genotypic values {sigma}2g = was set at a fixed multiplier of the among-line error variance ({sigma}2e ), often 5 or 20. Precision levels (expressed as ) achieved in some previous large-scale MA experiments are, for example, as follows: MUKAI et al. 1972 Down ~7 (one replicate); OHNISHI 1974 Down ~8; KEIGHTLEY and OHNISHI 1998 Down mean 12 for nine traits, range 1.0–56.

ML analysis:
The numerical integration procedure to estimate mutation parameters is fully described elsewhere (KEIGHTLEY and OHNISHI 1998 Down). The same model was assumed as was used to generate the simulated data. The parameters estimated in the model were µ, {sigma}2e , U, {alpha}, and ß. Distributions reflecting about zero, with a parameter P, the proportion of positive mutant effects, have been investigated previously (KEIGHTLEY 1994 Down; KEIGHTLEY and OHNISHI 1998 Down), but are not investigated here. The {alpha} and ß parameters fully specify the properties of the distribution of mutation effects. Traditionally the mean mutational effect has been estimated from MA experiments. If the distribution is leptokurtic, it is most logical to estimate {alpha} and ß, but the mean mutant effect is likely to be of continued interest, so ß and E(a) = ß/{alpha} are given here. Parameter estimates are based on "profile likelihoods," computed by maximizing the likelihood for a series of fixed values of one parameter. Likelihood surfaces often become very flat, so maximization could fail if all parameters are fitted simultaneously. ML estimates were obtained from profile likelihoods, typically involving 10–20 points, by fitting a quadratic curve to the highest likelihood point and the nearest points on either side. Tests in which additional points were added about the ML did not significantly change the results. C computer code to carry out the likelihood calculations is available on request.


*  RESULTS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Comparison to the Bateman-Mukai method with shape of the distribution of mutation effects assumed:
In principle, the shape of the distribution of mutation effects will always need to be estimated. However, it is useful to compare the fit of models with different fixed values of ß to explore the behavior of the ML procedure. By assuming equal mutation effects in the ML analysis (ß -> {infty}), the performance of the ML and BM procedures can also be compared. Table 1 and Table 2 show means and standard deviations (SDs) for estimates of U and E(a) from 30 replicate simulations (1000 replicates for ß -> {infty}). Two values of U were simulated, with equal mutation effects, or a gamma distribution with ß = 0.5 (corresponding to a strongly leptokurtic distribution). The genetic variance was 20{sigma}2e (Table 1) or 5{sigma}2e (Table 2), and 200 control and MA lines were simulated. The data for each replicate were analyzed by ML as described above, and by the BM method (Equation 1 and Equation 2), with Vm estimated as the difference between the among-MA line and control line variances. Table 1 and Table 2 illustrate a number of interesting results: (1) When the data conform to the model assumed (i.e., simulated and assumed ß are the same), the ML and BM procedures give mean estimates very close to the simulated values. ML provides good mean parameter estimates if the model conforms with the data irrespective of the shape of the distribution, i.e., the mean log L is highest for the ß value corresponding to the simulated distribution. (2) If a model corresponding exactly to the data is assumed, ML provides more accurate estimates than BM (i.e., coefficients of variation for the estimates are lower). This effect is particularly apparent for the case of few mutations with equal effects measured with low error (Table 1), presumably because the MA line data tend to fall into discrete classes. (3) If the model does not correspond to the date (e.g., ß = 0.5 simulated, but ß -> {infty} assumed), ML provides mean parameter estimates closer to the values simulated than BM. ML is therefore more robust to deviations from the true distribution than BM. (4) For the U values simulated, ML can distinguish better between distributions if there are few mutations per MA line. (5) ML can distinguish better between different distributions if the true distribution is platykurtic (i.e., with ß -> {infty} simulated, the average change in log L between fitted distributions is very large). If the true distribution is leptokurtic, little information can be obtained on ß, beyond inferring that the model of equal effects gives a poor fit.


 
View this table:
In this window
In a new window

 
Table 1. Comparison of BM and ML procedures in which fixed values of ß (ß*) are assumed, with = 20


 
View this table:
In this window
In a new window

 
Table 2. Comparison of BM and ML procedures with = 5

Number of mutation events known:
In certain experimental situations the number of mutation events per genome is known. In Drosophila, mobilization of P elements has been used to generate lines with single independent insertions (LYMAN et al. 1996 Down), and in Escherichia coli, strains with fixed numbers of Tn10 insertions have been generated (ELENA and LENSKI 1997 Down; ELENA et al. 1998 Down). Effects of TE insertion on the distribution of quantitative trait phenotypes can then be measured relative to nonmutagenized control lines. Analysis of simulated data with exactly one mutation per MA line showed that mean parameter estimates are very close to the values simulated, with no directional bias, and that the mean mutation effect is estimated with considerably higher accuracy than the distribution shape parameter (data not shown).

There are certain experimental situations where the expected rather than absolute number of mutation events can be estimated. For example, rates of accumulation of spontaneous TE insertions and base pair substitutions can be used to indirectly estimate the per genome mutation rate in Drosophila, albeit imprecisely (KEIGHTLEY 1994 Down; DRAKE et al. 1998 Down). Table 3 compares estimates of ß and E(a) for simulations with "low" (U = 0.5) and "high" (U = 5) numbers of mutations. The number of mutation events per line was Poisson distributed. In the analysis the expected number was fitted as a known parameter. Mean estimates of ß are close to the simulated values, but there is a slight but consistent upward bias. The bias, which is explored more fully in the next sections, implies that the parameters in the model are confounded with one another. There does not appear to be any directional bias for estimates of E(a), which is again estimated much more precisely than ß.


 
View this table:
In this window
In a new window

 
Table 3. Parameter estimates from analysis of simulated data sets with the expected number of mutation events (U*) fitted as a fixed parameter

Unknown numbers of mutation events, "favorable" data:
Highly precise measurement of genotypic values is unrealistic in an experimental setting, but should be favorable for disentangling the parameters. To model such a situation, 30 data sets were independently generated with the among-line variance of genotypic values, {sigma}2g = 100{sigma}2e . Two hundred control and MA lines were simulated with U = 2.5 and ß = 1. Table 4 shows mean ML parameter estimates along with mean lower and upper support limits based on loge likelihood drops of 2 from the MLs (asymptotically approximately equivalent to 95% confidence limits). Mean ML estimates agree reasonably closely with their simulated parameter values. Frequency distributions of the estimates (Figure 1) show, however, that distributions are skewed upward in the case of Û, and downward in the case of Ê(a). The presence of skew turns out to be a consistent feature where U is estimated as an unknown parameter. A small number of Û are very much larger than the simulated value, while Ê(a) for the corresponding simulations are very much smaller than their true values. For such data sets, the tend to be close to zero. An explanation for this behavior can be found in the moments of the distribution of genotypic values.



View larger version (20K):
In this window
In a new window
Download PPT slide
 
Figure 1. Frequency distributions of ML estimates of mutation parameters from the favorable data sets. Simulated parameter values are indicated by "{wedge}."


 
View this table:
In this window
In a new window

 
Table 4. Analysis of simulated data where simulated values are favorable.

The wide range for the mean lower and upper support limits for Û and (Table 4) implies wide confidence intervals, even for the "favorable" parameter values. The upper support limit for Û -> {infty} because in 4 of the 30 simulations, log likelihood leveled out at a value <2 different from the ML. The criterion for the confidence interval is a drop in log likelihood of 2, and is based on asymptotic properties, which, presumably, are not being met. In these 4 pathological cases, no upper limit for U can be given. In the same 4 simulation runs, the lower support limits for both and Ê(a) approached zero, so only upper limits can be given. The data would be consistent, therefore, with an extremely high mutation rate, with effects sampled from a strongly leptokurtic distribution having a mean effect close to zero.

Behavior of the moments of the distribution of genotypic values:
Previous investigations of the ML procedure have shown that estimates of ß and U tend to be strongly confounded with one another (KEIGHTLEY 1994 Down; KEIGHTLEY and OHNISHI 1998 Down). Likelihood often becomes flat if the fitted value of U is increased while, for constant {alpha}, ß is simultaneously decreased. Since E(a) = ß/{alpha} the estimated mean mutant effect also decreases as the fitted value of U increases. The correlation between the parameters generates characteristically shaped profile likelihoods. Profile likelihoods for one data set investigated in the previous section are shown in Figure 2. In Figure 2A (the likelihood profile for U), the ML is close to the U value of 2.5 simulated, but log likelihood quickly becomes flat as a function of increasing U. This behavior can be explained as follows: The moments of the distribution of genotypic values, X, are given by

(3)



View larger version (20K):
In this window
In a new window
Download PPT slide
 
Figure 2. Profile likelihoods as a function of U, ß, and E(a) for one of the favorable data sets. The stimulated parameter values are indicated by "{wedge}."

The first four moments are therefore

(4)

(5)

(6)

(7)

Under ML the moments of the fitted distribution will be close to those of the empirical distribution. The mean of the fitted distribution, E(X), is the most important constraint determining the fit. With a fixed E(X) and a large value of U, the higher order moments can be held constant by increasing U while adjusting ß downward in a compensatory manner. The important terms are 1/Uß, 2/U2ß2, 6/U3ß3, etc., in which U and B are of the same order in the denominator, while the terms where U is of higher order than ß become small (i.e., 1/U, 1/U2, 3/U2ß, etc.). The behavior of the moments of the genotypic distribution implies that increasing U can eventually always lead to an essentially unchanging fitted distribution, and to flat profile likelihoods.

With the favorable simulated data sets, there were always maxima in the likelihood profiles. However, less favorable situations, such as a strongly leptokurtic distribution of mutation effects, a large number of mutations per MA line, or poorly estimated genotypic values, often give no likelihood maxima, so only lower or upper support limits for the parameters can be given.

U unknown, more realistic data:
To investigate ML with more realistic data, simulations in which {sigma}2g = 20{sigma}2e were analyzed. Mean ML estimates of U (Table 5) are of the correct order, and individually nonsignificantly different from the values simulated, but there appears to be a general upward bias in the estimated values. The bias is more serious than noted for simulations with U fixed. Individual simulation runs usually allow upper support limits for U to be obtained, but each set included at least one with an upper support limit -> {infty}. Lower support limits for U tend to be closer to the simulated values for platykurtic distributions. Table 5 shows that information on the shape of the distribution of mutation effects is difficult to obtain. In some cases, likelihood increased monotonically as a function of ß, hence the mean ML estimate -> {infty}. In contrast to the other parameters, estimates of E(a) seem to be unbiased, but individual runs give lower support limits for E(a) of zero. Figure 3 and Figure 4 compare results for mutation rates of 0.5 and 5, respectively, under a platykurtic distribution of mutation effects (ß = 2). The lower U gives steeper likelihood profiles, so more information on the mutation parameters can be obtained. As U increases, the distribution of genotypic values will become increasingly normal, presumably increasing the difficulty in disentangling the parameters.



View larger version (31K):
In this window
In a new window
Download PPT slide
 
Figure 3. Likelihood profiles as a function of U, ß, and E(a) for simulations with U set at 0.5 and ß at 2. The value of {alpha} was adjusted such that the variance of genotypic values was 20{sigma}2e. The simulated parameter values are indicated by vertical broken lines.



View larger version (25K):
In this window
In a new window
Download PPT slide
 
Figure 4. As shown in Figure 3 with the simulated value of U set at 5 and ß at 2.


 
View this table:
In this window
In a new window

 
Table 5. Effect of changes in U and ß on parameter value estimates

Trade-off between number of lines and precision for individual lines?
To investigate the effect of varying the number of lines (N), while maintaining the same total "effort" in the experiment, simulation runs were compared for constant N x {sigma}2g (Table 6). Two levels of experimental precision were investigated: 50 or 200 lines with {sigma}2g set to 80 or 20, respectively (the upper two sets of parameters in Table 6), or the same numbers of lines with {sigma}2g set to 20 or 5, respectively (the lower two sets of parameters in Table 6). An exponential distribution of mutation effects with U = 2.5 was simulated. The results suggest that a higher number of lines with reduced effort per line leads to proportionally narrower bounds for E(a), and fewer runs in which minimum, ML, or maximum parameter estimates were 0 or {infty}. Table 6 also illustrates that it is generally possible only to obtain lower support limits for U and ß, and an upper limit for E(a) (cf. Table 5).


 
View this table:
In this window
In a new window

 
Table 6. Effect of change in number of lines measured (N) on parameter value estimates


*  DISCUSSION
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

The development of methods to infer distributions of mutation effects has been motivated by the crucial importance of these distributions for evolutionary models. This article has investigated ML as an approach to obtain estimates of U and mutation distribution parameters. One key parameter estimated by ML, but not estimated by the BM procedure, is a distribution shape parameter. The ML procedure behaves well if the number of mutation events per MA line is known, but if U has to be estimated simultaneously with the distribution parameters, the mutation parameters become strongly confounded with one another, and profile likelihoods tend to be flat and asymmetrical about their maxima. Often, ML estimates for U -> {infty} and ß -> 0. The flatness of the profile likelihoods can be explained by the behavior of the moments of the distribution of genotypic values. The moments contain terms in the product ß x U that can be held constant by making compensatory changes in ß and U. By assuming asymptotic properties of likelihood (i.e., large sample size), lower support limits for U can be obtained (and sometimes upper limits as well), but the tendency for profile likelihoods to be asymmetrical suggests that the asymptotic properties are not being met. Further investigation of this aspect is needed.

By assuming a specific distribution of mutation effects with a fixed shape, much of the difficulty in disentangling the remaining parameters in the model disappears, and point estimates for U and E(a) can be obtained (Table 1 and Table 2). An alternative to ML, the BM method, generally assumes that mutation effects are equal, and produces point estimates from the rate of change of mean and variance between MA and control lines. If the true distribution of mutation effects is gamma, the BM method underestimates U by a factor 1 + 1/ß, and overestimates E(a) by the same factor:

(8)

(9)

The BM method does not provide an obvious method to compare the fit of different distributions to the data, so the simplest model (i.e., equal effects) needs to be assumed. By employing the BM method with bootstrapping to obtain a confidence interval (HOULE et al. 1992 Down), lower and upper limits for point estimates of U and E(a) can be obtained. These are asymptotically equivalent to the lower or upper support limits obtained by ML based on the properties of likelihood, if equal mutation effects are assumed. The lower (or upper) support limits for U and E(a) obtained by ML are not equivalent in a statistical sense to the point estimates obtained by the BM method. With ML there may be no point estimate because likelihood often increases monotonically as a function of one parameter. A difficulty with the BM method to estimate U is the denominator of Equation 1, Vm, which could approach zero (hence ÛBM -> {infty}) if the experiment had low power. The problem is illustrated in Figure 5, where BM estimates and corresponding coefficients of variation (CVs) for U are shown for a range of simulated values. If Vm is poorly estimated (corresponding to the lowest simulated values of U in the figure), estimates can become meaningless because the CV -> {infty}. A similar effect also occurs with BM estimation of E(a), if {Delta}X could plausibly be zero (Equation 2). Other properties of estimation of mutation parameters by the BM method have been investigated by DENG and FU 1998 Down.



View larger version (17K):
In this window
In a new window
Download PPT slide
 
Figure 5. Ratio of estimated to simulated mutation rate [U(est)/U(sim)] and coefficient of variation of estimate [CV(U)] plotted against simulated mutation rate. Estimates were obtained by the BM method from analysis of 100,000 simulated data sets of 100 MA and control lines, with five replicates in each line. The between-line variance was computed by an ANOVA. Mutant effects were 0.2, and the between-replicate environmental variance was 1.0.

In principle, parameters that can be reliably estimated by the MA approach with inbred sublines are the rates of change of mean and variance from fixation of mutations (CROW and SIMMONS 1983 Down). Two experiments have estimated rates of change of mean fitness in a recently caught outbred population of Drosophila maintained in the laboratory (GILLIGAN et al. 1997 Down; SHABALINA et al. 1997 Down), and apparently produced conflicting results, but it has been argued that effects of genetic adaptation, inbreeding depression, and an accumulation of deleterious mutations cannot be distinguished in outbred populations (KEIGHTLEY et al. 1998 Down). The meaning of the point (or minimum) estimates of U by the BM or ML methods must also be born in mind. They are conditional on the form of the distribution of mutation effects assumed, equal for BM, and gamma for ML in the present case. Mutations with tiny, but evolutionarily important effects, could make up a substantial fraction of the genomic deleterious mutation rate, but a laboratory experiment cannot hope to detect these by measuring their "pressure" on the population mean phenotype or their contribution to the mutational variance. The genomic mutation rate estimate, if it is a point estimate, must for this reason also be a minimum estimate.

The strong correlation between the parameter estimates implies that global maximization of likelihood is often problematical when all parameters are fitted simultaneously. This problem can be overcome by fixing one parameter and generating profile likelihoods. The properties of the multidimensional likelihood surface can be explored in this way, and it is recommended that this is done for analysis of real data. A further potential problem with very small simulated U values and high {sigma}2g relative to {sigma}2e (i.e., very large mutation effects) is that likelihood profiles may have multiple peaks. The present version of the procedure makes use of information from a "base population" (generation 0) and from generation t. In principle, intermediate generations of data could also be utilized and may add greatly to the power to distinguish the different mutation events that occurred in the experiment. (Note that the present simulations suggest that fewer generations of mutation accumulation can give more precise estimates, but there will be an optimum, which will depend on the true mutation rate and distribution parameters.) T. BATAILLON (personal communication) has devised an ML method assuming equal mutation effects to obtain estimates of U and E(a) with more than one generation, and found increased power from doing so.

The confounded nature of the mutation distribution parameters can be largely overcome if the number of mutation events is known. Artificial induction of independent random TE insertions is one way to generate MA lines with known numbers of mutation events. Fitness or quantitative trait assays have allowed estimates to be obtained for mean fitness effects of insertion in Drosophila (EANES et al. 1988 Down; MACKAY et al. 1992 Down) and in E. coli (ELENA et al. 1998 Down). The latter study has shown that the distribution of fitness effects of single elements is discontinuous, with a preponderance of roughly equivalent effects of ~3%. A discontinuous distribution of mutation effects was also inferred by reanalysis of mutation accumulation experiments involving balancers in Drosophila, and it was argued that TE mutations could be implicated (KEIGHTLEY 1996 Down).

Recently, minimum distance (MD) methods to infer U and mutation distribution parameters have been proposed in which data from a base population (generation 0) are not included in the analysis (GARCIA-DORADO 1997 Down). Instead, all the information comes from the MA lines themselves, with an estimate of the between-MA line variance coming from an ANOVA. Reanalysis of single generations of data of MUKAI et al. 1972 Down and OHNISHI 1977 Down, with the assumption of a gamma distribution of mutation effects, produced point estimates of U. These estimates, which did not take into account the large apparent changes of mean viability of the MA lines, were about one order of magnitude smaller than the original BM estimates, which used information from the change of mean viability estimated by regression from several generations of data. However, an ML analysis of OHNISHI's (1977) data with the same model as described by GARCIA-DORADO 1997 Down does not provide a point estimate; instead, the ML estimate of U -> {infty} (data not shown). This difference between ML and MD is probably due to the fact that ML takes into account variation in the true population mean, while MD does not. An ML analysis of data from MUKAI et al. 1972 Down produces a similar point estimate of U as inferred by GARCIA-DORADO 1997 Down if the same model is assumed [about 10 times smaller than by MUKAI et al. 1972 Down original analysis], but again, the ML analysis gives an estimate of U -> {infty} if a mixed distribution of mutation effects (e.g., gamma + normal) is assumed (data not shown). GARCIA-DORADO's (1997) analysis of data from the MA experiment of FERNANDEZ and LOPEZ-FANJUL 1996 Down included information on the ancestral population mean viability, so the estimates are, presumably, less sensitive to the model assumed.

The analysis presented here suggests that very large MA experiments can give some insight into genomic mutation rates and distributions of mutation effects. Small-scale experiments will have difficulty in detecting significant mutation-induced changes in mean or variance, and estimates of the underlying mutation parameters will therefore have little meaning. Analysis of the results of MA experiments by ML allows the fit of different distributions of effects to be compared, and some kinds of distributions may be rejected. ML lower bounds (or support limits) are appropriate minimum estimates of U. Estimates of distribution parameters are often unbounded. The critical problem is that the estimates depend on the model assumed for the distribution of mutation effects and may not account for mutations with small, but biologically important, effects. Analysis of DNA sequence data in which estimates of U are obtained by comparing the level of constraint in different parts of the genome (KONDRASHOV and CROW 1993 Down) may ultimately be more informative, although a quantum leap in the understanding of the functional significance of noncoding DNA will be needed.


*  ACKNOWLEDGMENTS

I thank Thomas Bataillon, Brian Charlesworth, Esther Davies, Jim Fry, Mike Lynch, Andy Peters, Stuart West, Alexey Kondrashov, and an anonymous reviewer for helpful comments, and the Royal Society for support.

Manuscript received April 15, 1998; Accepted for publication July 20, 1998.


*  LITERATURE CITED
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

BATEMAN, A. J., 1959  The viability of near-normal irradiated chromosomes. Int. J. Radiat. Biol. 1:170-180.

CHARLESWORTH, B., D. CHARLESWORTH, and M. T. MORGAN, 1990  Genetic loads and estimates of mutation rates in highly inbred plant populations. Nature 347:380-382.

CROW, J. F., and M. J. SIMMONS, 1983 The mutation load in Drosophila, pp. 1–35 in The Genetics and Biology of Drosophila, Vol. 3C, edited by M. ASHBURNER, H. L. CARSON and J. N. THOMPSON. Academic Press, London.

DENG, H.-W. and M. LYNCH, 1996  Estimation of deleterious mutation parameters in natural populations. Genetics 144:349-360[Abstract].

DENG, H.-W. and Y.-X. FU, 1998  On the three methods for estimating deleterious mutation parameters. Genet. Res. 71:223-236[Medline].

DRAKE, J. W., B. CHARLESWORTH, D. CHARLESWORTH, and J. F. CROW, 1998  Rates of spontaneous mutation. Genetics 148:1667-1686[Abstract/Free Full Text].

EANES, W. F., C. WESLEY, J. HEY, D. HOULE, and J. W. AJIOKA, 1988  The fitness consequences of P element insertion in Drosophila melanogaster.. Genet. Res. 52:17-26.

ELENA, S. F. and R. E. LENSKI, 1997  Test of synergistic interactions among deleterious mutations in bacteria. Nature 390:395-398[Medline].

ELENA, S. F., L. EKUNWE, N. HAJELA, S. A. ODEN, and R. E. LENSKI, 1998  Distribution of fitness effects caused by random insertion mutations in Escherichia coli.. Genetica 102(103):349-358.

FALCONER, D. S., and T. F. C. MACKAY, 1996 Introduction to Quantitative Genetics, Ed. 4. Longman Scientific and Technical, Harlow, Essex, UK.

FERNANDEZ, J. and C. LOPEZ-FANJUL, 1996  Spontaneous mutational variances and covariances for fitness-related traits in Drosophila melanogaster.. Genetics 143:829-837[Abstract].

GARCIA-DORADO, A., 1997  The rate and effects distribution of viability mutation in Drosophila: minimum distance estimation. Evolution 51:1130-1139.

GILLIGAN, D. M., L. M. WOODWORTH, M. E. MONTGOMERY, D. A. BRISCOE, and R. FRANKHAM, 1997  Is mutation accumulation a threat to the survival of endangered populations? Conserv. Biol. 11:1235-1241.

HOULE, D., D. K. HOFFMASTER, S. ASSIMACOPOULOS, and B. CHARLESWORTH, 1992  The genomic mutation rate for fitness in Drosophila.. Nature 359:58-60[Medline].

KEIGHTLEY, P. D., 1994  The distribution of mutation effects on viability in Drosophila melanogaster.. Genetics 138:1315-1322[Abstract].

KEIGHTLEY, P. D., 1996  Nature of deleterious mutation load in Drosophila. Genetics 144:1993-1999[Abstract].

KEIGHTLEY, P. D. and A. CABALLERO, 1997  Genomic mutation rates for lifetime reproductive output and lifespan in Caenorhabditis elegans.. Proc. Natl. Acad. Sci. USA 94:3823-3827[Abstract/Free Full Text].

KEIGHTLEY, P. D. and O. OHNISHI, 1998  EMS induced polygenic mutation rates for nine quantitative characters in Drosophila melanogaster.. Genetics 148:753-766[Abstract/Free Full Text].

KEIGHTLEY, P. D., A. CABALLERO, and A. GARCIA-DORADO, 1998  Surviving under mutation pressure. Curr. Biol. 8:R235-237[Medline].

KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

KONDRASHOV, A. S., 1993  Classification of hypotheses on the advantage of amphimixis. J. Hered. 84:372-387[Abstract/Free Full Text].

KONDRASHOV, A. S. and J. F. CROW, 1993  A molecular approach to estimating the human deleterious mutation rate. Hum. Mutat. 2:229-234[Medline].

LANDE, R., 1995  Mutation and conservation. Conserv. Biol. 9:782-791.

LYMAN, R. F., F. LAWRENCE, S. V. NUZHDIN, and T. F. C. MACKAY, 1996  Effects of single P-element insertions on bristle number and viability in Drosophila melanogaster.. Genetics 143:277-292[Abstract].

LYNCH, M., J. CONERY, and R. BURGER, 1995  Mutation accumulation and the extinction of small populations. Am. Nat. 146:489-518.

MACKAY, T. F. C., R. LYMAN, and M. S. JACKSON, 1992  Effects of P element insertions on quantitative traits in Drosophila melanogaster.. Genetics 130:315-332[Abstract].

MUKAI, T., 1964  The genetic structure of natural populations of Drosophila melanogaster. I. Spontaneous mutation rate of polygenes controlling viability. Genetics 50:1-19[Free Full Text].

MUKAI, T., S. I. CHIGUSA, L. E. METTLER, and J. F. CROW, 1972  Mutation rate and dominance of genes affecting viability in Drosophila melanogaster.. Genetics 72:333-355.

OHNISHI, O., 1974 Spontaneous and ethyl methanesulfonate-induced mutations controlling viability in Drosophila melanogaster. Ph.D. Thesis, University of Wisconsin.

OHNISHI, O., 1977  Spontaneous and ethyl methanesulfonate-induced mutations controlling viability in Drosophila melanogaster. II. Homozygous effect of polygenic mutations. Genetics 87:529-545[Abstract/Free Full Text].

OHTA, T., 1992  The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23:263-286.

ROSE, M., 1982  Antagonistic pleiotropy, dominance, and genetic variation. Heredity 48:63-78.

ROSE, M. and B. CHARLESWORTH, 1980  A test of evolutionary theories of senescence. Nature 287:141-142[Medline].

SHABALINA, S. A., L. Y. YAMPOLSKY, and A. S. KONDRASHOV, 1997  Rapid decline of fitness in panmictic populations of Drosophila melanogaster maintained under relaxed natural selection. Proc. Natl. Acad. Sci. USA 94:13034-13039[Abstract/Free Full Text].