Abstract
Deng and Lynch recently proposed estimating the rate and effects of deleterious genomic mutations from changes in the mean and genetic variance of fitness upon selfing/outcrossing in outcrossing/highly selfing populations. The utility of our original estimation approach is limited in outcrossing populations, since selfing may not always be feasible. Here we extend the approach to any form of inbreeding in outcrossing populations. By simulations, the statistical properties of the estimation under a common form of inbreeding (sib mating) are investigated under a range of biologically plausible situations. The efficiencies of different degrees of inbreeding and two different experimental designs of estimation are also investigated. We found that estimation using the total genetic variation in the inbred generation is generally more efficient than employing the genetic variation among the means of inbred families, and that a higher degree of inbreeding employed in experiments yields higher power for estimation. The simulation results on the magnitude and direction of estimation bias under variable or epistatic mutation effects may provide a basis for accurate inferences of deleterious mutations. Simulations accounting for environmental variance of fitness suggest that, under full-sib mating, our extension can achieve reasonably good estimation with sample sizes of only ∼2000–3000.
The genome of any organism is subject to continuous bombardment by mutations, the majority of which are deleterious. Numerous theories based on deleterious genomic mutations have been developed to explain some fundamental phenomena in biology. The validity of these theories critically depends on the rate at which deleterious mutations occur per genome per generation (U) and/or the effects of deleterious mutations.
For example, estimates of U are crucial to testing theories for the evolution of sex and recombination (Muller 1964; Kondrashov 1985, 1988; Charlesworth 1990), mate choice (Charlesworth and Charlesworth 1987; Kondrashov 1988; Kirkpatrick and Ryan 1991), outbreeding mechanisms (Charlesworth and Charlesworth 1987), diploidy (Kondrashov and Crow 1991), and the accelerated extinction rate of small populations (Lynch and Gabriel 1990; Lynch et al. 1993, 1995a,b). Estimates of the other parameters of spontaneous deleterious mutations are also important. Such parameters include the mean dominance coefficient (h), the mean selection coefficient (s), the genomic mutation variance scaled by environmental variance (V_{m}/V_{e}), and the variation of mutation effects. Estimates of h and s are important for testing theories of the evolutionary transition from haploidy to diploidy (Perrot et al. 1991) and for testing theories of the role of deleterious mutations in the extinction of small populations (Lande 1994; Lynch et al. 1995a). Joint estimates of U, h, and s determine the rate of input of genetic variance from mutation per generation (Deng and Lynch 1996, 1997) and the extent to which neutral molecular variation is reduced by background selection (Charlesworth et al. 1993, 1995; Hudson and Kaplan 1995). In finite populations, variation of mutation effects plays an important role in the maintenance of polygenic variation (Keightley and Hill 1990) and in determining the persistence time and extinction rate of small populations (Lande 1994; Lynch et al. 1993, 1995a,b).
However, few estimates are available (Crow and Simmons 1983; Kondrashov 1988; Crow 1993), and none of the current estimation approaches can yield unbiased results under realistic situations (Deng and Fu 1998a). A direct estimation approach for bounds of U and s, the traditional mutation-accumulation experiment (Bateman 1959; Mukai et al. 1972), takes extensive time and labor and is feasible only for asexual organisms, special sexual organisms (such as Drosophila, where special chromosomal constructs are available), and artificially constructed purely inbred lines. An indirect procedure for estimating U makes use of inbreeding depression data (Morton et al. 1956; Charlesworth et al. 1990), but it depends on an unknown h of deleterious mutations (the h needed is the harmonic/arithmetic mean in outcrossing/selfing populations; Deng and Lynch 1996). Estimation of h requires more assumptions (Comstock and Robinson 1948; Hayman 1954; Mukai et al. 1972; Caballero et al. 1997; Deng 1998). Even with the additional assumptions, the estimate is biased and weighted by the selection coefficients of individual mutant alleles at different loci. In addition, this indirect estimation of U is very sensitive to an assumed (or estimated) h (Deng and Fu 1998a).
Deng and Lynch (1996, 1997) developed an estimation approach that makes better use of the data (changes of both the mean and genetic variance for fitness traits) that can be acquired from selfing/outbreeding in outcrossing/highly selfing populations, to estimate not only U, but also h, s, V_{m}, etc. All the estimation approaches applied to natural populations (Morton et al. 1956; Charlesworth et al. 1990; Deng and Lynch 1996) assume that all the genetic variation is maintained by mutation-selection balance (see discussion). Under a range of biologically plausible situations investigated, Deng and Lynch's (1996) estimation approach almost always yields the best estimation (as reflected by the mean square error, a composite index of bias and sampling variance) when compared with other estimation approaches (Deng and Fu 1998a). However, Deng and Lynch (1996, 1997) were mainly concerned with estimation in outcrossing populations such as those of Daphnia (Deng 1995), in which selfing is feasible. Although approximate equations for full-sib mating were developed (Equation 14; Deng and Lynch 1996), they are only a rough approximation and thus inaccurate. When these approximate estimators are applied to data from full-sib mating experiments, substantial bias results even under the ideal assumptions underlying the derivation (H.W. Deng, unpublished results). Under the necessary assumptions in Deng and Lynch (1996), development of the exact estimation for inbreeding experiments other than selfing is not trivial, and thus was not achieved in Deng and Lynch (1996), although the effort was made, as evidenced by the approximate estimators given there. Since selfing is not feasible for the majority of outcrossing populations, there is an imperative need to develop estimation procedures with other forms of inbreeding (such as full- and half-sib mating) that are feasible in almost all outcrossing populations.
In this article we develop exact estimation equations, under the assumptions in Deng and Lynch (1996), for any form of inbreeding to characterize deleterious genomic mutations. Additionally, we investigate and compare the statistical properties and the robustness of the estimation under common forms of inbreeding (full- and half-sib mating) and those under selfing. Furthermore, estimation employing the total genetic variation in the inbred generation and that using the genetic variance among the means of inbred families are compared for their relative efficiencies (multiple inbred progeny from each family are not necessary in the former estimation, while they are mandatory in the latter). These investigations will not only provide a guideline for designing an efficient experiment employing samples from outcrossing populations, but will also provide a basis for accurate inferences of deleterious genomic mutations from the values estimated on the basis of some necessary but unrealistic assumptions.
THEORY
The assumptions of Morton et al. (1956), Charlesworth et al. (1990), and Deng and Lynch (1996, 1997) are employed to derive analytical estimations. The population is assumed to be very large for a long enough time such that all loci are at mutation-selection equilibrium for segregating polymorphisms. For any locus, this requires that selective disadvantage of deleterious mutations (selection coefficient s) is on the order of 1/N_{e}, where N_{e} is the effective population size (Kimura et al. 1963; Lynch et al. 1995a,b). This requirement ensures that, besides mutations, selection rather than random genetic drift is the driving force for the standing genetic variation. The frequency of deleterious mutant alleles at any locus is assumed to be small. The fitness function is assumed to be multiplicative, which is biologically plausible (Morton et al. 1956; Crow 1986; Craddock et al. 1995; Fu and Ritland 1996). Mutations at each locus have constant effects s and h (dominance coefficient). The three genotypic values at each locus with mutations are, respectively,
Under these assumptions, the mean (W_{O}) and the genetic variance [σ^{2}_{w}(O)] of fitness in an outcrossing population are found to be, respectively (Deng and Lynch 1996),
Equation 1 is a well-known result (Haldane 1937; Kimura et al. 1963; Burger and Hofbauer 1994). W_{max} is the expected fitness of a mutation-free genotype under the environmental conditions in which the experimental measurements are taken. W_{max} serves as a scaling factor so that fitness can be measured on any scale instead of just from 0.0 to 1.0, and also so that mean environmental effects of experiments do not influence estimation.
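The outcrossed mean and genetic variance can be rederived in a few lines as a check. The following is a sketch, not a reproduction of the article's displayed equations: it assumes per-locus genotypic values of 1, 1 − hs, and 1 − s for the mutation-free, heterozygous, and homozygous mutant genotypes (an assumption made here for illustration), and a Poisson-distributed number n of heterozygous mutations per individual with mean U/(hs).

```latex
\begin{align*}
W &= W_{\max}\,(1-hs)^{n}, \qquad n \sim \mathrm{Poisson}(\bar{n}), \quad \bar{n} = \frac{U}{hs},\\
E\!\left[a^{n}\right] &= \exp\!\left[-\bar{n}\,(1-a)\right] \quad \text{for any constant } a,\\
W_{O} &= E[W] = W_{\max}\exp\!\left(-\bar{n}\,hs\right) = W_{\max}\,e^{-U},\\
E\!\left[W^{2}\right] &= W_{\max}^{2}\exp\!\left[-\bar{n}\,(2hs-h^{2}s^{2})\right] = W_{\max}^{2}\,e^{-2U+Uhs},\\
\sigma^{2}_{w}(O) &= E\!\left[W^{2}\right]-W_{O}^{2} = W_{\max}^{2}\,e^{-2U}\left(e^{Uhs}-1\right).
\end{align*}
```

Under these assumptions the ratio σ^{2}_{w}(O)/W_{O}^{2} does not involve W_{max}, which is consistent with the role of W_{max} as a pure scaling factor.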
Suppose now that the members of an outcrossed population undergo inbreeding, with f being the inbreeding coefficient of the inbred progeny. For selfing, f = ^{1}/_{2}; for full-sib mating, f = ^{1}/_{4}; and for half-sib mating, f = ^{1}/_{8}. For each heterozygous locus in the outcrossed parental generation, an inbred progeny is expected to be heterozygous and homozygous for the deleterious allele with probabilities (1 – f) and f/2, respectively (Crow and Kimura 1970). Thus, under the assumption of free recombination and by the relationship n = U/(hs) (Deng and Lynch 1996), the mean fitness (W_{I}) in the inbred progeny is
To verify our derivation, we substitute f = ^{1}/_{2} [the selfing case as considered by Deng and Lynch (1996, 1997)] into Equations 3, 4, and 5. We find that Equations 3 and 4 recover the corresponding Equations 1c–1d of Deng and Lynch (1996) and Equation 5 reduces to the corresponding Equation A1(1) of Deng and Lynch (1997). Additionally, if we set f = 0 (the outcrossing case), Equation 3 recovers Equation 1 and Equations 4 and 5 reduce to Equation 2, which supports the correctness of the general equations above. The derivation is further verified in our computer simulations.
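The f = 0 check can be made explicit for the mean and the total genetic variance of inbred progeny. The derivation below is a sketch under stated assumptions rather than the article's own display: per-locus genotypic values 1, 1 − hs, and 1 − s; a Poisson number of parental heterozygous loci with mean U/(hs); and independent fates of loci, so that the numbers of heterozygous (n_1) and homozygous mutant (n_2) loci in an inbred individual are independent Poisson variables.

```latex
\begin{align*}
n_{1} &\sim \mathrm{Poisson}\!\left[\bar{n}(1-f)\right], \qquad
n_{2} \sim \mathrm{Poisson}\!\left[\bar{n}f/2\right], \qquad \bar{n} = \frac{U}{hs},\\
W &= W_{\max}\,(1-hs)^{n_{1}}(1-s)^{n_{2}},\\
W_{I} &= E[W] = W_{\max}\exp\!\left\{-\bar{n}\left[(1-f)hs + \tfrac{f}{2}s\right]\right\}
      = W_{\max}\exp\!\left\{-U\left[(1-f) + \frac{f}{2h}\right]\right\},\\
\sigma^{2}_{w}(I) &= E\!\left[W^{2}\right] - W_{I}^{2}
      = W_{I}^{2}\left(\exp\!\left\{Us\left[(1-f)h + \frac{f}{2h}\right]\right\} - 1\right),\\
f = 0:\quad W_{I} &= W_{\max}\,e^{-U}, \qquad
\sigma^{2}_{w}(I) = W_{\max}^{2}\,e^{-2U}\left(e^{Uhs}-1\right).
\end{align*}
```

Setting f = ^{1}/_{2} in the same expressions gives a mean of W_{max} exp{−U[^{1}/_{2} + 1/(4h)]} for selfed progeny.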
Estimation via the information on
Define x, y, and z, respectively, as
From Equations 1, 2, 3 and 5, the expected values of x, y, and z are, respectively,
Note that, in Equation 7b, if h = 0.5 (the purely additive case), there is no inbreeding depression (E(y) = 0) and an estimate of U cannot be obtained. However, a purely additive case is very unlikely, as suggested by the universal phenomena of inbreeding depression and heterosis. Rearranging and letting a circumflex (^) denote an estimate throughout, we obtain potential estimators for the mutational parameters:
By substituting f = ^{1}/_{2}, the estimation equations (8a, 8b, 8c) recover Equations A1(4a–4c) in Deng and Lynch (1997). Different forms of inbreeding have different values of f. By substituting the f for the specific form of inbreeding employed in the experiment into Equations 8a, 8b, 8c, estimates of U, h, and s and other derivative parameters can be obtained. These derivative parameters include (not exclusively) the genetic variance introduced into a population by new mutations per generation, V_{m}, and the mean number of deleterious mutations per genome, n (Deng and Lynch 1996).
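How the three observed quantities determine U, h, and s can be illustrated numerically. The sketch below is a reconstruction under assumptions, not the article's exact equations: it takes x = ln[σ^{2}_{w}(O)/W_{O}^{2} + 1], y = ln(W_{O}/W_{I}), and z = ln[σ^{2}_{w}(I)/W_{I}^{2} + 1], with expectations E(x) = Uhs, E(y) = Uf(1 − 2h)/(2h), and E(z) = Us[(1 − f)h + f/(2h)], which follow from per-locus genotypic values 1, 1 − hs, 1 − s. The function names are hypothetical.

```python
import math


def expected_xyz(U, h, s, f):
    """Expected x, y, z under the assumed definitions (see lead-in)."""
    x = U * h * s
    y = U * f * (1.0 - 2.0 * h) / (2.0 * h)
    z = U * s * ((1.0 - f) * h + f / (2.0 * h))
    return x, y, z


def estimate_parameters(x, y, z, f):
    """Invert the three expectations for (U_hat, h_hat, s_hat).

    Assumed algebra: z - (1 - f) x = U s f / (2 h), so
    h^2 = f x / (2 [z - (1 - f) x]).
    """
    denom = z - (1.0 - f) * x
    if f <= 0.0 or x <= 0.0 or denom <= 0.0:
        raise ValueError("x, f, and z - (1 - f) x must all be positive")
    h_hat = math.sqrt(f * x / (2.0 * denom))
    if h_hat >= 0.5:
        # With h = 0.5 there is no inbreeding depression and U is not estimable.
        raise ValueError("h_hat >= 0.5: U cannot be estimated")
    U_hat = 2.0 * h_hat * y / (f * (1.0 - 2.0 * h_hat))
    s_hat = x / (U_hat * h_hat)
    return U_hat, h_hat, s_hat
```

For example, `estimate_parameters(*expected_xyz(1.0, 0.2, 0.03, 0.25), 0.25)` should return the input values of U, h, and s up to rounding, for full-sib mating (f = ^{1}/_{4}).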
COMPUTER SIMULATIONS
To verify our analytical derivations under the assumptions made, and also to test the robustness of the estimation with the violation of some essential assumptions for analytical derivation, statistical properties (sampling variance and bias) of the estimation are investigated by computer simulations. These investigations will provide a basis for accurate inference of the genomic mutations with the estimation developed under the necessary but implausible assumptions. Specifically, the following assumptions will be tested: (1) the fitness function is multiplicative and there are no epistatic fitness effects of mutations; (2) the mutation effects s and h are constant across loci; and (3) there are no lethal mutations. Some other practical issues are also investigated by computer simulations: (1) the relative efficiencies of estimation by employing the information of the total genetic variation
It should be noted that some of the problems were investigated for estimation developed from Equations 1, 2, 3, 4 (Deng and Lynch 1996). However, none of the above problems have been investigated for the estimation developed from Equations 1, 2, 3 and 5, which employ a different experimental design as explained earlier. Although some of the simulation conclusions will be qualitatively similar to those in Deng and Lynch (1996), they will be quantitatively different. These quantitatively different results (especially the different degrees of bias) form the bases for accurate inference of mutations on the basis of different experimental designs and estimations.
1. Estimation under constant mutation effects: We assume that a mutation-selection balance has been reached in the parental generation, so that the number of mutations per individual (all in the heterozygous state) is Poisson distributed with an expectation of n = U/(hs). In each situation, simulations are performed for different sets of parameters. For each parameter set, varying numbers of individuals, K and H, are randomly sampled, respectively, from the outcrossed parental and inbred progeny generations. Initially, the genotypic values are assumed to be measured without error and are defined by the multiplicative fitness function used in the derivation. For a genotype with n mutations (randomly determined from the Poisson distribution) from the outcrossed parental generation, the fitness is
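The sampling scheme described here can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the simulation code used for the article: W_{max} is set to 1, per-locus genotypic values are taken to be 1, 1 − hs, and 1 − s, each parental heterozygous locus becomes AA, Aa, or aa in an inbred individual with probabilities f/2, 1 − f, and f/2, and the estimators are the assumed inversions of x = ln[σ^{2}_{w}(O)/W_{O}^{2} + 1], y = ln(W_{O}/W_{I}), and z = ln[σ^{2}_{w}(I)/W_{I}^{2} + 1]. All function names are hypothetical.

```python
import math
import random
from statistics import fmean, variance


def poisson(lam, rng):
    """Draw a Poisson variate by counting unit-rate arrivals in [0, lam]."""
    count, total = 0, rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count


def outcrossed_fitness(n_bar, h, s, rng):
    """Fitness of a parental genotype: all n mutations are heterozygous."""
    return (1.0 - h * s) ** poisson(n_bar, rng)


def inbred_fitness(n_bar, h, s, f, rng):
    """Fitness of an inbred genotype; each locus ends up AA, Aa, or aa."""
    n1 = n2 = 0  # heterozygous (Aa) and homozygous mutant (aa) locus counts
    for _ in range(poisson(n_bar, rng)):
        u = rng.random()
        if u < f / 2.0:
            n2 += 1              # aa with probability f/2
        elif u < 1.0 - f / 2.0:
            n1 += 1              # Aa with probability 1 - f
        # otherwise AA (probability f/2): the mutation is not carried
    return (1.0 - h * s) ** n1 * (1.0 - s) ** n2


def simulate_once(U, h, s, f, K, H, rng):
    """One simulated experiment; returns (U_hat, h_hat, s_hat)."""
    n_bar = U / (h * s)
    w_o = [outcrossed_fitness(n_bar, h, s, rng) for _ in range(K)]
    w_i = [inbred_fitness(n_bar, h, s, f, rng) for _ in range(H)]
    mean_o, mean_i = fmean(w_o), fmean(w_i)
    x = math.log(variance(w_o) / mean_o ** 2 + 1.0)
    y = math.log(mean_o / mean_i)
    z = math.log(variance(w_i) / mean_i ** 2 + 1.0)
    # Assumed estimators (inversions of the expectations of x, y, z).
    h_hat = math.sqrt(f * x / (2.0 * (z - (1.0 - f) * x)))
    U_hat = 2.0 * h_hat * y / (f * (1.0 - 2.0 * h_hat))
    s_hat = x / (U_hat * h_hat)
    return U_hat, h_hat, s_hat
```

Repeating `simulate_once` many times with a seeded `random.Random` and summarizing the estimates by their mean and SD gives the kind of bias and sampling-variance summary discussed in the results. The Poisson sampler is written out because the Python standard library does not provide one.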
2. Estimation under variable mutation effects: Mutation effects h_{i} and s_{i} are unlikely to be constant across loci. For example, s_{i} may vary anywhere from 0.0 (neutral mutation) to 1.0 (lethal mutation). The rate of occurrence for mutations with different effects may also vary, so that mutations of smaller effects may occur at higher rates. To evaluate the direction and the magnitude of bias introduced by variable mutation effects and variable mutation rates, as in Deng and Lynch (1996, 1997), we adopt an exponentially distributed mutation rate for mutations of variable effect s_{i}:
As explained in Deng and Lynch (1996), these are in rough accordance with the few available data (Gregory 1965; Crow and Simmons 1983; Mackay et al. 1992; Keightley 1994) or biochemical arguments (Kacser and Burns 1981). However, true mutational spectra may be such that the dominance of individual mutations is broadly scattered around such a function (Caballero and Keightley 1994).
In simulations, we divide the entire range of s (0.0–1.0) into 100 discrete classes of width 0.01. Within each class, mutations have constant effects (h_{i} and s_{i}). Each individual from the outcrossed parental generation in the simulation is assigned a number n_{i} of heterozygous mutations from the ith of these classes by drawing from a Poisson distribution with expectation Up_{i}/(h_{i}s_{i}), where p_{i} is the density of the mutational distribution in the ith class. For an individual from the inbred progeny generation, the n_{i} values are first determined as above. Then for each of the n_{i} loci, the genotype is, as before, determined by randomly sampling from the trinomial probabilities determined by f, so that the probabilities for the different genotypes are f/2 for AA, (1 – f) for Aa, and f/2 for aa, respectively.
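The class-based sampling can be sketched as follows. Several pieces are assumptions made for illustration only: each class is represented by its midpoint, the density p_{i} is taken proportional to exp(−s_{i}/mean_s) and normalized over the 100 classes, the dominance function h_{i} = 0.5 exp(−13 s_{i}) is a placeholder for a decreasing dominance-selection relationship, and W_{max} = 1. The function and parameter names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate by counting unit-rate arrivals in [0, lam]."""
    count, total = 0, rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count


def build_classes(U, mean_s, n_classes=100):
    """Return (s_i, h_i, expected heterozygous count) for each class of s."""
    width = 1.0 / n_classes
    mids = [(i + 0.5) * width for i in range(n_classes)]  # ASSUMED: midpoints
    weights = [math.exp(-s_i / mean_s) for s_i in mids]   # ASSUMED: exponential
    total = sum(weights)
    classes = []
    for s_i, w in zip(mids, weights):
        p_i = w / total
        h_i = 0.5 * math.exp(-13.0 * s_i)                 # ASSUMED: placeholder h(s)
        classes.append((s_i, h_i, U * p_i / (h_i * s_i)))
    return classes


def outcrossed_fitness(classes, rng):
    """Parental genotype: every sampled mutation is heterozygous."""
    w = 1.0
    for s_i, h_i, lam in classes:
        w *= (1.0 - h_i * s_i) ** poisson(lam, rng)
    return w


def inbred_fitness(classes, f, rng):
    """Inbred genotype: each locus is AA, Aa, or aa with f/2, 1 - f, f/2."""
    w = 1.0
    for s_i, h_i, lam in classes:
        for _ in range(poisson(lam, rng)):
            u = rng.random()
            if u < f / 2.0:
                w *= 1.0 - s_i            # aa
            elif u < 1.0 - f / 2.0:
                w *= 1.0 - h_i * s_i      # Aa
            # otherwise AA: no fitness effect
    return w
```

One useful property of the expectation Up_{i}/(h_{i}s_{i}) is that the expected heterozygous load, summed over classes, equals U whatever functions are chosen for p_{i} and h_{i}, so the expected outcrossed mean fitness remains exp(−U) under this sketch.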
3. Estimation with lethal mutations present in the genome: Because of their low dominance coefficients, lethal mutations are often sheltered from selection by being kept in the heterozygous state in outcrossing populations. To investigate the effects of lethal mutations on estimation, we add an additional low genomic mutation rate (0.01U) to lethals (defined as having s = 1.0 and h = 0.02) (Table 3).
4. Estimation with epistatic mutation effects: The theory we developed here assumes that deleterious mutations across loci interact multiplicatively. Although there is some good evidence that genes for fitness or its components most likely act multiplicatively (Morton et al. 1956; Crow 1986; Craddock et al. 1995; Fu and Ritland 1996), synergistically epistatic mutation effects cannot be ruled out entirely. Thus, we test the robustness of the estimation method under epistatic mutation effects. To evaluate the potential consequences of epistasis on the mutation-parameter estimates derived from our model, we consider the epistatic fitness model described by Charlesworth (1990) and employed by us before (Deng and Lynch 1996, 1997):
We implement the epistatic fitness model by assuming mutations of constant effects (s and h). Under mutation-selection equilibrium, n is approximately normally distributed with the mean and variance being functions of U, h, s, and β, defined by Equations 3 and 11 in Charlesworth (1990). For the outcrossed parental generation, we again assume that all deleterious mutations exist in the heterozygous state before inbreeding, so that the n that is drawn for a parental individual is the number of heterozygous loci in that individual. n_{1} and n_{2} for an individual from the inbred progeny generation are determined in a similar fashion as before, except that n is now randomly determined from a normal distribution instead of a Poisson distribution. The means and variances of fitness for the two generations are then computed, and our estimators, Equations 8a, 8b, 8c, which assume no epistasis, are applied to the data.
5. Comparing estimation based on
6. Estimation when genotypic values are measured with error: In the simulations discussed above, genotypic values are assumed to be known without error. In this case, sampling error of estimates comes only from random sampling of outcrossed and inbred genotypes. In reality, this would require that each genotype be clonally replicated and assayed a very large number of times, since polygenic traits are usually expressed with some environmental variance (Falconer and Mackay 1996). In Table 6, we consider the estimation by accounting for additional effects of finite clonal replicates for each genotype on the sampling error. The results for full-sib mating are presented. We examine the situation in which the broad-sense heritabilities (H^{2}) of fitness are 0.20, 0.40, and 0.60, respectively, in the parental generation. The environmental variance (including random measurement error and developmental instability) for fitness is defined as
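The effect of finite clonal replication can be illustrated with a short sketch. It assumes the standard definition of broad-sense heritability, H^{2} = V_{G}/(V_{G} + V_{E}), normally distributed environmental deviations, and a balanced one-way ANOVA to separate genetic from environmental variance. Whether this matches the exact analysis used for Table 6 is an assumption, and the function names are hypothetical.

```python
import random
from statistics import fmean


def environmental_variance(v_g, H2):
    """Solve H2 = v_g / (v_g + v_e) for the environmental variance v_e."""
    return v_g * (1.0 - H2) / H2


def add_replicates(genotypic_values, v_e, R, rng):
    """Return R noisy phenotypic measurements for each genotype."""
    sd = v_e ** 0.5
    return [[g + rng.gauss(0.0, sd) for _ in range(R)] for g in genotypic_values]


def anova_mean_and_genetic_variance(groups):
    """Balanced one-way ANOVA: grand mean and among-genotype variance.

    groups: one list of R >= 2 replicate measurements per genotype.
    The genetic variance estimate is (MS_between - MS_within) / R.
    """
    K, R = len(groups), len(groups[0])
    means = [fmean(g) for g in groups]
    grand = fmean(means)
    ms_between = R * sum((m - grand) ** 2 for m in means) / (K - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, means) for v in g
    ) / (K * (R - 1))
    return grand, (ms_between - ms_within) / R
```

In a simulated experiment, the mean and genetic variance returned by `anova_mean_and_genetic_variance` for the outcrossed and inbred samples would replace the error-free moments before the estimators are applied. An experiment with K outcrossed genotypes, H inbred genotypes, and R replicates of each requires (K + H) × R fitness assays.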
RESULTS
1. Estimation under constant mutation effects: The parameter estimates for s, h, and U are almost always unbiased with small sampling errors (Table 1). The only exception is when h is very high (h = 0.4), higher than all previously reported h estimates, which range from 0.07 to 0.35 (Deng and Lynch 1997). In that case, Û has a large sampling variance. Estimation of s, h, and U under selfing is better than under full-sib mating, which in turn is better than under half-sib mating (as reflected by the SD of the estimates). This is because the magnitude of the change of mean and genetic variance upon inbreeding is larger with a higher degree of inbreeding. The estimates obtained from the repeated simulations are consistent with normal distributions (Kolmogorov-Smirnov test, P > 0.50; Sokal and Rohlf 1995); thus SD is employed to reflect the sampling properties of the estimation throughout. The improvement of estimation with an increased degree of inbreeding is small when h is small (h = 0.2). The improvement is very dramatic for U estimation when h (h = 0.4) and U (U = 1.5) are large. Simulations with s = 0.050 (data not shown) led to the same conclusions.
2. Estimation under variable mutation effects: All the estimates are biased (Table 2). The bias is relatively small when s is small and increases with an increasing s. The simulated parameters roughly cover most of the previous experimental estimates. Under the simulated parameters, ŝ ranges from ∼2s to 3s, ĥ ranges from ∼0.35h to ∼0.90h, and Û ranges from ∼0.5U to 0.8U. Again, as with constant effects, estimation of s, h, and U under selfing is better than under full-sib mating, which is better than under half-sib mating. The sampling variance decreases with an increasing degree of inbreeding for estimation, while the bias remains roughly constant. Full-sib mating can generally achieve reasonably good estimates in terms of sampling variance.
3. Estimation with lethal mutations present in the genome: The presence of rare lethal mutations (in the simulations shown, an expected number of 0.50 per individual) inflates the estimates of s by a factor of ∼8 and decreases the estimates of U and h by factors of ∼2 and 4, respectively. In practical applications of our proposed technique, this type of problem can perhaps be minimized by eliminating individuals that are homozygous for lethals from the final analyses. This is a protocol similar to that employed in mutation-accumulation experiments (Mukai et al. 1972). By dropping inviable inbred progeny (homozygous for lethal mutations) from analyses, the estimation can be greatly improved, even better than when there are no lethals (Table 3). This reflects the common practice of sampling conditional on hatch or birth, etc.
4. Estimation with epistatic mutation effects: In general, the biases in estimates of U, h, and s are quite small provided the contribution of epistatic effects to fitness is on the order of <10% (Table 4). With reinforcing epistasis, h and U tend to be underestimated, whereas s tends to be overestimated. When the ratio βn/(2hs) approaches one, so that synergistic epistasis halves the average fitness of individuals relative to that expected in the absence of epistasis, the bias becomes more substantial. Even with strong epistasis, the estimation of h is altered only slightly and the estimates of U are not downwardly biased by more than ∼30%, although the estimates of s can be too high by a factor as large as ∼5. Overall, the results suggest that epistasis must be quite strong for our estimation to generate wildly unrealistic estimates.
5. Comparing estimation based on
6. Estimation when genotypic values are measured with error: The higher the H^{2} (Table 6), the more genotypes (K) sampled, or the more replicates (R) cloned for each genotype at assay, the better the estimation, as reflected by the SDs. The bias remains roughly constant across experiments of different sample sizes. When H^{2} is reasonably high (>0.40), experiments that employ 100 outcrossed parents and 100 inbred progeny (each from different full-sib matings), with each genotype having at least 10 replicates, can achieve reasonably good estimation. In these experiments, ∼2000 individuals need to be assayed. Generally speaking, for a fixed sample size for assay, increasing K improves estimation more efficiently than increasing R. Even with relatively low H^{2} (0.20), experiments that employ 150 outcrossed parents and 150 inbred progeny (each from different full-sib matings), with each genotype having at least 10 replicates, can achieve reasonably good estimation. In these experiments, ∼3000 individuals need to be assayed.
As a specific example, assume that one can measure fitness of 2000 individuals. Then an actual experiment could be roughly as follows: Sample 100 random outcrossed genotypes from the outcrossing population under study. For each of them, sample one full sib to produce 100 full-sib pairs. Mate these 100 full-sib pairs to generate 100 inbred progeny genotypes. Clonally replicate the 100 random outcrossed genotypes and the 100 inbred genotypes to generate 10 clones for each of the genotypes. Then analyses can be performed (Deng and Lynch 1996, 1997; Deng and Fu 1998a) to estimate the mean and genetic variation in the parental and inbred offspring generations and the mutation parameters can be estimated by Equations 8a, 8b, 8c. The dependence of the precision on such an experiment is indicated by footnote a in Table 6.
DISCUSSION
In this article, we extended the approach of Deng and Lynch (1996, 1997) to any form of inbreeding to characterize deleterious mutations in outcrossing populations. This extension greatly widens the taxonomic range of outcrossing populations in which characterizing deleterious mutations is feasible. The statistical properties, robustness, and statistical power of our extension are also investigated under several biologically plausible situations. In addition, we compared the estimation under different degrees of inbreeding and two different experimental designs and data analyses. It is revealed that estimation using the total genetic variation
In this article, we focus on estimation that employs experimentally measurable information such as the mean and genetic variance in outcrossed and inbred generations. If external knowledge exists on the variation and covariation of h_{i} and s_{i} of mutation effects, improved (less biased) estimation accounting for the variation and covariation of h_{i} and s_{i} may be obtained. This was done in Deng and Lynch (1996) for selfing experiments. However, the utility of such estimation is limited because knowledge of the variation and covariation of h_{i} and s_{i} is essentially nonexistent and may be much harder to acquire than estimates of the means of h_{i} and s_{i}. Recently, we developed a method to approximately estimate the variation of h_{i} in an outcrossed population without having to construct homozygous lines (Deng 1998). However, methods to quantify the covariation of h_{i} and s_{i} do not exist at present, and the statistical performance of quantifying the variation of s_{i} from mutation-accumulation data (Keightley 1994) may be poor, as suggested by our preliminary investigations (Deng et al. 1998a,b). Therefore, estimation incorporating external knowledge of variation and covariation of h_{i} and s_{i} is not presented here because of its minimal utility, although it is straightforward to work out. By employing a more complex design and using more information from the data, estimation that may not be biased by variable mutation effects in both outcrossed and selfed populations can hopefully be developed; furthermore, the covariation between h_{i} and s_{i} may be quantified (H.W. Deng, unpublished results).
One crucial assumption of the estimation developed in this article is that the variation of fitness is maintained by mutation-selection (MS) balance. This assumption underlies all the previous estimation approaches applied to natural populations (Morton et al. 1956; Charlesworth et al. 1990; Deng and Lynch 1996, 1997). Despite tremendous efforts (e.g., Houle 1989, 1994; Houle et al. 1996; Charlesworth and Hughes 1997; Deng et al. 1998), the extent to which this essential assumption is valid is generally unknown. A critical question is how robust the estimations are to different degrees of violation of the MS balance assumption. Extensive studies have recently been performed (J.L. Li, J. Li and H.W. Deng, unpublished results). They have revealed that the effects of violating MS balance may not be as substantial as envisioned. Under variable dominance mutation effects, the estimation bias is actually reduced when overdominant mutations maintained by balancing selection are present in the genome.
Currently, there are several different approaches to characterize different aspects of deleterious genomic mutations (Deng 1998; Deng and Fu 1998a,b). Traditional mutation-accumulation experiments used to be implemented with tremendous labor and time, which may have been far from necessary. If designed properly, mutation-accumulation experiments can be executed much more efficiently with much reduced time and labor (Deng et al. 1998a,b), so that mutation accumulation may be adopted by many more empiricists. Under realistic situations, all the current estimations are biased. Deng and Lynch's (1996) original procedure (estimation via genetic variation among the means of selfed families) generally is statistically better than other currently available estimation approaches (Deng and Fu 1998a). Our extension (estimation via the total genetic variation in the inbred progeny generation; Deng and Lynch 1997) is shown here to be even more powerful than our original procedure (Deng and Lynch 1996). The extension in Deng and Lynch (1997) is here further extended to any inbreeding experiments in outcrossing populations. Different estimation approaches have their own particular assumptions that may be difficult to validate in particular experimental settings (Deng and Fu 1998a). In addition, besides the statistical properties, different approaches have different advantages and drawbacks in practice and are best applicable to different organisms and in different situations. The estimates obtained by different approaches in different organisms can be cross-checked and, hopefully, will eventually resolve the issues concerning the genomic mutations.
Acknowledgments
I thank Professor M. Lynch for helpful comments on the manuscript and his years of advice, encouragement, and continuous support. I also thank Professor M. Slatkin, Professor A. Kondrashov, and an anonymous reviewer for helpful comments that helped to improve this article. Graduate students J. Li and J.L. Li helped in running some simulations for this article. This work was partially supported by a grant from National Institutes of Health R01 AR45349 and a Health Future Foundation grant of Creighton University, Nebraska.
Footnotes

Communicating editor: M. Slatkin
 Received April 10, 1998.
 Accepted July 13, 1998.
 Copyright © 1998 by the Genetics Society of America