Abstract
The genomes of all organisms are subject to continuous bombardment of deleterious genomic mutations (DGM). Our ability to accurately estimate various parameters of DGM has profound significance in population and evolutionary genetics. The Deng-Lynch method can estimate the parameters of DGM in natural selfing and outcrossing populations. This method assumes constant fitness effects of DGM and hence is biased under variable fitness effects of DGM. Here, we develop a statistical method to estimate DGM parameters by considering variable mutation effects across loci. Under variable mutation effects, the mean fitness and genetic variance for fitness of parental and progeny generations across selfing/outcrossing in outcrossing/selfing populations and the covariance between mean fitness of parents and that of their progeny are functions of DGM parameters: the genomic mutation rate U, average homozygous effect s, average dominance coefficient h, and covariance of selection and dominance coefficients cov(h, s). The DGM parameters can be estimated by the algorithms we developed herein, which may yield improved estimation of DGM parameters over the Deng-Lynch method as demonstrated by our simulation studies. Importantly, this method is the first one to characterize cov(h, s) for DGM.
THE genomes of all organisms are subject to deleterious genomic mutations (DGM) continuously. In spite of our increasing knowledge of the molecular underpinnings of mutations, little is known about the overall risk exerted on human health and on continuing survivability of other organisms (especially rare and endangered species) by DGM (Crow 1993a,b, 1995). To assess this overall risk correctly, we need to have a solid knowledge of the genomic mutation rate (U) at which DGM arise in the whole genome of an individual and the distribution of their effects, such as the mean selection coefficient (s), the mean dominance coefficient (h), and the covariance of dominance and selection coefficients of DGM [cov(h, s)]. Estimation of these parameters is also important for testing the validity of a number of evolutionary theories in genetics (Turelli and Orr 1995; and the references within Deng et al. 1998, 1999).
Despite the extreme importance of our knowledge of deleterious mutation parameters, few estimates are available (Simmons and Crow 1977; Crow and Simmons 1983; Kondrashov 1988; Crow 1993a,b, 1995; Bataillon 2000). Particularly, no method to estimate U is not biased by variable mutation effects, and no method to estimate cov(h, s) is important for our understanding of Haldane’s rule by the dominance hypothesis (Turelli and Orr 1995). The current experimental approaches and the estimation methods of the parameters of DGM are summarized and compared (Deng and Fu 1998; Denget al. 1999; Deng and Li 2001). It is concluded that under their respective assumptions of various approaches, estimation by the Deng-Lynch method (Deng and Lynch 1996, 1997) in natural populations generally results in the best statistical quality in terms of bias and sampling variance (Deng and Fu 1998). In addition, it has been shown that violation of various assumptions [including the mutation-selection (M-S) balance assumption] underlying the Deng-Lynch method does not seriously undermine its estimation robustness (Liet al. 1999; Li and Deng 2000; Deng and Li 2001).
As with almost all the other estimation methods (except a maximum-likelihood estimation method for mutation-accumulation experiments; Keightley 1994), the Deng-Lynch method that applies to natural outcrossing or selfing populations assumes constant fitness effects of DGM. This assumption is well recognized as biologically implausible. Although the estimation bias introduced by variable mutation effects in the Deng-Lynch estimation method by assuming constant mutation effects is not substantial (Denget al. 1999), an estimation method that considers variable mutation effects may reduce estimation bias (although not necessarily always so). Most importantly, the parameters [e.g., cov(h, s)] characterizing variable effects of DGM can be estimated only in statistical methods that consider variable mutation effects.
In this article, we present a method for estimating DGM parameters accounting for variable effects across loci in natural outcrossing or selfing populations at M-S balance. We investigate the statistical properties (bias and sampling variance) of this new method, using computer simulations in comparison with the Deng-Lynch method (Deng and Lynch 1996, 1997) that assumed constant mutation effects across loci.
THEORY
The assumptions are the same as those of the Morton-Charlesworth method (Mortonet al. 1956; Charlesworthet al. 1990) and the Deng-Lynch method (Deng and Lynch 1996, 1997; Deng 1998b). Namely, the population is assumed to be large, randomly mating, highly selfing or outcrossing, at linkage equilibrium, and at M-S balance. In addition, the fitness function is assumed to be multiplicative, which is biologically plausible (Mortonet al. 1956; Crow 1986; Craddocket al. 1995; Fu and Ritland 1996). Mutations at each locus are assumed to have constant effect s and h.
In this study, we consider variable mutation effects in the development of an estimation method for DGM parameters in natural populations. Under variable mutation effects across loci, homozygous effect s for mutations is a random variable between 0 and 1. We assume that, for a mutation, dominance coefficients h and s are functionally related so that h = h(s). This assumption is supported by the limited data and theory (Simmons and Crow 1977; Kacser and Burns 1981; Crow and Simmons 1983). We divide the domain of s, [0, 1], for new mutations into T intervals with each having a width of 1/T. Let Ik = [k/T, (k + 1)/T] denote the kth interval, and define the probability
With the assumptions we have, in outcrossing populations, the number of mutant alleles with mutation effects s falling into an interval Ik within an individual (all in the heterozygous state; Mortonet al. 1956; Deng and Lynch 1996) follows a Poisson distribution with an expectation
Outcrossing populations: We illustrate our experimental design and estimation method by using populations capable of selfing. The method may be extended to outcrossing populations where selfing is not feasible as in the Deng-Lynch method (Deng 1998b). The basic data structure is outcrossed parents and multiple selfed progeny from each parent (forming selfed families). Let Wo and Ws be the mean fitness in the parental and offspring generations, respectively, σ2o the genetic variance of fitness in the parental generation, σ2t the total genetic variance of fitness in the selfed progeny generation, σ2s the genetic variance of the mean fitness of selfed progeny in selfing families, and cov(wp, ws) the covariance between the fitness of a parent (wp) and the mean fitness of its selfed progeny (ws). Under the above assumption for mutation effects that are variable across various intervals at different loci, as in Deng and Lynch (1996), it can be shown that the fitness moments are related to the DGM parameters as
Among Equations 2, 3, 4, 5, 6, 7, there are only five independent equations containing six unknown parameters. By assuming one of the six parameters known in the estimation, estimators of the other parameters can be derived. This is the strategy employed in the likelihood characterization of DGM parameters when variable mutation effects are considered in estimation (Keightley 1994; Denget al. 1999; Deng and Li 2001). Here we assume that U is known in the estimation for the time being. Alternatively, an initial value of U may be estimated from other approaches (Denget al. 1999) or may be estimated by the current experimental design and data with the Deng-Lynch method (Deng and Lynch 1996; see below). (If we assume that one of the parametersh˜, s, and hs is known, similar estimation procedures can be derived for U and the rest of the other parameters.h˜ can be estimated by methods such as that of Deng 1998a.) Solving these equations jointly yields estimators of h˜,
From these estimates, other composite parameters of DGM, such as the mean number of mutations per genome n, mutational variance Vm per generation, and mean mutation effects on fitness
Selfing populations: Random pairs of highly selfing and homozygous parental genotypes (denoted as P generation) are crossed to obtain outcrossed progeny (denoted as F1 generation). Let W¯p and
It should be noted that the derivation for Equations 2, 3, 4, 5, 6, 7 and 11, 12, 13, 14, 15 assumes mutation effects that are variable. The strategy is to divide the range of variable selection coefficient s (from zero to one) into infinitely small intervals so that s can be treated as constant within each of the intervals but varying across intervals in our analytical derivation. Again, there are six unknowns (U, h,
In selfing populations, we can use Equation 10 to estimate cov(h, s) by the above estimates of h,
The above estimation developed herein does not assume any specific functional relationship between s and h and any specific distribution form for the selection coefficient s. Therefore, the estimates are robust to different unknown forms of the distribution of s and the functional relationship between s and h. This is true despite that we assume specific distributions of s and a functional relationship between s and h in the following simulation studies to investigate the statistical properties of our estimation.
SIMULATIONS AND RESULTS
As with Keightley (1994), we assume that s for mutations follows a gamma distribution, with a density function
The simulation procedures are the same as those that have been documented extensively earlier (Deng and Lynch 1996; Deng 1998b) and are thus not elaborated here. In simulations, we assume that the fitnesses of various genotypes can be measured with little error, which is justifiable in the investigation of estimation bias and comparison of various estimation methods (Denget al. 1999). Under the assumptions for the analytical development of our estimation methods, the number of mutant alleles corresponding to an interval Ik per individual follows the Poisson distributions (Equations 1a and 1b) with pk being determined as
To evaluate the performance of our estimation in outcrossing populations in simulations, for each set of parameters U, α, and β, K parents were sampled from the parental generation, and from each of these, M selfed progeny were produced. The fitness of an individual from the parental generation is
For selfing populations, the fitness of an individual from the parental generation is
In the estimation Equations 8 or 16, U must be known, assumed, or estimated with other approaches first. In simulations, we experimented and examined two methods to estimate U: (1) by the Deng-Lynch method (Deng and Lynch 1996) and (2) by an empirical regression procedure introduced here. We simulated parents and their children according to variable effects for each set of given parameter values of U, α, and β, and obtained the estimatesÛ1,ŝ1, andĥ1 by the Deng-Lynch method (Deng and Lynch 1996). (A circumflex indicates an estimated value throughout.) We found a strong linear relationship between the parameter values of U and the estimatesÛ1 andŝ1 under any fixed β. Through a series of simulations, we obtained samples under various parameter values of U, α, and fixed β-values, and we obtained estimatesÛ1 andŝ1 with the Deng-Lynch method under various fixed β-values. Then we fit a multiple regression model under each specific β-value,
The simulation results are represented by the data in Tables 1, 2, 3, 4. The ranges of the values for the parameters (such as U, h, and s) generally cover those reported earlier from classical empirical experiments (e.g., Mukaiet al. 1972; Lynchet al. 1999). Three general conclusions emerge from our simulation studies under variable mutation effects. First, when U is set to equal true values or when the estimates of U are obtained via Equation 19 by assuming a correct β-value, application of Equation 8 or 16 to both obligate selfing or outcrossing populations yields nearly unbiased estimates for the DGM parameters with small standard deviation. The estimates of U by Equation 8 have smaller mean square error despite larger standard deviation when U is set equal to the estimates obtained by regression Equation 19 than those obtained by the Deng-Lynch method. The larger standard deviation may be partly due to the fact that Equation 19 is established by empirical regression procedures that involve an additional level of sampling error for the final estimation. The estimates of s by Equation 8 in outcrossing populations have smaller sampling variance and smaller bias than those obtained directly by the Deng-Lynch method, e.g., by comparison of the estimates in rows 1 and 3 for each parameter set in Table 1. This is true even when no prior assumption is made about the magnitude of U, when U is first estimated directly with the Deng-Lynch method, and then the estimate of U is used in the current estimation method, (Equation 8) for the other DGM parameters. The estimates ofh˜ by Equation 8 have smaller or comparable sampling variance than those obtained directly by the Deng-Lynch method for h (for each parameter set, compare the estimates of the second to fourth rows with that of the first row in Table 1). The comparison of the estimation quality between the current estimation method and the Deng-Lynch method changes little with the parameter values (Table 1). When β= 0.5, the bias of the estimates of the parameters is larger than that when β= 1 and 2. This may be due to the approximation formula 18 used to compute pk = P(si ∊ Ik) when β = 0.5, while the computation of pk = P(si ∊ Ik) when β= 1 and 2 is exact.
Second, when U is set equal to the estimates (Û1) that were obtained by the Deng-Lynch method (Deng and Lynch 1996) and that are downwardly biased, the estimates of the other DGM parameters by Equations 8 and 16 are biased with small sampling variance (Tables 1 and 2). For outcrossing populations, the estimation Equation 8 yields less biased estimates with smaller standard deviation for s than for the Deng-Lynch method (Table 1), and the estimates of s,
Third, in outcrossing populations, the cov(h, s) is correctly estimated to be an upper bound of cov(h, s); however, the sign of cov(h, s) can sometimes be estimated to be different from that of cov(h, s). In selfing populations, cov(h, s) can always be estimated with correct sign and small estimation bias.
ROBUSTNESS ANALYSIS
In the estimation of the DGM parameters, we need a prior estimate of one of the six parameters (such as U as investigated here) based on some external knowledge obtained from other estimation approaches. The estimation bias of this parameter or the bias of an assumed value will cause estimation bias of the other parameters. Hence, we investigate the sensitivity of estimators to the departures of U from true value, using computer simulations (Figures 1 and 2). We define a relative bias rate (RBR), (estimate true value)/(true value), to measure the sensitivity of estimators to an incorrectly assumed or estimated U value. In examining the robustness of the estimator for cov(h, s), the true value used is the parameter value of cov(h, s) as defined after Equation 10 and not cov(h, s).
In simulations for the investigation of the robustness of our current estimation of the other DGM parameters, U is set equal to a given value (denoted as Ugiven), which ranges from 0.5U0 to 1.5U0 (U0 is the true value of U). This range of the estimate of U investigated is reasonable given the magnitude of bias that is normally found with the method such as that of Deng and Lynch (1996). The changes in the mean relative bias rates (MRBR) of the estimates of the parameter values in 1000 simulations are shown in Figures 1 and 2. It can be seen that when Ugiven ranged from 0.7U0 to 1.5U0 (which means that the departure of Ugiven from U0 ranged from -0.3U0 to 0.5U0), the MRBR of the estimates of the parameter values changed smoothly and changed little in both outcrossing and selfing populations. When Ugiven ranged from 0.9U0 to 1.2U0, the absolute values of the MRBR of the estimates of parameters [except cov(h, s) for outcrossing populations when α= 20] are <0.185 in both outcrossing and selfing populations. For outcrossing populations, when α= 20, if Ugiven ≤ 0.9U0 or Ugiven ≥ 1.1U0, the absolute values of the MRBR of cov(h, s) are >1.0 (Figure 1, b and d). (Note the scale difference of the y-axis in Figure 1, b and d, with the other plots in Figures 1 and 2.) Thus, even when U is estimated with some bias, if the magnitude is similar to that obtained by methods such as that of Deng and Lynch (1996), our current estimation method can generally still yield relatively robust estimates of DGM parameters (except cov(h, s) for outcrossing populations when α is as small as 20). In outcrossing populations, the MRBR changed the sign in the robustness investigation of cov(h, s) when s = 0.01 and 0.05, respectively. This is because the parameter value cov(h, s) changed the sign from negative to zero and then to positive values under the functions assumed when s changes from 0.047 to 0.048.
Parameter estimates under variable mutational effects in outcrossing populations (β = 1.0)
Parameter estimates under variable mutational effects in selfing populations (β = 1.0)
Parameter estimates under variable mutational effects in outcrossing populations
Parameter estimates with variable mutational effects in selfing populations
—The changes in RBR of the estimates of s,h˜, , cov(h, s) obtained by Equation 8 in outcrossing populations when U were given equal to the values that ranged from 0.5U0 to 1.5U0. Each data point was the mean in 1000 simulations with the following sets of parameters and β= 1.0: (a) U0 = 1.5, s = 0.01, and α= 100; (b) U0 = 1.5, s = 0.05, and α= 20; (c) U0 = 0.5, s = 0.01, and α= 100; and (d) U0 = 0.5, s = 0.05, and α= 20.
DISCUSSION
We have developed a method in this study for considering variable mutation effects across loci in the estimation. The method may yield improved estimation over that of Deng and Lynch (1996) as shown by employing additional and independent information (such as the covariance between mean fitness of parents and that of their progeny) to that employed in Deng and Lynch (1996), although the experimental design is the same. Importantly, cov(h, s) for DGM can be estimated (Equation 10) from an experiment for the first time. Previously, a negative correlation between h and s has long been conjectured from theory only (Kacser and Burns 1981) and from limited data (Simmons and Crow 1977; Crow and Simmons 1983). There has been no formal statistical analysis and experimental design to characterize cov(h, s).
Characterization of cov(h, s) is important, for example, for testing the validity of the dominance hypothesis (Turelli and Orr 1995) in explanation of Haldane’s rule. Haldane’s rule states that when one sex is inviable or sterile in the hybrids of two different animal races, that sex is often the heterogametic sex. The dominance hypothesis (Turelli and Orr 1995) states that alleles decreasing hybrid fitness are partially recessive. For the dominance hypothesis to explain Haldane’s rule, it is necessary that cov(h, s) is <0. Hence, our estimation method here may offer the first opportunity to test the validity of the dominance hypothesis in explaining Haldane’s rule by characterizing the sign of cov(h, s). Although it would be nice and significant to have estimators for the other DGM parameters as well, such as variance of s, the observable phenotypic moments of fitness do not relate to other DGM parameters (including the variance of s) in our analytical derivation that considers mutation effects in Equations 2, 3, 4, 5, 6, 7 and 11, 12, 13, 14, 15.
—The changes in MRBR of the estimates of s, h, , cov(h, s) obtained by Equation 16 in selfing populations when U were given equal to the values that ranged from 0.5U0 to 1.5U0. Each data point was the mean in 1000 simulations with the following sets of parameters and β= 1.0: (a) U0 = 1.5, s = 0.01, and α= 100; (b) U0 = 1.5, s = 0.05, and α= 20; (c) U0 = 0.5, s = 0.01, and α= 100; and (d) U0 = 0.5, s = 0.05, and α= 20.
In the estimation of the DGM parameters, we need a prior estimate of one of the six parameters based on some external knowledge or based on the estimates obtained from alternative approaches or from the same experimental design by using the Deng-Lynch method as demonstrated here. We provided the estimators of the other DGM parameters by using Equations 8 and 16 when assuming that U is known or estimated via other approaches. If we assume that one of the parameters s,h˜ (h), or
It can be seen from Equations 1a and 1b that the mean of h for the Charlesworth technique (Charlesworthet al. 1990) in estimating U in selfing populations should be the arithmetic mean h, and the mean for the Morton technique (Mortonet al. 1956) in outcrossing populations should be the harmonic meanh˜. This has seldom, if ever, been pointed out because the Morton-Charlesworth technique was derived under constant mutation effects. To our knowledge, there has been no method for estimating eitherh˜ or h. Our proposed estimation methods here are able to, again for the first time, allow estimates ofh˜ and h with relatively small bias under variable mutation effects.
The majority of earlier estimation methods for DGM assume constant mutation effects. The only exception is the maximum-likelihood estimation developed for analyses of mutation-accumulation experiments (Keightley 1994, 1996). Like our current estimation method, Keightley’s maximum-likelihood estimation also needs to assume a parameter value of DGM to estimate the other DGM parameters in his model. Our results (Deng and Li 2001) suggest that a method that accounts for variable mutation effects does not necessarily always yield better estimation than a method that assumes constant mutation effects even under variable mutation effects. In our current estimation, the covariance between mean fitness of parents and that of their progeny is independent of the other measurable experimental data (such as the means and genetic variance of fitness of the two generations across inbreeding/outcrossing) that are used in the Deng-Lynch estimation (Deng and Lynch 1996). This additional and independent information contributes to the improved estimation of our current method in quality and to our ability to estimate additional DGM parameters that could be estimated earlier.
For our methods that are applicable to natural out-crossing populations and selfing-fertilizing populations, M-S balance is assumed to be the mechanism maintaining variation for fitness. Alternatives to M-S balance, such as functional overdominance or overdominance induced by fluctuating selection, may, in principle, maintain polymorphisms. Most evidence suggests dominance as heterozygous mutation effects and thus is compatible with M-S balance (Houle 1989, 1994; Houleet al. 1996; Denget al. 1998). However, mechanisms responsible for the maintenance of genetic variance are complex and may differ among populations. If any other mechanism, such as balancing selection or migration, leads to the maintenance of genetic variation (Drakeet al. 1998; Keightley 1998), our methods may result in biased estimation. Using approaches (Liet al. 1999; Li and Deng 2000; H.-W. Deng and J. Li, unpublished results) that we have used to investigate the robustness of the Deng-Lynch method in the presence of violation of the M-S balance assumption, we can and we will pursue in our future studies investigation of how robust the current method is with different degrees of violation of M-S balance assumption.
APPENDIX: ESTIMATION OF OTHER DGM PARAMETERS WHEN h IS ASSUMED OR ESTIMATED AND SOME REPRESENTATIVE SIMULATION RESULTS
Ifh˜ (in outcrossing populations) or h (in selfing populations) is known by other estimation methods or assumed at particular values on the basis of some external knowledge, based on Equations 2, 3, 4, 5, 6, 7 and 11, 12, 13, 14, 15, we have estimators for other DGM parameters as follows, the notations being the same as in the text, in outcrossing populations,
Simulations are performed similar to that described in the text and with the above estimation for other DGM parameters whenh˜ (in outcrossing populations) or h (in selfing populations) is known or estimated. The simulation and the experimental procedures, whenh˜ (in outcrossing populations) and h (in selfing populations) are estimated by the methods of Deng (1998a) or Mukai et al. (1972), are detailed in Deng et al. (1998) and thus are not elaborated here.
Some representative results are presented in Tables A1 and A2. It can be seen that, relative to the Deng-Lynch method, the new method developed here can estimate more parameters, such as cov(h, s) and its sign. In an outcrossing population, the sign of cov(h, s) cannot be reliably estimated. However, in selfing populations, if the h is estimated first by the Deng-Lynch method and then used in the current method, the sign of cov(h, s) can be characterized correctly.
Parameter estimates under variable mutational effects in outcrossing populations when h̃ is known or estimated (β = 1.0)
Parameter estimates under variable mutational effects in outcrossing populations when h¯ is known or estimated (β = 1.0)
Acknowledgments
We are grateful to anonymous reviewers for their constructive comments that greatly helped to improve the manuscript. This work is partially supported by grant R01 GM60402-01A1 from National Institutes of Health (NIH). H.W. Deng was also partially supported by grants from Health Future Foundation, NIH K01 grant AR02170-01, NIH R01 grant AR45349-01, grants from State of Nebraska Cancer and Smoking Related Disease Research Program (LB598), Nebraska Tobacco Settlement Fund (LB692), NIH grant P01 DC01813-07, U.S. Department of Energy grant DE-FG03-00ER63000/A00, grants (30025025 and 30170504) from National Science Foundation of China, and grants from HuNan Normal University and the Ministry of Education of China.
Footnotes
-
Communicating editor: Z-B. Zeng
- Received April 26, 2002.
- Accepted August 26, 2002.
- Copyright © 2002 by the Genetics Society of America