Genetics, Vol. 168, 1053-1069, October 2004, Copyright © 2004
doi:10.1534/genetics.104.027706
Analysis of the Estimators of the Average Coefficient of Dominance of Deleterious Mutations
B. Fernández*, A. García-Dorado† and A. Caballero*,1
* Departamento de Bioquímica, Genética e Inmunología, Facultad de Ciencias, Universidad de Vigo, 36200 Vigo, Spain
† Departamento de Genética, Facultad de Ciencias Biológicas, Universidad Complutense, 28040 Madrid, Spain
1 Corresponding author: Departamento de Bioquímica, Genética e Inmunología, Facultad de Ciencias, Universidad de Vigo, 36200 Vigo, Spain.
E-mail: armando{at}uvigo.es
ABSTRACT
We investigate the sources of bias that affect the most commonly used methods of estimation of the average degree of dominance (h) of deleterious mutations, focusing on estimates from segregating populations. The main emphasis is on the effect of the finite size of the populations, but other sources of bias are also considered. Using diffusion approximations to the distribution of gene frequencies in finite populations as well as stochastic simulations, we assess the behavior of the estimators obtained from populations at mutation-selection-drift balance under different mutational scenarios and compare averages of h for newly arisen and segregating mutations. Because of genetic drift, the inferences concerning newly arisen mutations based on the mutation-selection balance theory can have substantial upward bias depending upon the distribution of h. In addition, estimates usually refer to h weighted by the homozygous deleterious effect in different ways, so that inferences are complicated when these two variables are negatively correlated. Due to both sources of bias, the widely used regression of heterozygous on homozygous means underestimates the arithmetic mean of h for segregating mutations, in contrast to their repeatedly assumed equality in the literature. We conclude that none of the estimators from segregating populations provides, under general conditions, a useful tool to ascertain the properties of the degree of dominance, either for segregating or for newly arisen deleterious mutations. Direct estimates of the average h from mutation-accumulation experiments are shown to suffer some bias caused by purging selection but, because they do not require assumptions on the causes maintaining segregating variation, they appear to give a more reliable average dominance for newly arisen mutations.
THE dominance of genes controlling fitness components is a key issue for theoretical predictions in population and quantitative genetics, as parameters such as inbreeding depression and genetic variance depend heavily on the degree of dominance (CROW and KIMURA 1970; LYNCH and WALSH 1998; CHARLESWORTH and HUGHES 1999). However, the dominance of genes is difficult to estimate in terms of both statistical analysis and effort required. From mutation-accumulation experiments it is possible to estimate some properties of the dominance of new spontaneous mutations in the absence of substantial selection, and this is the most direct measure that can be obtained (see GARCÍA-DORADO et al. 2004 for a recent review and PETERS et al. 2003, FRY and NUZHDIN 2003, SZAFRANIEC et al. 2003, and references therein for estimates from mutations induced by EMS and by transposable-element insertions). Indirect estimates of dominance can also be inferred from the genetic structure of segregating populations assumed to be at the balance between mutation and directional selection. GARCÍA-DORADO et al. (1999) and GARCÍA-DORADO and CABALLERO (2000) provide a discussion of the difficulties encountered in the interpretation of the estimates of dominance obtained from mutation-accumulation experiments. In this article, we extend the discussion to the most extensively used methods based on segregating populations.
A widely used approach is based on the analysis of chromosomes extracted from natural populations. Estimates are available for several fitness components in Drosophila (mostly viability, reviewed by SIMMONS and CROW 1977 and GARCÍA-DORADO et al. 1999) and rely on different measurements of the relationship between the expression of some fitness trait in homozygous individuals and their panmictic crosses. Another widely used method is based on the comparison between the genetic load of outbred and inbred populations for the fitness trait, and estimates are available for a range of species and traits (see LYNCH and WALSH 1998, pp. 284–287). These two kinds of methods are the main subject of this article.
More sophisticated versions of the second method have been developed that also incorporate additional population parameters, such as variances and covariances of a fitness component for outbred and selfed populations (DENG and LYNCH 1996; DENG et al. 2002). In addition, some methods require previous knowledge of the genomic mutation rate (LYNCH et al. 1995) or of one of several mutational parameters (DENG et al. 2002). Finally, a method is based on the ratio of estimates of additive and dominance variance components (COMSTOCK and ROBINSON 1952). None of these methods are considered in this article, as they have been used more rarely, may involve the previous inference of other mutational parameters, and may require the estimation of variance components, generally subject to larger estimation errors than those for means.
A key assumption for the inferences obtained from segregating populations to be reliable is that no substantial proportion of the standing genetic variability for fitness, or for the fitness-related trait that is being studied, should be maintained by mechanisms other than mutational pressure (e.g., overdominance, soft selection, hitchhiking, etc.). Although this is surely not the case for many wild populations, the theory provides useful predictions for the mutation-selection balance "null hypothesis." Furthermore, mutation-selection balance predictions and estimates rely on additional assumptions that, although explicitly stated when the underlying theory was developed, are often ignored at the time the estimation technique is implemented or applied.
In this article, we investigate the sources of bias that affect the estimation of the average degree of dominance of deleterious mutations from segregating populations at mutation-selection-drift balance, under a range of mutational and dominance models. The main emphasis is on the finite size of the populations, a factor that has not so far been considered. We compare the coefficient of dominance of segregating mutations with that of newly arisen ones, along with their corresponding estimates, to clarify their relationship. Finally, we assess the bias caused by selection in the estimates obtained from mutation-accumulation experiments, so as to compare the biases of direct and indirect methods of estimation.
PREDICTIONS FOR AN INFINITE...
In an infinite population at the mutation-selection balance, the expected number of copies of a deleterious mutation with homozygous effect s and coefficient of dominance h before its elimination is

$$P = \frac{1}{sh} \tag{1}$$

(MORTON et al. 1956; LI and NEI 1972; CROW 1979, 1993), which we call the "pervasiveness" of the mutant, to avoid confusion between this and the true persistence time of a mutant (see GARCÍA-DORADO et al. 2003). Thus, the arithmetic mean of the coefficient of dominance for segregating (equilibrium) deleterious mutations, $\bar{h}_E$, is

$$\bar{h}_E = \frac{\sum hP}{\sum P}. \tag{2}$$

Here and henceforth, the sum is over all mutations, the subscript E denotes segregating [equilibrium (E)] mutations, and the overbar indicates the arithmetic mean. Substituting the expected pervasiveness in an infinite population (Equation 1) into (2) we obtain the mutation-selection balance (MSB) prediction of Equation 2,

$$\bar{h}_E = \frac{\sum 1/s}{\sum 1/(sh)}, \tag{3}$$

which is the harmonic mean of h for newly arisen mutations weighted by 1/s, $\tilde{h}_{N(1/s)}$. With constant s, $\bar{h}_E$ is the unweighted harmonic mean of newly arisen ones, $\tilde{h}_N$. This was initially stated by MORTON et al. (1956) and HIRAIZUMI and CROW (1960) for homozygous lethal mutations. For the case of variable s, this is true only for the unlikely case in which s and h are statistically independent among newly arisen mutations (see below). However, it was later misleadingly employed as a general result (MUKAI 1969; MUKAI et al. 1972; MUKAI and YAMAGUCHI 1974; WATANABE et al. 1976).
Several methods have been developed to estimate the properties of the coefficient of dominance for newly arisen or segregating deleterious mutations from data on large outbred populations. T. Mukai and co-workers obtained many estimates for viability in Drosophila melanogaster by extracting chromosomes from natural populations and building homozygous and heterozygous (i.e., panmictic nonhomozygous) lines. Using the regression of heterozygous viabilities (y) on the sum of the genetic values for the two homozygous ones (x), or vice versa, estimates of the average h can be obtained. Basically, using the previous model of mutation-selection balance in an infinite population, and again disregarding homozygous expression in panmictic crosses (i.e., assuming that deleterious mutations segregate at low frequencies), the variances and covariances of homozygous and heterozygous viabilities are
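$$\sigma_x^2 \propto \sum s^2 P, \qquad \sigma_y^2 \propto \sum s^2 h^2 P, \qquad \sigma_{xy} \propto \sum s^2 h P,$$

with the same proportionality constant in the three expressions, where P is the pervasiveness of each mutation (Equation 1). The regression of y on x is therefore

$$b_{y.x} = \frac{\sigma_{xy}}{\sigma_x^2} = \frac{\sum s^2 h P}{\sum s^2 P}. \tag{4}$$

The regression of y on x can also be expressed as the average of h values weighted by the genetic variance if all individuals were homozygotes (see MUKAI et al. 1972; CABALLERO et al. 1997). Substituting Equation 1 into (4) gives an MSB prediction as the harmonic mean of h for newly arisen mutations weighted by s,

$$b_{y.x} = \frac{\sum s}{\sum s/h} = \tilde{h}_{N(s)}. \tag{5}$$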
A method proposed by DENG (1998), based on the relationship between outbred parents and their selfed offspring, is an alternative for outcrossers capable of selfing if homozygous lines cannot be constructed, and the estimated average h is identical to Equation 4. Thus, as expected, both by.x and Deng's estimates give nearly identical values in cases where lethal mutations are excluded (DENG 1998). The regression of heterozygous on homozygous viabilities (by.x) has been interpreted as the arithmetic mean coefficient of dominance in an equilibrium random-mating population, $\bar{h}_E$, even if there are differential selection coefficients among loci (e.g., MUKAI and YAMAGUCHI 1974). However, this is the case only when s and h are not correlated, as by.x gives an average h weighted by $s^2$ (Equation 4).
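Similarly, the inverse of the regression of x on y (bx.y) estimates

$$\frac{1}{b_{x.y}} = \frac{\sigma_y^2}{\sigma_{xy}} = \frac{\sum s^2 h^2 P}{\sum s^2 h P}, \tag{6}$$

whose MSB prediction, obtained by substituting Equation 1, is the arithmetic mean of h for newly arisen mutations weighted by s,

$$\frac{1}{b_{x.y}} = \frac{\sum sh}{\sum s} = \bar{h}_{N(s)}. \tag{7}$$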
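Finally, a simple method of estimation of the dominance coefficient involves estimates of the mean of a fitness component for outbred and inbred populations (LYNCH and WALSH 1998, pp. 283–287). Thus, again assuming that deleterious mutations segregate at low frequencies, the mean of an infinite outbred population is $W_O = W_{max}(1 - 2\sum shq)$, and that of a completely inbred one is $W_I = W_{max}(1 - \sum sq)$, assuming there has been no purging of deleterious mutations over the inbreeding period, where q is the equilibrium frequency of each mutation (proportional to its pervasiveness P) and Wmax is the trait value of a genotype free of segregating deleterious mutations. Therefore,

$$\frac{W_{max} - W_O}{2(W_{max} - W_I)} = \frac{\sum shP}{\sum sP}, \tag{8}$$

with an MSB prediction, obtained by substituting Equation 1, equal to the unweighted harmonic mean of h for the n newly arisen mutations,

$$\frac{W_{max} - W_O}{2(W_{max} - W_I)} = \frac{n}{\sum 1/h} = \tilde{h}_N. \tag{9a}$$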
This method, applied to the viabilities of individual homozygous chromosomes and their crosses, was originally used by MUKAI and YAMAZAKI (1968) to estimate the average h for initially isogenic chromosomes that were allowed to accumulate mutations for a period of time. In a segregating population Wmax is unknown, but for absolute viability an upper bound of Wmax = 1 can be used to approximate the harmonic mean through what we denote the ratio estimate,
$$\frac{1 - W_O}{2(1 - W_I)}. \tag{9b}$$

However, this estimate is biased toward $\tilde{h}_N = 0.5$ whenever Wmax < 1, a highly likely situation implying that there are nongenetic causes of mortality. Thus, for $\tilde{h}_N < 0.5$, this ratio gives an upper bound for $\tilde{h}_N$ (LYNCH and WALSH 1998), which is, in turn, a lower bound for the arithmetic mean $\bar{h}_N$, as the harmonic mean is generally lower than the arithmetic mean. This renders the ratio estimate difficult to interpret.
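The pull toward 0.5 can be illustrated with a few lines of arithmetic. The sketch below assumes the ratio estimate has the form (Wmax − W_O)/[2(Wmax − W_I)], with W_O and W_I the outbred and inbred means; the function name and the numerical values are hypothetical and chosen only for illustration.

```python
def ratio_estimate(w_outbred, w_inbred, w_max=1.0):
    """Ratio estimate of average dominance from outbred and inbred means.

    Assumed form: (Wmax - W_O) / (2 * (Wmax - W_I)).
    """
    return (w_max - w_outbred) / (2.0 * (w_max - w_inbred))


# Hypothetical values: true Wmax = 0.9, outbred load 2%, inbred load 5%.
true_w_max = 0.9
w_o = true_w_max * (1.0 - 0.02)
w_i = true_w_max * (1.0 - 0.05)

# Using the (normally unknown) true Wmax recovers the underlying ratio.
with_true_w_max = ratio_estimate(w_o, w_i, w_max=true_w_max)
# Using the upper bound Wmax = 1 moves the estimate toward 0.5.
with_upper_bound = ratio_estimate(w_o, w_i)

print(with_true_w_max, with_upper_bound)
```

With these illustrative numbers the nongenetic mortality (Wmax = 0.9) adds the same amount to numerator and half-denominator, which is why the estimate computed with Wmax = 1 lies between the underlying value and 0.5.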
To facilitate further discussions, a summary of Equations 2–9 is shown in Table 1. Each estimation method provides an estimate of the average of dominance coefficients for segregating populations (subscript E), with different weightings. In addition, each method provides an inference of the average (arithmetic or harmonic and, again, with different weightings) of dominance coefficients for newly arisen mutations (subscript N), under the assumption of mutation-selection balance (MSB predictions).
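The MSB predictions described in the text can be compared numerically for any set of newly arisen mutations. The sketch below is a minimal illustration, assuming the four predictions are, respectively, the harmonic mean of h weighted by 1/s (segregating mean), the harmonic mean weighted by s (by.x), the arithmetic mean weighted by s (1/bx.y), and the unweighted harmonic mean (ratio). The function name and dictionary keys are hypothetical.

```python
def msb_predictions(s, h):
    """Infinite-population MSB predictions for lists of s and h values.

    Each prediction is a differently weighted mean of h for new mutations.
    """
    n = len(s)
    pairs = list(zip(s, h))
    return {
        # harmonic mean of h weighted by 1/s (mean h of segregating mutations)
        "h_E": sum(1.0 / si for si in s) / sum(1.0 / (si * hi) for si, hi in pairs),
        # harmonic mean of h weighted by s (regression of y on x)
        "b_yx": sum(s) / sum(si / hi for si, hi in pairs),
        # arithmetic mean of h weighted by s (inverse regression of x on y)
        "inv_b_xy": sum(si * hi for si, hi in pairs) / sum(s),
        # unweighted harmonic mean of h (ratio estimate)
        "ratio": n / sum(1.0 / hi for hi in h),
    }


# Constant s: the three harmonic means coincide.
constant_s = msb_predictions([0.1, 0.1], [0.2, 0.4])
# Larger effects paired with smaller h (negative association between s and h).
negative = msb_predictions([0.01, 0.1], [0.4, 0.2])
print(constant_s)
print(negative)
```

In the second example the 1/s-weighted mean rises and the s-weighted means fall relative to the unweighted harmonic mean, which is the qualitative pattern expected under a negative association between s and h.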
APPLICATIONS TO A FINITE...
The MSB predictions above assume that the pervasiveness of each mutation is 1/sh (Equation 1). This assumption is appropriate only for sh >> 1/2Ne, where Ne is the effective population size (LI and NEI 1972). For smaller sh values, drift and/or selection against homozygotes become relevant factors limiting the actual pervasiveness. Although the pervasiveness predicted at the MSB goes to infinity as sh decreases (Equation 1), its actual value is always below 2Ne due to drift, even for cases where selection against homozygotes is irrelevant (GARCÍA-DORADO et al. 2003). Thus, for finite populations of reasonable Ne, estimates of the average coefficient of dominance are potentially biased depending on the distribution of s and h values.
The first source of bias is that the distribution for the homozygous deleterious effect, f(s), may have a large probability density for deleterious effects close to zero (effectively neutral). This occurs, for example, for a gamma distribution of mutational effects with shape parameter β < 1 (i.e., with coefficient of variation CV > 1), so that f(s) goes to infinity for s = 0. In this case, the number of effectively neutral mutations predicted at the MSB is much larger than its real value in finite populations. In the context of the estimate of a wide set of mutational parameters, the problem has been circumvented by arbitrarily discretizing the continuous distribution assumed for s (DENG and LYNCH 1996; DENG 1998; DENG et al. 2002).
The second source of bias is that, even for deleterious mutations that are not effectively neutral (s > 1/2N), if h is small enough that selection against homozygotes becomes relevant, the MSB prediction of the mutational pervasiveness is again biased upward. Therefore, when such mutations occur with considerable probability, the MSB prediction for the average h for segregating deleterious mutations is biased downward. To take an extreme example, MSB predicts infinite pervasiveness for completely recessive deleterious mutations whatever the s value. Thus, if the distribution for h gives positive density for completely recessive mutations [g(h = 0) > 0], the harmonic mean of h will be zero for newly arisen deleterious mutations, so that MSB will also predict average dominance of zero for segregating ones (Equation 3). Again, this has been circumvented by arbitrarily discretizing the continuous distribution assumed for h, so that very small h values are disregarded (MUKAI 1969). Because those small h values may actually occur for mutations with substantial homozygous deleterious effects, this procedure induces a bias that may result in substantially flawed predictions, in particular regarding the inbreeding depression.
Finally, it is likely that there is a negative correlation between s and h. Mutants affecting viability in Drosophila clearly show a negative correlation between s and h (GREENBERG and CROW 1960; SIMMONS and CROW 1977; CABALLERO and KEIGHTLEY 1994), and loss-of-function mutations at loci coding for enzymes acting in metabolic pathways are often more recessive than mutations with small effects (WRIGHT 1934; KACSER and BURNS 1981; KEIGHTLEY 1996). This further complicates the interpretation of the estimates, as they become dependent on the relationship between dominance coefficients and homozygous effects according to their different weighting factors. Under a negative correlation between s and h, the MSB prediction for by.x and 1/bx.y will decrease, as they correspond to new mutations' means weighted by s (see Table 1). In contrast, the MSB prediction of $\bar{h}_E$ will increase, as it corresponds to the harmonic mean of new mutations weighted by 1/s. However, the actual estimates of by.x, 1/bx.y, and $\bar{h}_E$ in finite populations can be affected by drift in a manner that is difficult to predict from the MSB predictions.
MUTATIONAL MODELS AND ESTIMATION...
Mutational models and parameters:
A model of deleterious mutations was assumed in which the fitnesses (viability) of the wild homozygous, the heterozygous, and the mutant homozygous genotypes were 1, 1 − sh, and 1 − s, respectively. Homozygous effects for new mutations were sampled from a gamma distribution with shape parameter β and mean effect $\bar{s}$. Values of β larger than one were used to avoid nonzero density functions for s = 0. Two widely different mutation rates per chromosome per generation, λ = 0.2 and 0.006, with corresponding mean deleterious effects of
and 0.2, respectively, were used to cover the range of estimates supported by experimental evidence for chromosome II Drosophila viability (GARCíA-DORADO et al. 1999; LYNCH et al. 1999). A graph showing the gamma distributions used is given in Figure 1a. Lethal mutations were ignored, as the main interest is centered on the average dominance coefficient of nonlethal deleterious mutations (SIMMONS and CROW 1977).
The dominance coefficient of mutations ranged between 0 and 1 (i.e., over- and underdominant mutations were not considered) and was obtained in two ways. To use a distribution that gives zero density function for values of h = 0, we employed a convex beta distribution in most cases (Figure 1b). As the variance of dominance coefficients empirically observed by MUKAI (1969) for newly arisen mutations in D. melanogaster was
, a value of this order was used in the analysis. Dominance coefficients were assigned to n mutations either at random or with a negative correlation between s and h values. This correlation was established as follows: n values of s and n values of h were sampled from the corresponding distributions, and n pair values (x, y) were sampled from a normal bivariate distribution with a given correlation. For each variable, the ranking (from lower to higher) of the n values was determined. In the n (x, y) pairs, the variable values were replaced by their ranking to obtain a sample with n pairs of rank orders. For example, the first rank order pair is (1, 956) if the first x-value was drawn with the 956th y-value, etc. Then, s and h values were paired according to this ranking, so as to reproduce the rank order pairing obtained in the normal bivariate sample.
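The rank-order pairing described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the gamma and beta parameters in the usage lines, and the random seed are hypothetical choices, not values taken from the article.

```python
import math
import random


def ranks(values):
    """Return the 0-based rank (from lower to higher) of each value."""
    order = sorted(range(len(values)), key=values.__getitem__)
    rank = [0] * len(values)
    for position, index in enumerate(order):
        rank[index] = position
    return rank


def pair_by_rank(s_values, h_values, r, rng):
    """Pair s and h so that their rank orders follow a bivariate normal sample.

    The marginal samples of s and h are left untouched; only the pairing changes.
    """
    n = len(s_values)
    # Sample n (x, y) pairs from a standard bivariate normal with correlation r.
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(r * x + math.sqrt(1.0 - r * r) * z)
    # Replace the normal values by their ranks.
    x_rank, y_rank = ranks(xs), ranks(ys)
    # Pair the s value of rank x_rank[i] with the h value of rank y_rank[i].
    s_sorted, h_sorted = sorted(s_values), sorted(h_values)
    return [(s_sorted[x_rank[i]], h_sorted[y_rank[i]]) for i in range(n)]


# Usage with illustrative (assumed) distributions for s and h.
rng = random.Random(1)
s_sample = [rng.gammavariate(2.0, 0.01) for _ in range(1000)]
h_sample = [rng.betavariate(2.0, 4.0) for _ in range(1000)]
paired = pair_by_rank(s_sample, h_sample, -0.4, rng)
```

Because only ranks are transferred, the resulting rank correlation between s and h is close to, but not identical with, the correlation of the underlying normal sample.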
The second distribution of dominance coefficients was that proposed by CABALLERO and KEIGHTLEY (1994). Here, h values were taken from a uniform distribution between 0 and exp(−ks), where k is a constant allowing the mean dominance coefficient of newly arisen mutations, $\bar{h}_N$, to be the desired value (Figure 1c). Note that partially dominant mutations are allowed with this distribution but only for low values of s. Thus, the model also implies a negative correlation between s and h values.
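One way to choose k can be sketched under two assumptions: that the upper limit of the uniform distribution is exp(−ks), which keeps h within [0, 1] and decreasing with s, and that s is gamma distributed with shape β and mean effect as stated earlier. The mean of h is then half the expectation of exp(−ks), which for a gamma variable equals (1 + k·mean/β)^(−β), so k has a closed form. This derivation and the function names are illustrative and are not taken from the article.

```python
import math
import random


def solve_k(mean_h, shape, mean_s):
    """Return k such that E[h] = mean_h when h ~ U(0, exp(-k s)), s ~ gamma.

    Uses E[exp(-k s)] = (1 + k * mean_s / shape) ** (-shape) for a gamma
    distribution with the given shape and mean (assumed parameterization).
    """
    if not 0.0 < mean_h < 0.5:
        raise ValueError("mean_h must lie strictly between 0 and 0.5")
    return (shape / mean_s) * ((2.0 * mean_h) ** (-1.0 / shape) - 1.0)


def sample_mutation(k, shape, mean_s, rng):
    """Sample one (s, h) pair under the assumed model."""
    s = rng.gammavariate(shape, mean_s / shape)  # scale = mean / shape
    h = rng.uniform(0.0, math.exp(-k * s))
    return s, h


# Illustrative parameters (hypothetical): shape 2, mean effect 0.2, mean h 0.2.
k = solve_k(0.2, 2.0, 0.2)
rng = random.Random(3)
mutations = [sample_mutation(k, 2.0, 0.2, rng) for _ in range(20000)]
mean_h = sum(h for _, h in mutations) / len(mutations)
print(k, mean_h)
```

The upper bound of 0.5 on the mean follows from the model itself: with k = 0 every h is uniform on (0, 1), and any k > 0 lowers the mean.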
In most cases we assumed that all selection occurred at the viability level, so that the coefficients of selection and dominance of mutations were assumed to apply to overall fitness. However, some cases were considered where our interest is on the s and h values for a fitness component (viability), but mutants have a pleiotropic effect on other fitness traits. This is supported by results suggesting that most deleterious mutations have pleiotropic effects on all fitness components (see LYNCH and WALSH 1998, p. 345). Thus, we assumed an overall selection coefficient for fitness of s' = cs for cs < 1, and s' = 1 otherwise. Values of s' instead of s were used for all purposes related to the production of the chromosome samples, both through diffusion approximation or simulation, whereas s values were used to compute genotypic viabilities to obtain estimates. Values of c were taken from a uniform distribution between 1 and 3, implying that the deleterious effects on fitness of the mutations are larger than those on viability, and were assigned to specific mutations at random or assuming a given positive correlation between s and c. This correlation was established by the same procedure as that between s and h (see above). The rationale for a positive correlation between s and c is that mutations of large effect are usually found to be deleterious for several fitness components (HIRAIZUMI and CROW 1960; FERNáNDEZ and LóPEZ-FANJUL 1996). An average value of c = 2 was suggested by MUKAI (1969) and MUKAI and YAMAGUCHI (1974) for viability in D. melanogaster (see also CHARLESWORTH and HUGHES 1999). Note that, for large populations at mutation-selection balance, where homozygotes are rare, the above procedure is equivalent to assigning a coefficient c to the coefficient of dominance (h) rather than to the homozygous effect (s). 
This was the rationale used by MUKAI (1969) and MUKAI and YAMAGUCHI (1974) and would imply that each mutation presents higher dominance for its effect on fitness than for its effect on viability.
Diffusion approximations:
We used KIMURA's (1969) diffusion approximations under the infinite-sites model to obtain the equilibrium frequency distribution of a mutant with a specific selective effect and dominance coefficient in a population at mutation-selection-drift balance. Let q be the frequency of the mutant allele at a given locus affecting fitness (viability) and assume a randomly mating diploid population of N individuals, with an effective size Ne. The stationary distribution of mutant frequencies for 1/2N ≤ q ≤ 1 − 1/2N in a nonrecurrent mutation model with infinite independent loci was given by KIMURA (1969) as

$$\Phi_N(q) = \frac{4N_e\lambda\,[1 - u(q)]}{q(1 - q)\,G(q)}, \tag{10}$$

where λ is the haploid mutation rate per generation, G(q) = exp(2Nesq(q + 2h − 2hq)), and u(q) is the fixation probability of a mutant with initial frequency q.
The integral of $\Phi_N(q)$ with respect to q gives the expected number of mutations segregating with frequency q within the integration interval. As only large N values are considered, for each possible allelic frequency [i.e., for q = 1/2N, 2/2N, 3/2N, ... , (2N − 1)/2N] we use

$$\Phi_N(i) = \frac{\Phi_N(q)}{2N} \tag{11}$$

as an approximation to $\int_{q-1/4N}^{q+1/4N} \Phi_N(x)\,dx$, i.e., as an approximation to the expected number of independent loci with i segregating mutant copies and frequency q = i/2N at the mutation-selection-drift balance.
Now assume that a random sample of M chromosomes is taken from the population. The expected number of loci with j copies in the sample [$\Phi_M(j)$] can be calculated using binomial sampling,

$$\Phi_M(j) = \sum_{i=1}^{2N-1} \Phi_N(i) \binom{M}{j} q^{j}(1 - q)^{M-j}, \qquad q = \frac{i}{2N}. \tag{12}$$

Finally, $S_N = \sum_i \Phi_N(i)$ and $S_M = \sum_j \Phi_M(j)$ are the expected total numbers of segregating loci in the population and in the chromosome sample, respectively.
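The calculation can be sketched for a single mutation type as follows. The code assumes the stationary density has the form 4Neλ[1 − u(q)]/[q(1 − q)G(q)], with u(q) the ratio of the integrals of G over (0, q) and (0, 1), assigns each copy number i the density at q = i/2N divided by 2N, and then applies binomial sampling. For simplicity it integrates G with the trapezoidal rule rather than Simpson's rule, and it is only suitable for small N because G is evaluated without log-scaling. All function names are hypothetical.

```python
import math


def g(q, ne, s, h):
    """G(q) = exp(2 Ne s q (q + 2h - 2hq)) for a deleterious mutation."""
    return math.exp(2.0 * ne * s * q * (q + 2.0 * h - 2.0 * h * q))


def population_counts(n, ne, s, h, lam):
    """Expected number of loci with i mutant copies, i = 1 .. 2N - 1."""
    two_n = 2 * n
    gv = [g(k / two_n, ne, s, h) for k in range(two_n + 1)]
    # Cumulative trapezoidal integral of G from 0 to k/2N.
    cum = [0.0]
    for k in range(1, two_n + 1):
        cum.append(cum[-1] + 0.5 * (gv[k - 1] + gv[k]) / two_n)
    counts = {}
    for i in range(1, two_n):
        q = i / two_n
        u = cum[i] / cum[-1]  # fixation probability from frequency q
        density = 4.0 * ne * lam * (1.0 - u) / (q * (1.0 - q) * gv[i])
        counts[i] = density / two_n  # midpoint approximation to the integral
    return counts


def sample_counts(counts, n, m):
    """Expected number of loci with j copies (j = 0 .. M) in M chromosomes."""
    two_n = 2 * n
    out = [0.0] * (m + 1)
    for i, weight in counts.items():
        q = i / two_n
        for j in range(m + 1):
            out[j] += weight * math.comb(m, j) * q ** j * (1.0 - q) ** (m - j)
    return out


# Illustrative run with a small population (hypothetical parameter values).
neutral = population_counts(50, 50, 0.0, 0.5, 0.01)
deleterious = population_counts(50, 50, 0.1, 0.2, 0.01)
in_sample = sample_counts(deleterious, 50, 10)
print(sum(neutral.values()), sum(deleterious.values()), sum(in_sample[1:10]))
```

In the neutral case the expression reduces to 4Neλ/i loci with i copies, which gives a simple check on the implementation; selection against the mutant lowers the counts at every frequency.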
For each particular case and mutational model, 10 sets of 1000 mutations were sampled and their diffusion stationary frequencies (Equations 10–12) were obtained. Fixation probabilities to be used in Equation 10 were calculated by numerical integration using Simpson's rule. For each set of 1000 mutations, the 1000 expected distributions of gene copy numbers in a sample of M chromosomes, $\Phi_M(j)$, and the 1000 expected total numbers of segregating loci in the sample, SM, were stored. Chromosomes were then simulated in which a number of independent loci were segregating. The number of loci assigned was equal to the averaged SM for all sampled mutations in the set. The assignment of particular mutations to each locus was done by sampling with replacement from the 1000 mutations with a probability proportional to their SM value. The number of copies of the mutant allele at each locus was sampled from a distribution proportional to the corresponding $\Phi_M(j)$, and the copies were randomly assigned to the chromosomes. The results given below refer to samples of M = 100 chromosomes from populations with sizes N = Ne = 10^3, 10^5, and 10^7. Some analyses were also done with a larger number of chromosomes (M = 400), but the results were similar and are not given. Homozygous and heterozygous fitnesses (viabilities) were calculated for all combinations of chromosomes [M homozygotes and M(M − 1)/2 heterozygotes], assuming a between-locus multiplicative fitness model. The sampling of M chromosomes was repeated 100 times for each set of 1000 mutations and the estimates were averaged. The above procedure was followed for each of the 10 sets of 1000 mutations, and mean estimates and standard errors were obtained from these sets.
The different averages for the coefficient of dominance of newly arisen mutations (representing different MSB predictions, see Table 1) and their standard errors were obtained from 100 replicates of 10,000 mutations each or calculated from the expected distribution of s and h values.
Simulations:
These were carried out to check the diffusion results. A single random-mating population of N = 1000 individuals was set up with no initial genetic variation and run for 10N generations before the extraction of chromosomes. This should allow for a balance between mutation-selection and drift to be reached (KEIGHTLEY and HILL 1989). The simulation of genes was made by the use of binary masks and bit-step operators. The model and mutational parameters were the same as in the diffusion approach. Free recombination was allowed among loci. One hundred samples of 100 chromosomes each were taken from the population at the end of the process to obtain estimates in the same way as above. Ten simulations were run to obtain standard errors of estimates.
Estimation procedure:
The expected average coefficient of dominance of segregating mutations, $\bar{h}_E$, was calculated as the overall average of h values weighted by their expected frequency in the population's stationary frequency distribution. Thus, for each sampled mutation i, its expected frequency in the population, $\bar{q}_i$, was calculated from its stationary distribution, and the average dominance was taken over all sampled mutations, $\bar{h}_E = \sum_i h_i \bar{q}_i / \sum_i \bar{q}_i$. To normalize multiplicative viabilities, estimates of regression involving heterozygous and homozygous chromosomes (by.x and bx.y) used log-scaled values, previously scaled by the average heterozygous viability. Estimates were obtained for all chromosomes and only for quasi-normal (QN) chromosomes, defined as those chromosomes with homozygous viability >60% of the average heterozygous viability, as the main interest is usually on the average dominance coefficient of mildly deleterious mutations (SIMMONS and CROW 1977). Calculations for the ratio estimate (Equation 9b) used the mean homozygous (WI) and mean heterozygous (WO) values without scaling and considered all types of chromosomes.
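The two regression estimates can be computed from paired homozygous (x) and heterozygous (y) values with ordinary sample moments. The sketch below is a minimal illustration: the function names are hypothetical, the inputs are assumed to be already log-scaled as described above, and the quasi-normal filter simply applies the 60% threshold stated in the text.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Unbiased sample covariance; covariance(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def regression_estimates(x, y):
    """Return (b_yx, 1/b_xy) for homozygous values x and heterozygous values y."""
    cov_xy = covariance(x, y)
    b_yx = cov_xy / covariance(x, x)      # regression of y on x
    inv_b_xy = covariance(y, y) / cov_xy  # inverse of the regression of x on y
    return b_yx, inv_b_xy


def is_quasi_normal(hom_viabilities, mean_het_viability, threshold=0.6):
    """Flag chromosomes whose homozygous viability exceeds the QN threshold."""
    return [v > threshold * mean_het_viability for v in hom_viabilities]


# Illustrative (made-up) values.
x = [0.0, 0.1, 0.2, 0.4]
exact = regression_estimates(x, [0.3 * v for v in x])
noisy = regression_estimates(x, [0.01, 0.02, 0.08, 0.11])
print(exact, noisy)
```

When the covariance is positive, 1/bx.y can never fall below by.x (their ratio is the inverse squared correlation), so the two estimates bracket the relationship between heterozygous and homozygous values and coincide only when it is exactly linear.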
Estimates from mutation-accumulation experiments:
To assess the bias due to selection for estimates from mutation-accumulation experiments, chromosomes were constructed where deleterious mutations accumulated for t = 20, 100, or 200 generations in lines of N = Ne = 1 or 2 individuals. The procedure followed for the construction of the mutation-accumulation chromosome lines was the following. For each line, a number of mutations Poisson distributed with mean 2Nλt was sampled from mutational distributions corresponding to those from Tables 3 and 4. A transition matrix of (2N + 1) x (2N + 1) with elements
![]() |
In a mutation-accumulation experiment without selection, the regression by.x estimates the arithmetic mean h of newly arisen mutations weighted by the squared homozygous effects, $\bar{h}_{N(s^2)}$, whereas the ratio estimates the arithmetic mean h of newly arisen mutations weighted by the homozygous effect ($\bar{h}_{N(s)}$; see MUKAI 1969; GARCÍA-DORADO and CABALLERO 2000).
RESULTS
$\tilde{h}_{N(1/s)}$, $\tilde{h}_{N(s)}$, and $\tilde{h}_N$ equal the harmonic mean, and $\bar{h}_{N(s)}$ equals the arithmetic mean. The other results (non-italic type) correspond to estimates obtained from segregating populations, using diffusion approximations or simulations. The simulation results are generally in good agreement with the corresponding diffusion estimates.
For small populations, the estimates are generally larger than their MSB predictions to a variable extent that depends on the mutational parameters. The overestimation (up to 100%) is more evident for the case in the last column, where the average degree of dominance is smaller. The decrease in bias with increasing population numbers is apparent in most cases, showing little or no bias for N = 10^7 (but see below). As s and h are independent for newly arisen mutations, this bias should be ascribed to MSB predicting an excessive pervasiveness (Equation 1) for mutations with very small h values, as it ignores drift as well as selection against homozygotes in finite populations. For example, for a mutation with s = 0.001 and h = 0.1, the expected pervasiveness in an infinite population is 1/sh = 10,000 copies, whereas the true pervasiveness in finite populations would be 3108, 8821, and 9965 for populations with 10^3, 10^5, and 10^7 individuals, respectively. The corresponding pervasiveness for a mutation with s = 0.01 and h = 0.01 would be 879, 5193, and 9759, respectively. This shows that, in finite populations, the larger s, the lower the pervasiveness, as increasing s favors purging selection against homozygotes.
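As a check on the infinite-population figure quoted above, note that the two example mutations share the same product sh and therefore the same MSB prediction (pervasiveness is written here as P, a symbol assumed for this illustration):

```latex
\[
P = \frac{1}{sh}: \qquad
\frac{1}{0.001 \times 0.1} = 10^{4}, \qquad
\frac{1}{0.01 \times 0.01} = 10^{4}.
\]
```

The finite-population values for the two mutations differ even though their MSB predictions are identical, which is the point of the comparison.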
Additional sources of bias arise from the use of a between-locus additive model to predict properties of a trait with between-locus multiplicative effects. Thus, a 20% upward bias is observed for the ratio estimate when the multiplicative effect of many deleterious mutations causes a high depression associated with inbreeding (i.e., Table 2, columns 1 and 2, where Ne = 10^7). This bias is avoided using log-transformed viabilities (results not shown). In fact, the regression estimates (by.x and 1/bx.y) do not show this bias as they are estimated from log-transformed data. However, this transformation, in turn, causes some downward bias (up to approximately 25%) in the cases in columns 3 and 4, where deleterious effects are large. This bias disappears when estimates are obtained in the real scale (results not shown). Note that, in the cases in columns 3 and 4, individuals rarely carry more than one deleterious mutation, so that the advantage of log-transformation in linearizing multiplicative effects is overcome by the disadvantage of distorting large effects.
The estimates of 1/bx.y behave well for cases where most genetic variance is due to segregating deleterious mutations with small effect (Table 2, columns 1 and 2), giving the arithmetic mean of dominance for newly arisen mutations, $\bar{h}_{N(s)}$, at large population numbers. However, 1/bx.y is more sensitive than by.x to the log-scaling effects discussed above when large deleterious effects are expressed (columns 3 and 4). Furthermore, computing 1/bx.y for quasi-normal chromosomes (the usual procedure) gives estimates that are too high, particularly for larger population sizes. This is equivalent to truncating the dependent variable (x in this case), so that we no longer have randomly sampled values of the dependent variable for each considered value of the independent one (y in this case). When doing so, $\sigma_x^2$ is reduced in the same proportion as $\sigma_{xy}$, but in a greater proportion than $\sigma_y^2$, leading to a reduction of the regression bx.y = $\sigma_{xy}/\sigma_y^2$ and, therefore, to an increase of the 1/bx.y estimate.
Table 3 gives estimates when the coefficient of dominance of new mutations is beta distributed and s and h are negatively correlated ($r \approx -0.4$). Again, estimates are generally above their MSB predictions for small population sizes, and the difference tends to disappear as population size increases. Furthermore, the harmonic mean weighted by 1/s, $\tilde h_{N(1/s)}$, is larger than the unweighted harmonic mean, $\tilde h_N$, as weighting by 1/s increases the weight for large h due to the negative correlation r. In parallel, the harmonic and arithmetic means weighted by s, $\tilde h_{N(s)}$ and $\bar h_{N(s)}$, and the corresponding estimates in segregating populations (by.x and 1/bx.y) decrease with a negative r, as more weight is given to mutations of low h. Finally, $\tilde h_N$ and the corresponding ratio estimate are not affected by a negative r, as they apply to unweighted h for newly arisen mutations. Unfortunately, only bounds can usually be computed for the ratio estimate (upper bounds if the true $\tilde h_N < 0.5$), as the expected fitness of a genotype carrying no deleterious mutations is unknown. Biases similar to those shown in Table 2 due to the multiplicative effects and log-scaling are also seen in Table 3.
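The effect of the weighting on the harmonic means can be seen in a small worked example. The five (s, h) pairs below are hypothetical and chosen only so that larger s goes with smaller h; the helper name is likewise an assumption.

```python
def weighted_harmonic_mean(h, w):
    """Harmonic mean of h with weights w: sum(w) / sum(w / h)."""
    return sum(w) / sum(wi / hi for wi, hi in zip(w, h))

# Hypothetical new mutations with a negative relationship between s and h.
s = [0.01, 0.02, 0.05, 0.10, 0.30]
h = [0.45, 0.40, 0.30, 0.15, 0.05]

h_inv_s = weighted_harmonic_mean(h, [1.0 / si for si in s])  # weighted by 1/s
h_plain = weighted_harmonic_mean(h, [1.0] * len(h))          # unweighted
h_by_s = weighted_harmonic_mean(h, s)                        # weighted by s

print(f"weighted by 1/s: {h_inv_s:.3f}")
print(f"unweighted:      {h_plain:.3f}")
print(f"weighted by s:   {h_by_s:.3f}")
```

With a negative relationship between s and h, weighting by 1/s favors mutations of high h and weighting by s favors mutations of low h, so the three means are ordered accordingly.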
Figure 2 shows some parameters for the mutational models in columns 2 and 3 of Tables 2 (r = 0) and 3 (r = −0.4). The top shows the percentage of newly arisen mutations with different h values (line) and the corresponding percentages for segregating mutations (bars). If there is no correlation between s and h (darkly shaded bars), segregating recessive mutations are relatively more frequent than newly arisen ones, whereas the opposite occurs for dominant mutations. For a negative correlation between s and h (r = −0.4, lightly shaded bars), the effect is reversed, as more dominant mutations are associated with lower deleterious effects. These results are more apparent for cases of large average deleterious effects (right side). The bottom shows the average selection coefficient for mutations of different classes of h, comparing again newly arisen mutations (lines) and segregating ones (bars). The elimination of mutations implies that, for each h class, the average s for segregating mutations is lower than that for new ones.
Table 4 gives estimates obtained when the coefficient of dominance of new mutations h is uniformly distributed between 0 and a value exponentially decreasing with s (see Figure 1c). Because values of h = 0 are possible for any given s, MSB predictions involving the inverse of h (harmonic means) are null. However, since drift reduces the pervasiveness of mutations with $h \approx 0$ below its MSB prediction, estimates from segregating populations are not too different from those obtained for the previous model with similar r (Table 3) unless the population size is very large. For this model, cases with a large rate of deleterious mutations (columns 1 and 2) produce equilibrium populations where almost all chromosomes have severely impaired viability when homozygous, so that there are virtually no chromosomes in the QN class for $N_e \geq 10^5$. This is in disagreement with empirical observations, implying that such large rates of mildly deleterious mutations are incompatible with the model of dominance used in Table 4. However, selection through pleiotropic effects on other fitness components can cause an additional reduction in the deleterious frequency and a corresponding increase in the frequency of QN chromosomes (see below).
The behavior of 1/bx.y is similar to that reported in Tables 2 and 3, except for the extremely large estimates (a 400% upward bias) obtained in the last column of Table 4 for $N_e = 10^3$ and $10^5$. We found that, in addition to the overestimation caused by finite population size, in this case there is an additional overestimation due to an increase in the genetic variance of outcrossed individuals following the N = 100 bottleneck corresponding to the analyzed samples. This increase in $\sigma^2_y$ after bottlenecking is expected in situations when a large proportion of the genetic variance is due to dominance deviations (ROBERTSON 1952), as is the case for the parameters in column 4 (see also Tables 4 and 5 in GARCíA-DORADO 2003). The reason is the increased expression of recessive deleterious mutations due to the increased homozygosity caused by sampling. This bias increases up to 30% with an additional bottleneck caused by resampling 100 chromosomes from the original 100-chromosome sample (results not shown), equivalent to one generation of maintenance in the laboratory before the chromosome analysis. This warns against inadvertent overestimation due to the original sampling procedure in experimental assays. This sampling bias is smaller for QN chromosomes, which, by definition, exclude all chromosomes with low homozygous viability.
Table 5 shows estimates of the average coefficient of dominance assuming that mutations affecting the trait of interest have an overall negative pleiotropic effect on global fitness. This is incorporated as a total mutational fitness effect s' = cs, where c is a random variable with uniform distribution between 1 and 3, either uncorrelated to s values or with a positive correlation of 0.5 for the normal bivariate used to establish the correlation. A comparison can be made between these estimates and the corresponding ones in Tables 3 and 4 (which would correspond to c = 1). If s and h are uncorrelated, there is also no correlation between c and h values and, therefore, estimates of the average h are not expected to be affected by pleiotropy. In contrast, when s and c are correlated, there is also a negative correlation between h and c implying a lower frequency of mutations with low h. Thus, estimates of the average h are expected to increase. This is observed in Table 5, but only by a negligible amount, indicating that for moderate correlation between the effects for different fitness components, the impact of pleiotropy on the estimation of h is very small. However, the increased efficiency of selection causes higher frequencies of QN chromosomes, which is of particular interest under the model in Table 4. Thus, under this model the proportion of QN chromosomes is 60 and 46% for r(s, c) = 0 and 0.5, respectively (Table 5B).
Finally, Table 6 shows estimates of the average coefficient of dominance from mutation-accumulation experiments. The regression of heterozygous on homozygous chromosomes, by.x, and the ratio of mean viabilities are given for chromosomes assumed to have accumulated mutations for 100 generations (results for 20 and 200 generations were similar and are not shown). Because rates and effects of mutations are estimated in mutation-accumulation experiments, we can decide, depending on the latter, what scale should be used for estimating the average coefficient of dominance. Thus, we used a log scale for models of large mutation rate (columns 1 and 2) and a real scale for models of low mutation rate and large deleterious effects (columns 3 and 4). The mutation lines were assumed to have one individual (such as in hermaphroditic or selfing species) or two individuals (such as in full-sib lines). For a mutational model of low homozygous effect (columns 1 and 2) there is almost no bias due to selection. However, for mutational models of large average effect (columns 3 and 4), purging selection induces an overestimation of the predictions. Nevertheless, the relative bias caused (i.e., the ratio "bias/theoretical prediction for the estimate") is usually below that of most estimates from segregating populations (Tables 3 and 4) unless the effective size of the segregating population is very large ($N_e \geq 10^5$). The ratio estimate has expected value $\bar h_{N(s)}$, the arithmetic mean of h weighted by s, so that it can be directly compared to the 1/bx.y estimate from segregating populations, which behaves in a more erratic way due to opposing biases.
DISCUSSION
The biases from finite population size and log scaling:
Finite population size:
The study has focused particularly on the finite size of populations, a key factor that has not been considered before. On the basis of MSB predictions, estimators from segregating infinite populations would provide inferences on the average coefficient of dominance of newly arisen mutations (h) weighted in different ways by their selection coefficients (s; see Table 1). Our results show that, in finite segregating populations, the expected values for those estimators may be substantially above the MSB predictions, so that they provide overestimated inferences of dominance for newly arisen mutations. The reason is that, for mutants with low sh values, the MSB prediction of the pervasiveness (1/sh; GARCíA-DORADO et al. 2003) is larger than the actual pervasiveness in finite populations. In other words, the number of copies in the population for mutants with low sh is expected to be much lower in a finite population than in an infinite one. Even for models excluding completely recessive gene action (Tables 2 and 3), the relative overestimation when $N_e = 10^3$ can be up to 75% of the MSB prediction for the regression estimates. Although the bias for those models is small when $N_e \geq 10^5$, it can be substantial for any population size when the distribution of h allows for effectively recessive deleterious mutations (see Table 4), to the point that MSB predictions become inappropriate and do not allow any inference on the distribution of h for new mutations. Thus, segregating recessive deleterious mutations (h = 0) would have infinite pervasiveness in infinite populations but are lost by drift or purged by selection against homozygotes in small ones. In general, large empirical estimates of the average coefficient of dominance from natural populations could be explained, to some extent, by the population having a small long-term effective population size instead of by gene action for new mutations being close to additive.
The importance of the distribution of h:
The bias of inferences on newly arisen mutations due to finite population size depends heavily on the distribution of dominance values. We have used two completely different distributions of h values. One, a beta distribution, has the practical advantage of having a density of zero for values of h = 0, which is useful to check the fit between estimates and MSB predictions based on the harmonic mean of h. The variance of h values assumed for this distribution was around the empirical estimate obtained by MUKAI (1969), and intermediate negative correlations between s and h values were introduced ($r \approx -0.4$). In a second distribution, first proposed by CABALLERO and KEIGHTLEY (1994), and supported by some experimental evidence, the expected h exponentially decays for increasing s, causing a negative correlation [r(s, h) between −0.25 and −0.58; see Table 4], but with a substantial dispersion of h values. This distribution represents a situation where deleterious mutations of any effect can occasionally be completely recessive and illustrates how in those situations strict MSB predictions involving the harmonic mean will never apply. This model seems to be incompatible with high rates of even mildly deleterious mutations (Table 4, columns 1 and 2), as the accumulation of recessive ones renders QN chromosomes highly improbable. However, the inclusion of pleiotropic effects on other fitness traits reduces the accumulation of such mutations, producing a sufficient frequency of QN chromosomes.
A relatively similar distribution was used by DENG and LYNCH (1996), DENG (1998), and DENG et al. (2002), where the h values were given by $h = \frac{1}{2}\exp(-ks)$, so that h was almost completely determined by s and the correlation between s and h was close to −1. Using this dominance model, DENG (1998) investigated the predictive value of the regression by.x and an alternative analogous method proposed by him (see above). He showed that these methods give underestimations of the mean h of newly arisen mutations, even in the absence of overdominance. He did not compare, however, the estimators with their MSB predictions. If this had been done, the predictions would have been seen to be fairly accurate for two reasons. First, the distribution of h used by Deng allows for little variance of h under his mutational models, so that the harmonic mean of h for newly arisen mutations is very close to the arithmetic one. Second, the simulations involved infinite population sizes, so that there is no overestimation from random loss of virtually recessive mutations due to drift. To check this, we ran the simulations of the second column of Table 4 (with mutational parameters analogous to those analyzed by Deng) assuming his distribution of h values. The estimates obtained by diffusion approximations for a population of intermediate size ($N = 10^5$) were very accurate: for QN chromosomes, the estimates (including by.x = 0.28, ratio = 0.40, and 1/bx.y = 0.32) match the corresponding MSB predictions from Table 1 (including $\tilde h_{N(1/s)} = 0.49$, $\tilde h_{N(s)} = 0.28$, and $\tilde h_N = 0.38$). The correlation between s and h values was very tight (r = −0.98), and the variance of h values was much lower than that empirically obtained by MUKAI (1969). Thus, the distribution of h values is a critical factor determining the magnitude of the bias due to finite population size discussed in this article, which may not be detected when the assumed distribution gives too little variation for h.
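How tightly a deterministic dominance function ties h to s can be explored with a short simulation. The sketch below takes h = (1/2)exp(−ks) as the dominance function; the decay constant K, the exponential distribution of s, and its mean are assumptions made only for illustration and are not the parameter values of the cited studies.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

random.seed(7)
K = 13.0        # assumed decay constant of h with s
MEAN_S = 0.03   # assumed mean homozygous effect (exponential distribution)

s = [random.expovariate(1.0 / MEAN_S) for _ in range(50000)]
h = [0.5 * math.exp(-K * si) for si in s]

r = pearson(s, h)
arithmetic = sum(h) / len(h)
harmonic = len(h) / sum(1.0 / hi for hi in h)
variance = sum((hi - arithmetic) ** 2 for hi in h) / (len(h) - 1)

print(f"r(s, h) = {r:.3f}")
print(f"arithmetic mean h = {arithmetic:.3f}, harmonic mean h = {harmonic:.3f}")
print(f"variance of h = {variance:.4f}")
```

Because h carries no variation beyond that induced by s, the correlation is strongly negative by construction; how close the harmonic and arithmetic means are depends on the assumed K and distribution of s.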
The scaling bias:
Another source of potential bias is the fact that predictions are based on an additive model but are applied to data generated with multiplicative fitnesses across loci. The use of untransformed data for multiplicative fitness traits induces positive bias in the estimates of up to 50% for the ratio estimate when there are many segregating deleterious mutations per individual. This source of bias can be corrected by using log-transformed data. However, log-scaling induces a negative bias (up to 50%) on regression estimates (particularly on 1/bx.y) when deleterious effects are large (with mutations of small effect the bias is noticeable but almost negligible; see Tables 2–4 and DENG 1998). Thus, the decision to log-transform data to obtain regression estimates will depend on the mutational parameters, which can pose a problem of circularity when there is no previous information in this respect.
The estimators and their applications:
The estimator by.x:
The most extensively used method to estimate the average dominance coefficient of deleterious mutations is the regression of heterozygous on homozygous genotypic effects (by.x) for chromosomes extracted from segregating laboratory or natural populations (MUKAI 1969; MUKAI et al. 1972; MUKAI and YAMAGUCHI 1974; WATANABE et al. 1976; EANES et al. 1985; HUGHES 1995; JOHNSTON and SCHOEN 1995; CABALLERO et al. 1997). This estimate infers $\tilde h_{N(s)}$, the harmonic mean of h for newly arisen mutations weighted by s (Equation 5). It usually involves only quasi-normal chromosomes, inferring $\tilde h_{N(s)}$ of nonsevere new mutations, which can be considerably larger than that for all (nonlethal) mutations depending upon the parameter and model (see last column in Table 4). SIMMONS and CROW (1977) review these estimates, the most common values being $\approx 0.2$.
For the particular, as well as unlikely, situation where s and h are uncorrelated, the pervasiveness of a mutation is inversely related to its dominance coefficient, so that the average h value in segregating populations is smaller than that for newly arisen mutations; i.e., $\bar h_E < \bar h_N$. At mutation-selection balance $\bar h_E = \tilde h_N$, and the former biological inequality has been translated to the algebraic fact that the harmonic mean is usually below the arithmetic one; i.e., $\tilde h_N < \bar h_N$ (SIMMONS and CROW 1977). In this situation, by.x would estimate the arithmetic mean of h for segregating mutations ($\bar h_E$) and would infer the harmonic mean for newly arisen ones ($\tilde h_N$; see Table 1). The statement that by.x estimates $\bar h_E$ has been consistently repeated in the literature (MUKAI 1969; MUKAI et al. 1972; MUKAI and YAMAGUCHI 1974; WATANABE et al. 1976) but caution is needed because of the likely correlation between s and h, the classical value of $\bar h_E$ being $\approx 0.2$.
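The link between the mean h of segregating mutations and the harmonic mean for new mutations can be written out explicitly. The sketch below assumes equal mutation rate u per locus and the MSB frequency $q_i \approx u/(s_i h_i)$ for partially recessive mutations, in line with the pervasiveness 1/sh used in this article; the index i over mutations and the expectation notation E[·] over new mutations are introduced here for illustration.

```latex
\begin{align*}
\bar h_E
  &= \frac{\sum_i q_i h_i}{\sum_i q_i}
   = \frac{\sum_i \dfrac{u}{s_i h_i}\, h_i}{\sum_i \dfrac{u}{s_i h_i}}
   = \frac{\sum_i 1/s_i}{\sum_i 1/(s_i h_i)}
   = \tilde h_{N(1/s)} , \\[4pt]
\text{and, if } s \text{ and } h \text{ are independent,}\quad
\tilde h_{N(1/s)}
  &= \frac{E[1/s]}{E[1/s]\,E[1/h]}
   = \frac{1}{E[1/h]}
   = \tilde h_N \;\le\; \bar h_N .
\end{align*}
```

The final inequality is the usual relation between harmonic and arithmetic means; with a correlation between s and h the second line no longer holds and the weighting by 1/s matters.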
MUKAI (1969) reconciled these relationships by considering the arithmetic mean and variance of h for newly arisen mutations deduced from mutation-accumulation experiments. He used a truncated normal distribution and a beta distribution of h values with these parameters to show that the harmonic mean is $\tilde h_N = 0.2$, in agreement with observed estimates of by.x. The calculations of MUKAI (1969) are correct, but imply no correlation between s and h. Figure 2 illustrates that, when that correlation is negative, selection against mutations with high h value is less intense, so that the harmonic mean of dominance coefficients for newly arisen mutations ($\tilde h_N$) cannot be generally considered a good predictor of the arithmetic mean of h for segregating genes ($\bar h_E$), and it generally occurs that $\bar h_E > \tilde h_N$ (Table 3). Furthermore, the regression by.x is smaller than $\bar h_E$ due to the weighting by $s^2$, particularly for models of low average h. For example, for the last column in Table 3, by.x = 0.07 for $N = 10^5$ for quasi-normal chromosomes, whereas the corresponding $\bar h_E$ is larger. Therefore, the relationship between by.x and $\bar h_E$ depends largely on the joint distribution of h and s values.
The estimator 1/bx.y:
Because at mutation-selection balance the additive variance of a population is proportional to the arithmetic mean of sh values (MUKAI et al. 1974), 1/bx.y might be thought to be an appropriate estimator of dominance. However, 1/bx.y estimates the arithmetic mean of h weighted by s, which is different from the mean of sh. In addition, 1/bx.y is very sensitive to certain sources of bias. First, 1/bx.y estimated from quasi-normal chromosomes incurs an important upward statistical bias due to the truncation of the dependent variable. It also shows a high sensitivity to large effects due to log-scaling. Furthermore, our results warn against the bias in 1/bx.y caused by bottlenecking during the initial sampling. When the original population has substantial dominance variance caused by the presence of many deleterious recessive mutations segregating at low frequency, even relatively large samples can show increased genetic variance due to increased frequency of homozygous deleterious mutations. This causes a substantial increase in the variance of outcrossed individuals and, therefore, an upward bias for 1/bx.y of up to 400%.
MUKAI and YAMAGUCHI (1974) investigated the effect of overdominance or other sources of balancing selection on the estimates of by.x and 1/bx.y. They showed that, with overdominant genes, by.x becomes smaller (see also DENG 1998) and 1/bx.y becomes larger than the corresponding values due only to partially dominant genes. The amount by which 1/bx.y is inflated is much larger than the amount by which by.x is reduced. Thus, with overdominance one would expect that the estimates of the average h of new mutations from 1/bx.y would be highly biased upward. The empirical results obtained with this method often give very large estimates (of the order of one or more; see, e.g., MUKAI and YAMAGUCHI 1974; WATANABE et al. 1976), suggesting that some of the genetic variability could be maintained by balancing selection. However, other biases cannot be excluded, in particular bias from truncation, as those estimates were computed for QN chromosomes. Therefore, the previous conclusion should be treated with caution.
A method of estimation that combines by.x and 1/bx.y was used by HUGHES (1995) (see also CHARLESWORTH and HUGHES 1999) and is based on the square root of the ratio of the variance of heterozygotes to that of homozygotes, $\sqrt{\sigma^2_y/\sigma^2_x}$. This is equivalent to $\sqrt{b_{y.x}\,(1/b_{x.y})}$ and predicts the geometric mean of the arithmetic and the harmonic average h weighted by s, $\sqrt{\bar h_{N(s)}\,\tilde h_{N(s)}}$. This estimate is obviously more complicated to interpret than its individual components, and its sources of bias are in line with those previously discussed (these can be inferred from Tables 2–5).
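The equivalence between the variance-ratio estimate and the two regression estimates follows from the definitions of the regression coefficients and can be verified on any data set. The viability values below are hypothetical and serve only to check the algebra.

```python
import math

def moments(x, y):
    """Sample variances of x and y and their covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return var_x, var_y, cov_xy

# Hypothetical chromosome values: x homozygous, y heterozygous.
x = [0.62, 0.70, 0.81, 0.88, 0.93, 1.00]
y = [0.90, 0.93, 0.94, 0.97, 0.97, 1.00]

var_x, var_y, cov_xy = moments(x, y)
b_yx = cov_xy / var_x              # regression of y on x
inv_b_xy = var_y / cov_xy          # inverse of the regression of x on y
combined = math.sqrt(var_y / var_x)

print(f"b_y.x = {b_yx:.3f}, 1/b_x.y = {inv_b_xy:.3f}, sqrt(var_y/var_x) = {combined:.3f}")
```

Since the combined estimate is the geometric mean of by.x and 1/bx.y, it always lies between them when the covariance is positive.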
The estimator of the ratio:
The ratio of the loads for outbred and inbred populations is the only estimate that is not affected by the correlation between s and h, as it estimates the harmonic mean of h values without weighting. Because of this, and because the inbreeding load is a function of the inverse of the harmonic mean of h values (MORTON et al. 1956), this method, applied to log-transformed data, seems an appropriate estimator. However, for absolute viability, the ratio estimate is computed by assuming viability of 1 for a genotype free of deleterious mutations (Equation 9). Considering nongenetic sources of mortality (LYNCH and WALSH 1998; GARCíA-DORADO et al. 1999), the true mean viability of a mutation-free genotype should be <1 and, if the true average dominance is <0.5, the ratio estimate is expected to be an overestimation. Empirical estimates have been obtained for a range of vertebrates using data with different levels of inbreeding, giving an average of 0.08 ± 0.01 (LYNCH and WALSH 1998). However, those data included lethal and semilethal mutations, which usually show nearly recessive gene action (SIMMONS and CROW 1977), so that they should underestimate the value for nonlethal mutations. These counteracting biases make the ratio estimates difficult to interpret.
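The consequence of assuming a viability of 1 for the mutation-free genotype can be illustrated with a log-scale version of the ratio estimate. The function below is a minimal sketch, not Equation 9 itself; the reference viability W0 and the genetic log-loads are hypothetical values.

```python
import math

def log_ratio_estimate(w_outbred, w_inbred, w_reference=1.0):
    """Log-scale ratio estimate: outbred load over twice the inbred load,
    both measured relative to the viability of the reference genotype."""
    load_out = math.log(w_reference) - math.log(w_outbred)
    load_in = math.log(w_reference) - math.log(w_inbred)
    return load_out / (2.0 * load_in)

W0 = 0.90                       # hypothetical true viability of a mutation-free genotype
w_out = W0 * math.exp(-0.10)    # outbred mean viability (genetic log-load 0.10)
w_in = W0 * math.exp(-0.50)     # inbred mean viability (genetic log-load 0.50)

true_ratio = log_ratio_estimate(w_out, w_in, w_reference=W0)
naive_ratio = log_ratio_estimate(w_out, w_in)   # reference viability taken as 1

print(f"ratio with the true reference: {true_ratio:.3f}")
print(f"ratio assuming viability 1:    {naive_ratio:.3f}")
```

Taking the reference as 1 adds the same nongenetic term to both loads, which pulls the ratio toward 0.5; the estimate is therefore inflated whenever the true ratio is below 0.5.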
General implications and conclusions:
As discussed above, most of the biases shown in this study point toward overestimations of the average coefficient of dominance over their MSB predictions. This may have important consequences for many biological phenomena on which the average h is a fundamental parameter, such as the predictions on the evolution of selfing rates (e.g., CHARLESWORTH and CHARLESWORTH 1998, 1999), the evolution of sexual reproduction and genetic recombination (e.g., CHASNOV 2000; AGRAWAL and CHASNOV 2001; OTTO 2003), the prediction of the bottleneck effects on genetic variance (e.g., WANG et al. 1998), the support for different models of selection to explain genetic variation for life-history traits (e.g., CHARLESWORTH and HUGHES 1999; RODRíGUEZ-RAMILO et al. 2004), the maintenance of genetic variation (e.g., ZHANG et al. 2004), etc. For example, increased allocation to sexual reproduction is more likely when deleterious mutations are very recessive (CHASNOV 2000), particularly in structured populations (AGRAWAL and CHASNOV 2001; OTTO 2003). If the true average h is lower than current estimates suggest, support for the mutational theory of the evolution of sex would be stronger than previously assumed.
As an example of how sensitive inferences from experimental data are, we can use estimates computed by KUSAKABE and MUKAI (1984) for the viability of QN chromosomes II in a natural population of D. melanogaster ($N_e \approx 3000$), whose variability was satisfactorily accounted for by MSB predictions (but see CHARLESWORTH and HUGHES 1999). They obtained by.x = 0.18 (0.13 for inversion-free chromosomes), but Figure 1 in KUSAKABE and MUKAI (1984) shows that moderate to severe deleterious effects were expressed only in homozygotes, implying that considerably smaller by.x estimates would have been obtained for the whole set of nonlethal chromosomes and suggesting low average h and strong negative correlation between s and h. They also obtained 1/bx.y = 0.67 (0.60 for inversion-free chromosomes), which is higher than any of the 1/bx.y obtained from our simulated data. This latter result suggests that, even for this population whose variability could be satisfactorily accounted for by the MSB predictions, there may be substantial genetic variation for viability that is not maintained by the balance between deleterious mutation, selection, and drift, as would be the case if there were a few loci with overdominant effects for fitness. However, it should be noted that 1/bx.y was computed for QN chromosomes, so that its high value could well be ascribed to truncation of the dependent variable.
As a general guide, we can conclude that, even assuming that no genetic variability is maintained by balancing selection, the list of factors biasing the estimates is so large (the finite size of populations, the correlation between s and h, the different weighting factors, bottlenecking, scaling, truncation, etc.) that none of the estimators from segregating populations is very reliable. Furthermore, even if estimates were unbiased, their usefulness is limited by the lack of critical information. For example, depending on the joint distribution of h and s among new mutations, a given average h among segregating mutations may correspond to completely different values of h among new mutations. In addition, the possibility that balancing selection may take place as a source of variation for fitness components violates the assumptions of the estimators. Although we have shown in this article that the pleiotropic effects of mutations on other fitness components are not a main concern in the behavior of the estimators (Table 5), the possibility of antagonistic pleiotropic effects (e.g., mutations with a negative effect on some fitness components and a positive one on others) producing balanced polymorphisms is another issue that remains to be addressed.
For direct estimates of the average coefficient of dominance obtained from mutation-accumulation experiments, purging selection may induce some overestimation when the average homozygous effects are large (Table 6), but the relative bias is generally smaller than that observed for estimates from finite segregating populations. After appropriate scaling, selection among lines is the most important source of bias, but we used an extreme model where the loss of lines occurred through truncation. This is a very conservative approach, as the number of lines lost in mutation-accumulation experiments is usually small (e.g., KEIGHTLEY and CABALLERO 1997) or fully explained by random losses (CHAVARRíAS et al. 2001). Direct estimates of the average coefficient of dominance of deleterious mutations still suffer from problems associated with weighting. The estimate by.x gives an average of h values weighted by s2 and, if s and h values are negatively correlated, this can be substantially smaller than the unweighted average. A similar problem arises with the ratio estimate, which is weighted by s. Furthermore, the ratio estimate can be largely biased upward if control lines suffer from nonmutational declines in mean fitness (see GARCíA-DORADO and CABALLERO 2000), and, in general, it may be very sensitive to the estimation of fitness in the control lines. However, as the reference genotype used to compute the ratio estimate in mutation-accumulation (MA) experiments is that of the control population (the genetically uniform line from where MA lines were originated), the assumption of viability 1 for the free-deleterious genotype, used in segregating populations, is unnecessary in MA experiments. A further advantage of inference from MA experiments is that overdominant polymorphisms are typically not maintained because of the small population size, reducing the confounding effects of balanced polymorphisms.
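The weighting problem for direct estimates can be made concrete with a small numerical example. The (s, h) pairs are hypothetical, chosen so that s and h are negatively related; the weights follow the statement above that by.x is weighted by s² and the ratio estimate by s.

```python
def weighted_mean(values, weights):
    """Arithmetic mean of values with the given weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical new mutations with a negative relationship between s and h.
s = [0.01, 0.02, 0.05, 0.10, 0.30]
h = [0.45, 0.40, 0.30, 0.15, 0.05]

unweighted = sum(h) / len(h)
by_s = weighted_mean(h, s)                          # weighting of the ratio estimate
by_s2 = weighted_mean(h, [si ** 2 for si in s])     # weighting of by.x

print(f"unweighted mean h:    {unweighted:.3f}")
print(f"weighted by s:        {by_s:.3f}")
print(f"weighted by s^2:      {by_s2:.3f}")
```

The stronger the weighting toward large s, the more the average is dominated by the most recessive mutations, so both weighted averages fall below the unweighted one in this example.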












Figure 1 (caption fragment). (b) Distribution [g(h)] of dominance coefficients following a beta distribution, shown for four parameter sets, (1) to (4).
Figure 2 (caption fragment). Left and right panels correspond to two mutational models. Top, percentage of newly arisen (line) and segregating mutations (bars) for different classes of h. Bottom, average coefficient of selection of newly arisen (lines) and segregating mutations (bars) for different classes of h. Darkly shaded bars and solid lines, no correlation between s and h (r = 0); lightly shaded bars and dashed lines, r = −0.4. Results for newly arisen and segregating mutations are based on draws of 1,000,000 and 1000 mutations, respectively.