Abstract
Deleterious mutations are relevant to a broad range of questions in genetics and evolutionary biology. I present an application of the “biometric method” for estimating mutational parameters for male fitness characters of the yellow monkeyflower, Mimulus guttatus. The biometric method rests on two critical assumptions. The first is that experimental inbreeding changes genotype frequencies without changing allele frequencies; i.e., there is no genetic purging during the experiment. I satisfy this condition by employing a breeding design in which the parents are randomly extracted, fully homozygous inbred lines. The second is that all genetic variation is attributable to deleterious mutations maintained in mutationselection balance. I explicitly test this hypothesis using likelihood ratios. Of the three deleterious mutation models tested, the first two are rejected for all characters. The failure of these models is due to an excess of additive genetic variation relative to the expectation under mutationselection balance. The third model is not rejected for either of two logtransformed male fitness traits. However, this model imposes only “weak conditions” and is not sufficiently detailed to provide estimates for mutational parameters. The implication is that, if biometric methods are going to yield useful parameter estimates, they will need to consider mutational models more complicated than those typically employed in experimental studies.
The longdebated question of whether or not genetic variation in fitness primarily reflects contributions of lowfrequency deleterious alleles maintained by the balance between selection and mutation, or has a substantial contribution from variants maintained at intermediate frequencies by selection, is still unanswered.
Charlesworth and Hughes (2000)
DELETERIOUS mutations have been a focus of study in both genetics and evolutionary biology for most of the history of each field (Crow 1993). Muller (1950) argued that modern medicine would allow deleterious mutations to eventually accumulate to intolerable levels within the human species unless countered by “a rationally directed guidance of reproduction.” Muller’s claims initiated a prolonged debate with other geneticists, most notably Theodosius Dobzhansky (see Crow 1987). Since the MullerDobzhansky dispute, biologists have hypothesized that harmful mutations may be critical to a wide range of ecological and evolutionary phenomena. These include the evolution of sex, mating systems, and mate choice (Pamiloet al. 1987; Kondrashov 1988; Charlesworthet al. 1990b; Uyenoyamaet al. 1992); the evolution of recombination rates (Kondrashov 1984; Palsson 2002); the persistence of small populations (Lande 1994; Lynchet al. 1995; Kondrashov 1995); and the maintenance of both molecular and quantitative trait variation (Lewontin 1974; Turelli 1984; Charlesworthet al. 1993). The validity of these hypotheses depends not only on the rate that deleterious mutations occur, but also on the magnitude of their effects, on dominance relations, and on how deleterious alleles at different loci combine to influence the overall fitness of an individual.
The primary mutational parameters are U, h, and s. The genomic deleterious mutation rate, U, equals 2Σμ_{i}, where μ_{i} is the deleterious mutation rate at the ith locus, and the summation is taken over all loci affecting fitness. The selection coefficient, s, is the proportional fitness reduction caused by the mutation when in homozygous form. The dominance coefficient, h, characterizes mutational effects in heterozygotes (see Table 1). A mutation with s = 0.05 and h = 0.2 reduces fitness by 1% in heterozygotes and 5% in homozygotes. As mutational effects are likely to vary among loci (and among different mutations at the same locus), estimates for s and h may actually represent average values for mutational effects. However, because the values of h and s for a given allele are assumed to be constant, this is essentially a model of unconditionally deleterious mutations. Given the range of applications outlined above, accurate estimates for U, h, and s will shed light on a number of unsolved problems in evolutionary biology.
At least three different approaches have been developed to estimate the rate and effects of deleterious mutations: mutationaccumulation experiments, molecular evolution surveys, and biometric analyses. These methods, briefly summarized below, each have their respective strengths and weaknesses. They are applicable to different sorts of data. The procedures also differ in the specific parameters that are estimated and in their underlying assumptions regarding genetics and evolution.
Mutationaccumulation experiments estimate mutational parameters from the observed genetic divergence of initially identical lines (Muller 1928; Bateman 1959; Mukaiet al. 1972). These lines are maintained at small population sizes and diverge due to the random accumulation of new mutations. Estimates for mutational parameters can then be extracted from the rate and overall pattern of divergence through a variety of statistical procedures (Keightley 1994; GarciaDorado 1997; Lynch and Walsh 1998; Shawet al. 2002, and references therein). Mutation accumulation is the most direct of the estimation methods and these experiments have provided most of our information regarding mutational parameters (reviewed by Simmons and Crow 1977; Lynch and Walsh 1998, pp. 343348). Unfortunately, mutation accumulations are highly labor intensive. The parameter estimates are subject to large statistical uncertainty. Perhaps most importantly, mutationaccumulation experiments may fail to detect mutations with small effects on fitness (Keightley and EyreWalker 1999; Lynchet al. 1999). As a consequence, U may be substantially underestimated and s substantially overestimated.
The molecular survey method is based on the rates of synonymous and nonsynonymous gene sequence evolution within an evolutionary lineage (Kondrashov and Crow 1993; EyreWalker and Keightley 1999). If synonymous mutations can be treated as selectively neutral, an estimate for U can be obtained from the deficiency of nonsynonymous substitutions (relative to synonymous substitutions) within the genome as a whole. Molecular surveys measure the timeintegrated effects of selection (as opposed to the direct effects of mutations on viability and fertility) and can thus potentially detect very weakly selected mutations. A second advantage is that inferences are based on the actual historical results of natural selection (genetic substitutions). In contrast, the other methods are based on measurements of fitness surrogates (e.g., viability, female fertility, male mating success, etc.) that are usually measured under controlled laboratory conditions. On the negative side, the molecular survey analysis depends on several unrealistic assumptions. In particular, the method assumes that adaptive amino acid substitutions do not occur and that all synonymous changes are strictly neutral; i.e., there is no selection on codon usage (Akashi 1997). EyreWalker and Keightley (1999) argue that deviations from these assumptions will lead to underestimates for U. As a consequence, molecular surveys may provide a robust lower bound for this quantity.
Biometric methods use patterns of phenotypic variation in fitness components to infer mutational parameters (Mortonet al. 1956; Charlesworthet al. 1990a; Deng and Lynch 1996). This approach is grounded in population genetic models of mutationselection balance (e.g., Haldane 1927). Biometric quantities, such as the amount of genetic variation in fitness and the magnitude of inbreeding depression, can be expressed as functions of U, h, and s. Estimates of inbreeding depression or the genetic variance can thus be used to estimate mutational parameters by inverting the population genetic equations. For example, Deng and Lynch (1996) derive simple estimators for U, h, and s from the means and genetic variances of inbred and outbred individuals within an experimental population.
Biometric methods are generally less labor intensive than mutationaccumulation experiments. More importantly, these experiments can potentially yield estimators that are relatively low in bias for a wide range of mutational parameters (Deng 1998; Denget al. 2002). However, like the two previous methods, biometric methods are also encumbered with potentially limiting assumptions. Two particularly important assumptions are (1) that all genetic variation in fitness is maintained through mutationselection balance and (2) that experimental inbreeding changes genotype frequencies without changing allele frequencies. Assumption 1 will break down if balancing selection of some form (e.g., heterozygote advantage, frequencydependent selection, spatial and/or temporal variation in selection, genotypebyenvironment variation, etc.) maintains a substantial fraction of the genetic variance in fitness. If this is so, biometric quantities may deviate substantially from their expected values given U, h, and s. Assumption 2 breaks down if deleterious alleles are eliminated or “purged” from the experimental population so that the frequencies of deleterious mutations differ between inbred and outbred individuals. This will bias parameter estimates.
This article describes a biometric analysis of male fitness measures in Mimulus guttatus, a wildflower commonly known as yellow monkeyflower. My purpose is to address both of the potential pitfalls associated with the biometric method. Mutational parameters are estimated from a breeding experiment that uses randomly extracted, highly inbred lines as parents. This effectively eliminates the problem of purging within the experiment. Allele frequencies should not differ (consistently) between outbred and inbred plants. Second, the experiment is designed so that the number of “empirical comparisons” is greater than the number of unknown parameters. This provides the degrees of freedom necessary to test the adequacy of a model prior to parameter estimation. Following the suggestion of Deng and Lynch (1996, p. 358), I use likelihood ratios to test the assumption that all genetic variation is maintained through mutationselection balance.
Biometric methods depend on specific features of the experimental design such as the type and levels of inbreeding, the sorts of experimental crosses, etc. In the following sections, I first describe the experimental design and define the biometric quantities that can be estimated from this design. I then derive the expected values for these quantities under various forms of the deleterious mutation model (DMM). Next, I describe the likelihoodbased methods for parameter estimation. A parametric bootstrapping procedure is developed, both to assess the adequacy of the experimental design and to perform hypothesis tests (obtain an appropriate P value for the various likelihood ratios). Finally, the statistical model is applied to phenotypic data from M. guttatus. The characters are flower size and two male fitness measures, pollen production and the pollen size index (PSI). The PSI is strongly correlated with pollen viability in M. guttatus (Kellyet al. 2002).
BIOMETRIC QUANTITIES AND THEIR EXPECTED VALUES
Consider a breeding design in which the parents are randomly extracted, fully homozygous inbred lines. Such lines can be formed in several different ways. For select species, completely homozygous lines can be formed in a single generation via chromosomal manipulations, gametophytic selffertilization, or doublehaploid formation. Repeated rounds of inbreeding provide a more generally applicable, albeit slower, method of line formation. Successive generations of selffertilization can yield lines that are almost fully inbred in only five to six generations. The parental lines used in this study were developed by John Willis through six successive generations of selffertilization (Willis 1999a,b).
Regardless of the method, genetic purging is likely to accompany the formation of the inbred lines. Lines that become homozygous for mutations that are lethal or cause sterility in homozygous form will go extinct and eliminate these alleles from the experimental population. In contrast, deleterious mutations with smaller effects should fix randomly within lines (Willis 1999a,b). As a consequence, a genetic analysis using these lines as parents is informative only about the nonlethal and nonsterile fraction of mutations. Parameter estimates are thus specific to this class of mutations, termed “detrimentals” by Simmons and Crow (1977).
A major advantage of using fully inbred lines as parents in a biometric breeding design is that they are already fully purged. Subsequent inbreeding or outcrossing should result in predictable changes in homozygosity without changes in allele frequency (apart from the random changes due to finite sample sizes). Consider the breeding design described by Figure 1. Each “extended family” is constituted from the progeny of three randomly selected inbred lines (labeled L1, L2, and L3). One line (L1) is assigned as the “sire” and is mated to the other two lines, the “dams.” The progeny of these crosses comprise two distinct but related outbred families (the families in Figure 1, bottom). Each line is also selffertilized to produce three inbred families. The inbred families are unrelated to each other, but each is related to one or both of the outbred families. This same mating scheme is applied to other sets of lines generating a large number of extended families.
The measurement of progeny from this design provides data sufficient to estimate the mean phenotype of both inbred and outbred plants as well as a number of (co)variance components. Let β denote the difference in mean phenotype between inbred and outbred plants. Four different comparisons among relatives can be quantified as covariances (Figure 1). C_{SS} is the covariance among plants within selfed families (selfed siblings). C_{FS} is the covariance among plants within outbred families (full siblings). C_{OS} is the covariance between outbred plants and their inbred siblings (related through a single parent). Finally, C_{HS} is the covariance of plants from different outbred families (halfsiblings related through the sire line). These C terms are “observational variance components” in the terminology of Falconer (1989). The observational components can be expressed as a function of the “causal components” V_{A}, C_{AD}, V_{D}, and V_{DI} (see Cockerham and Weir 1984; Shawet al. 1998; Kelly and Arathi 2003). I do not use these relationships here. Instead, the observational components are expressed directly as functions of the mutational parameters (see Equations 1a, 1b, 1c, 1d, 1e, 2a, 2b, 2c, 2d, 2e, 3, 4, 5, 6a, 6b, 6c, 6d, 6e, 7 below).
Estimates for β and the observational components are used to assess the sufficiency of a particular deleterious mutation model to explain genetic variability. If demonstrated as sufficient, the same data can then be used to estimate the mutational parameters of that model. For both purposes, it is necessary to express β, C_{SS}, C_{FS}, C_{OS}, and C_{HS} as functions of the mutational parameters. This requires a model for genotypic effects on both phenotype and fitness. Table 1 summarizes the standard fitness model for an individual locus: the deleterious allele (a) exists at population frequency q. Following Charlesworth and Hughes (2000), I distinguish the evolutionary fitness of a genotype (determined by h and s) from its contribution to a measurable fitness component. The quantity α “translates” between phenotype and fitness.
Predicted values for β, C_{SS}, C_{FS}, C_{OS}, and C_{HS} can be obtained by applying quantitative genetic models developed for inbreeding populations (Cockerham and Weir 1984; see also Harris 1964; Jacquard 1974). As these quantities depend on the contributions of many loci, it is necessary to subscript the various quantities in Table 1. As shown in appendix a,
Equations 1a, 1b, 1c, 1d, 1e depend on two important assumptions. The first is that the loci contribute additively to the phenotype. Oftentimes, it is assumed that the effects of deleterious mutations at different loci combine multiplicatively. In this situation, trait values should be logtransformed prior to analysis. As the appropriate scale is not obvious a priori, here I apply the methods to both logtransformed and untransformed data (see results and discussion). Second, each component is linear in q_{i} because I have neglected terms of order q ^{2}_{i} (see appendix a). This approximation should be quite accurate because deleterious alleles will not usually obtain high frequencies in a large population.
The frequency of deleterious alleles at a given locus (q) depends on the mutational parameters. I now assume that the natural population is large, randomly mating, and in mutationselection balance equilibrium. I further assume that deleterious mutations have some phenotypic effect in heterozygotes (h > 0), an assumption supported by genetic data (Simmons and Crow 1977). As shown by Haldane (1927), the expected frequency q_{i} of a partially recessive mutation is equal to μ_{i}/(h_{i}s_{i}). Substitution of this expression into Equations 1a, 1b, 1c, 1d, 1e yields predictions for the biometric quantities in terms of mutational parameters.
The final step is to characterize the mutational spectrum, the joint density function of α_{i}, h_{i}, and s_{i} across loci. I consider three alternative models. In model I, mutational effects are fixed and do not vary across loci. In model II, mutational effects vary according to an exponential distribution. Model III considers variable mutational effects without imposing distributional assumptions.
Model I: All new mutations have fixed effects α, h, and s. With constant mutational effects,
The five estimable biometric quantities are functions of only three unknown quantities: Uα, h, and sα. Because both U and s are always included in a product with α, neither parameter can be estimated individually (unless a particular value for α is assumed). Only the composite parameters, Uα and sα, can be estimated. The fact that the five measurable statistics are a function of only three unknowns allows us to test the sufficiency of model I. The model imposes the constraints that C_{FS} = 2C_{OS}^{2}/C_{SS} and C_{HS} = ½C_{FS}.
Model II: New mutations vary in their effects according to a simple probability model. Following Deng and Lynch (1996), model II posits an exponential density for s,
The parameters of model II are U, K, s, and α. I assume that α_{i} is the same across loci. The average dominance of new mutations, h, is
Model III (weak conditions): Model II represents only one way in which mutational effects may vary across loci (or between different mutations at the same locus). A range of alternative models could be considered. For example, Caballero and Keightley (1994) consider a model that allows variable dominance of mutations, even if they have the same value for s. Other generalizations are also possible (Zhanget al. 2002 and references therein). Given the range of possibilities, it is fortunate that we can make a few statements about the relative magnitudes of the biometric quantities without an explicit model of mutational effects. As long as deleterious alleles are rare and h_{i} < 0.5 at all loci, then Equations 1a, 1b, 1c, 1d, 1e imply that C_{OS} < ½C_{SS}, C_{FS} < C_{OS}, and C_{HS} = ½C_{FS}. Model III is not sufficiently detailed to allow parameter estimation. However, a model with these constraints can be tested against the unconstrained model. The conditions associated with model III are “weak” because they may often hold even if a substantial portion of the variance in fitness is maintained by balancing selection.
STUDY SPECIES AND METHODS
M. guttatus (2n = 28; Scrophulariaceae) is a small wildflower, increasingly used as a model organism for genetic studies of floral variation and the evolution of plant mating systems (Macnair and Cumbes 1989; Ritland 1989; Carr and Fenster 1994; Fenster and Ritland 1994; Robertsonet al. 1994). It occurs throughout western North America and local populations may be either annual or perennial. The plants of this study are derived from Iron Mountain, which is a large, annual (or winter annual) population located in the Cascade Mountains of central Oregon (see Willis 1996, 1999a,b). M. guttatus is a selfcompatible, but the Iron Mountain population is predominantly outcrossing (Willis 1993b).
John H. Willis initiated ∼1200 independent lines of M. guttatus in August of 1995. Each line was founded from the seed set of a separate fieldcollected plant from Iron Mountain. Each line was subsequently maintained in the greenhouse by singleseed descent (selffertilization) for six generations. Six successive generations of selffertilization yield an inbreeding coefficient of >0.98 and I treat the parents as fully inbred (f = 1) for the purpose of calculating covariances among relatives. As expected, these lines are almost completely homozygous at highly polymorphic microsatellite loci with different lines fixed for different alleles (Willis 1999a).
Three hundred of these sixthgeneration lines were used to found 100 extended families following the experimental design of Figure 1. A single plant from the sire line was grown to maturity, selffertilized, and used as a pollen source for a single plant from each of the dam lines. Each of the dams was also selffertilized. A maximum of eight pots were seeded per family (40 per extended family) on March 9, 2001 in the University of Kansas greenhouses. At 2 weeks after seeding, all pots were thinned to a single plant (randomly selected) and were subsequently fertilized once per week. A series of morphological measurements were taken on the first flower produced by each plant (see Kelly and Arathi 2003) and the anthers were collected into a microcentrifuge tube. Flowering was monitored daily until 60 days postseeding, by which time the great majority of plants had flowered. In total, 2345 plants were measured (1113 outbred and 1232 inbred). The distribution of plants across families was not balanced and many extended families were completely missing one or more of their component families. This is relevant to the analysis. Maximum likelihood accommodates unbalanced designs better than leastsquares or “ANOVAstyle” methods do (Searleet al. 1992).
Pollen number and the PSI were estimated for each sample using a Coulter counter model Z1 dual (Coulter, Miami). The microcentrifuge tubes were left open for 1 week after pollen collection to allow the anthers to dry and release their pollen. Each sample was then diluted into 20 ml of electrolyte solution and run through the Coulter counter. The machine counts the number of particles between two size thresholds (10 and 25 μm) and the number of particles that are larger than the upper threshold (>25 μm). The PSI is the proportion of grains in the upper size category. The size thresholds were set on the basis of results from a previous study showing a strong positive correlation between PSI and pollen viability as measured through standard staining techniques (Kellyet al. 2002).
The distributions of pollen number and PSI within both inbred and outbred plants are given in Figure 2. A genetic analysis of the floral morphology is presented elsewhere (Kelly and Arathi 2003). One floral trait, corolla width, is included here as a comparison to the male fitness measures (outbred plants, mean = 19.13 mm, SD = 2.65; inbred plants, mean = 17.32, SD = 2.95). For analysis, the original measurements were divided by the outbred mean value for that trait. This linear transformation is commonly used in biometric studies as it allows comparison of parameter estimates for traits that differ in the magnitudes of their measurement values (Houle 1992; Charlesworth and Hughes 2000). We also considered logtransformed values for pollen number and PSI, as this may be more appropriate if loci combine multiplicatively.
HYPOTHESIS TESTING AND PARAMETER ESTIMATION
Maximum likelihood is used for both testing the various models and estimating parameters. By methods described below, I find the parameter values for each of the mutational models (I, II, and III) that produce the highest likelihood for the data. These values are the maximumlikelihood estimates (MLEs) for the parameters of that model. The sufficiency of that model is tested by comparing the maximumlikelihood value for the model with the maximum likelihood of the “unconstrained model.” The latter is based on direct estimates for the biometric quantities without the particular restrictions imposed by the various mutational models (see section on constraints below).
The resemblance of relatives may depend on environmental factors as well as genetic factors. These factors must be accounted for in order to infer genetic parameters from phenotypic covariances. Maternal effects are a potentially important source of resemblance among relatives. The variance of maternal effects, V_{M}, can be estimated from this design because the comparison between inbred and outbred sibs (estimating C_{OS}) involves both male and female parents (Figure 1). Also, the magnitude of environmental deviations may be greater for inbred individuals than for outbred individuals (Lerner 1954; Kelly and Arathi 2003). I thus estimate different environmental variances for outbred and inbred plants, V_{E(o)} and V_{E(i)}.
With regard to hypothesis testing, it is important to distinguish universal constraints from modelspecific constraints. The universal constraints apply to all of the models including the unconstrained model. These follow from basic features of inheritance, independent of allele frequencies or the nature of genetic effects. The universal constraints are that V_{E(o)} and V_{E(i)} must be positive; V_{M}, C_{SS}, C_{FS}, C_{OS}, and C_{HS} must be greater than or equal to zero; C_{HS} ≤ ½C_{FS}; and the magnitude of C_{OS} is limited by C_{SS} and C_{FS} (Kelly and Arathi 2003).
Subject to these universal constraints, MLEs were obtained for the unconstrained model by finding the set of values for M_{O}, β, C_{SS}, C_{FS}, C_{OS}, C_{HS}, V_{M}, V_{E(o)}, and V_{E(i)} that maximize l, the loglikelihood function. Here, M_{O} is the mean phenotype of outbred plants. The predicted mean phenotype of inbred plants is M_{O} +β. I use the standard multivariate normal model for the loglikelihood function,
The maximum of l was determined iteratively by using the first derivatives of l with regard to each parameter and the asymptotic dispersion matrix. Following Searle et al. (1992), the vector of derivatives of l with regard to each of the fixed effects is
Optimization was performed using two methods simultaneously, steepest ascent and scoring (Searleet al. 1992, Chap. 6; Eliason 1993). By steepest ascent, the direction of movement within the parameter space is simply the vector of first derivatives. The direction for scoring is the product of the dispersion matrix and the derivatives vector. A loglikelihood value was calculated at 10 points along each directional vector using Equation 8. The highest of these 20 likelihood values was chosen to determine the set of parameters for the next iteration. In most cases, the optimization routine used the scoring vector in the first few iterations and the steepest ascent vector close to the optimum. These calculations, as well as the simulations described below, were performed by a series of computer programs similar to those described in Kelly and Arathi (2003). These programs, written in the C programming language, are available from the author upon request.
The maximum loglikelihoods for each mutation model were also obtained using Equation 8, although with different values in the variancecovariance matrix, V. The V matrix is a function of four genetic parameters in the unconstrained model (C_{SS}, C_{FS}, C_{OS}, and C_{HS}), in addition to the environmental components. For model I, V contains only two genetic parameters, C_{SS} and h, in addition to the same set of environmental components. The various comparisons between relatives within V are functions of these two parameters: C_{OS} = hC_{SS}, C_{FS} = 2h^{2}C_{SS}, and C_{HS} = h^{2}C_{SS}. For model II, V also depends on only two genetic parameters: C_{OS} = h^{*}C_{SS}, C_{FS} is given by Equation 7, and C_{HS} = C_{FS}/2. For model III, the elements of V are functions of three genetic parameters (C_{SS}, C_{FS}, and C_{OS}) with the genetic covariance of halfsibs equated to C_{FS}/2. However, two of these parameters, C_{FS} and C_{OS}, are constrained in value: C_{OS} < C_{SS}/2 and C_{FS} < C_{OS}. These modelspecific constraints affect not only the numerical values within V, but also the derivatives of V with regard to each parameter [the (∂/∂θ)(V) matrices of Equation 10].
Models I, II, and III were tested by comparing the maximum likelihood obtained under that hypothesis (l_{0}) to the maximum likelihood obtained from the unconstrained model (l_{1}). The model was rejected if the likelihoodratio statistic 2(l_{1}  l_{0}) was greater than the appropriate critical value. It is generally assumed that, under the null hypothesis, 2(l_{1}  l_{0}) follows a chisquare distribution with the number of degrees of freedom equal to the number of parameters distinguishing the two models. This would suggest a critical value 5.99 (for P < 0.05) for models I and II because the unconstrained model has two more parameters than either deleterious mutation model. There is no obvious choice for model III because only one parameter is eliminated (C_{HS} = ½C_{FS}) while other parameters are constrained in value (C_{OS} < ½C_{SS}, C_{FS} < C_{OS}). These inequalities do not translate in a simple way into degrees of freedom. Even the critical values for models I and II may be suspect because the universal constraints apply to the unconstrained model (see Shawet al. 1998). For these reasons, I employ a parametric bootstrapping routine to evaluate the significance of likelihoodratio tests.
The parametric bootstrap involves repeated simulation of the data using the parameter estimates from the relevant mutational model (I, II, or III). Genotypic values for the parental lines are first sampled (from a normal distribution with variance C_{SS}) and the progeny groups are then formed conditional on these parental values (and the values of the various parameter estimates). The numbers of individuals in each family are equivalent to the actual sample sizes from the experiment. Once the simulated data set is created, l_{1} and l_{0} are determined by applying the maximumlikelihood programs for the unconstrained and deleterious mutation models, respectively. I simulated 1000 data sets per test and the various models were fitted to each data set. The distribution of values for 2(l_{1}  l_{0}), across simulated data sets, estimates the sampling distribution of the likelihoodratio statistic. The P value associated with a particular test is (M + 1)/(R + 1), where M is the number of simulated values that exceed the actual likelihoodratio value (calculated from the original data) and R is the number of simulations (Davison and Hinkley 1997, p. 148).
In addition to their use in hypothesis testing, the parametric bootstrapping programs can be used to estimate the joint sampling distribution of our ML estimators for any mutational model in this experimental design. These simulations thus provide a means to evaluate properties such as bias and sampling variance. The distribution of simulated data sets demonstrates the range of possible experimental outcomes, including values for the likelihoodratio statistic, when the null hypothesis is correct (when genetic variation conforms to the mutational model under consideration). I conducted simulation studies of both models I and II for a range of parameter sets to investigate (1) the null distribution of the likelihoodratio statistic and (2) the statistical properties of the estimators. Each simulated data set was equivalent in terms of sample sizes to the experimental study.
RESULTS
Simulation study: Table 2 summarizes a representative set of results from the simulation study. Cases 15 are simulations from models I, while cases 610 are from model II. Each simulated data set was first fit to the appropriate mutational model and then to the unconstrained model. The second column of Table 2 gives an empirically determined “critical value” for that set of parameters. Ninetyfive percent (950 of 1000) of the simulations yielded likelihoodratio statistics that were less than this value. For model I, the empirical critical values differed among parameter sets but were always less than the chisquare value of 5.99. For model II, there was also variation among parameter sets, but empirical critical values were somewhat higher than those for model I.
The maximumlikelihood parameter estimates for both models are approximately median unbiased (Table 2). They exhibit relatively low sampling variances. The procedures do occasionally fail to yield estimates when certain variance components approach boundary values. However, this occurred in only a small fraction of simulations (not at all for most parameter combinations) and these cases are excluded from Table 2.
Mimulus study: Parameter estimates with standard errors (SEs) are given in Table 3 for each trait under the unconstrained model. The SEs are derived from the asymptotic dispersion matrix. These are only approximate and significantly nonzero estimates are routinely within 2 SE of zero (Shawet al. 1998; Kelly and Arathi 2003). The inbred genetic variance (C_{SS}) is greater than the outbred variance (C_{FS}) for each trait, although the magnitude of the difference varies greatly. For each trait, C_{FS} > C_{OS}. This implies that the point estimates for these components violate even the weak conditions (model III). Finally, maternal effects are evident for flower size but have no apparent effect on the male fitness traits. The V_{M} parameter is thus eliminated from the model for subsequent analyses of male fitness traits.
Maximumlikelihood values for each trait under model I are far lower than their associated values under the unconstrained model (Table 4). Each likelihoodratio test is highly significant by both the parametric bootstrap and χ^{2}. A P value <0.001 for the parametric bootstrap implies that the observed likelihoodratio (LR) statistic was greater than any of the 1000 simulated values. The MLEs for each trait are also given. Confidence intervals could be established for these estimates by resampling (bootstrapping) over extended families. I do not bother to justify the parameter estimates here because the model on which they are based is rejected.
There is generally better fit of the data to model II, although the model is still rejected for all traits (Table 5). As expected, the likelihood ratios are much lower for model III (Table 6). In fact, genetic variation in both Ln(pollen number) and Ln(PSI) are statistically consistent with the weak conditions. For all of the models, the likelihood ratios are lower for Lntransformed male fitness values than for untransformed trait values.
DISCUSSION
This study clearly illustrates both the strengths and the faults of biometric estimates for mutational parameters. On the positive side, I was able to calibrate several different mutational models with an experiment of only moderate size (2345 plants). These models are surprisingly general, allowing variable mutational effects, maternal effects, and distinct environmental deviations for inbred and outbred plants. The parametric bootstrap simulations demonstrate that, if genetic variation is maintained according to one of the mutational models, the MLEs are surprisingly accurate (Table 2). Taken together, these observations suggest that biometric methods have the potential to yield accurate estimates for mutational parameters with much lower effort than is typically necessary for mutationaccumulation experiments.
The primary weakness associated with our parameter estimates is that they are based on models insufficient to explain the data. Likelihoodratio tests reject models I and II for each of the traits (Tables 4 and 5). This is not surprising for flower size, as a previous study demonstrated that this trait is inconsistent with a deleterious mutation model of genetic variation (Kelly and Willis 2001). However, it is unexpected that models I and II were rejected for the male fitness traits. Evolutionary analyses of lifehistory (fitness) traits frequently assume that genetic variation is maintained by mutationselection balance (Crow 1993; Houleet al. 1996; Deng and Lynch 1997). Our results call this assumption into question.
The rejection of mutationselection balance for male fitness traits is clearly tentative. Model III was not rejected for either Ln(pollen number) or Ln(PSI). This may reflect the fact that these “weak conditions” are consistent with a range of genetic architectures, at least within the scope of estimation error. However, it may also mean that variation is governed by a deleterious mutation model that is simply more complicated than models I or II (e.g., Caballero and Keightley 1994; Zhanget al. 2002). While our experiment is not sufficiently complex to both calibrate and test a more complicated mutational model, it is straightforward to expand the design to include more comparisons.
Models I and II fail because there is too much additive genetic variation in both flower size and male fitness traits. The genetic covariance of full siblings (C_{FS}) is equal to the additive variance when the parents are fully inbred. The genetic covariance of half siblings (C_{HS}) is equal to onehalf the additive variance (appendix a). Under the deleterious mutation models, the amount of additive variation relative to other components depends on the average dominance of deleterious mutations. For example, with h = 0.2 in model I, the variance among fully homozygous genotypes (C_{SS}) should be 12.5 times greater than the variance among outbred genotypes (C_{FS}) and C_{OS} should be 2.5 times greater than C_{FS}.
The C_{SS} estimates are substantially greater than C_{FS} estimates, particularly for logtransformed male fitness traits. However, C_{OS} is invariably less than C_{FS} when the unconstrained model is fit to the data (Table 3). The low estimates of C_{OS} relative to C_{SS} indicate that, if variation is attributable to rare alleles, these alleles must be fairly recessive on average. However, the more recessive they are, the less additive variation there should be. Estimating h just from C_{OS} and C_{SS} in Table 3, we would expect the additive variances of the male fitness traits (as measured by C_{FS} and C_{HS}) to be only ∼20% of their actual estimated values. An excess of additive genetic variation has also been documented for several lifehistory traits of Drosophila melanogaster and used to reject mutationselection balance as a sufficient explanation of variation (Mukai and Nagano 1983; Takanoet al. 1987; Charlesworth and Hughes 2000).
The higherthanexpected estimates for C_{FS} and C_{HS} also explain why the estimated dominance coefficients for male fitness traits under models I and II (Tables 4 and 5) are greater than those obtained previously. Willis (1999a) used a subset of these same inbred lines to estimate the average dominance of deleterious mutations by the “regression method” (Mukaiet al. 1972; Mukai and Yamaguchi 1974; Caballeroet al. 1997). He randomly paired lines and mated them to produce F_{1} families. The regression of F_{1} family mean trait values onto the corresponding sum of mean values for their parental lines provides an estimate for the average dominance of deleterious alleles. By this method, Willis (1999a) estimated the average dominance to be ∼0.1 for both pollen number and pollen viability (both measurements logtransformed). In contrast, our estimates of h (from model I) are about twice are large (Table 4) and the estimates of h (from model II) are 3.5 times greater (Table 5).
The comparison of F_{1} families with parental lines is actually contained within this design (Figure 1) and it is straightforward to show that the regression method is actually estimating C_{OS}/C_{SS}. If we calculate C_{OS}/C_{SS} from the estimates in Table 3, the values are quite close to those obtained by Willis (1999a): 0.13 for Ln(pollen) and 0.14 for Ln(PSI). In contrast, the maximumlikelihood estimates for h and h (Tables 4 and 5) depend not only on C_{OS} and C_{SS} but also on C_{FS} and C_{HS}. Maximum likelihood finds the best “compromise” given the relative magnitudes of all four variance components. This requires dominance coefficients to be adjusted upward to account for the higherthanexpected values for C_{FS} and C_{HS}. If intermediate frequency alleles make a substantial contribution to variation in fitness, the regression method (C_{OS}/C_{SS}) may actually provide a more accurate estimate for the dominance of rare alleles than the procedure used here (J. K. Kelly, unpublished results).
Assumptions: Models I and II employ an equilibrium model to predict the frequencies of deleterious alleles as a function of mutational parameters. These frequencies are determined by the action of selection in nature (if the model is accurate), but the effects are estimated from variation measured in the greenhouse. This is noteworthy given that both the relationship between trait values and evolutionary fitness and the relationship between genotype and phenotype are likely to be different in the greenhouse than in the wild. Character heritabilities and genetic correlations may change when organisms are assayed in a novel environment (Service and Rose 1985). In addition, the magnitude of inbreeding depression may be substantially greater under natural conditions than in the relatively benevolent greenhouse environment (e.g., Dudash 1990; Carr and Eubanks 2002). This may lead to biased estimates of mutational parameters. This difficulty is abrogated, to some extent, by the inclusion of α in the model (Table 1). This parameter effectively translates between the measured phenotype and evolutionary fitness.
The important question for this study is whether the change in growth environment is responsible for the poor fit of models I and II to the data. Could the transfer cause rejection of the mutational models even if the natural population is in mutationselection equilibrium? It is difficult to provide a definitive answer to this question, but it seems unlikely given that the hypothesis tests do not impose stringent constraints on the values of mutational parameters. Instead, parameters such as h and s of model I are estimated from the data. Thus, we should not falsely reject the null model, even if mutational effects are different in the lab.
The allele frequencies at QTL are the key to the constraints. A false rejection of the mutational models might occur if alleles that are deleterious in nature become advantageous in the lab environment. Such alleles might increase in frequency and the resulting labadapted population would exhibit an excess of additive genetic variation relative to the field population. However, the opportunity for this kind of selection was greatly limited in this study. The inbred lines that constitute the parents of the breeding design were generated rapidly by selffertilization, with random selection of progeny within families (Willis 1999a,b). Each line was initiated from a single family (see study species and methods). A rare allele, present within a single family, might fix within the line founded by that family. However, it could not spread within the laboratory population as a whole because each family contributed to one and only one line. Adaptation to lab conditions might be a more serious concern for randomly mating populations that propagate themselves for a number of generations prior to estimation of genetic parameters (see Matoset al. 2000; Hoffmannet al. 2001; Linnenet al. 2001; Sgro and Partridge 2001).
An alternative approach would be to let the natural population adapt to the lab environment prior to biometric analyses. This has the advantage that population allele frequencies will settle to values dictated by the selective conditions of the lab environment (the relevant values of U, h, and s if variation in fitness is maintained in mutationselection balance). Of course, any parameter estimates would be specific to the lab environment. Moreover, a satisfactory fit of the data to a mutational model would not conclusively indicate that natural variation is maintained by mutationselection balance. Variation maintained by variable selection, in either space or time, or genotypebyenvironment interaction, would likely be lost in the process of lab adaptation.
A second potential concern with this study is that the deleterious mutation models assume random mating in the ancestral population. While the Iron Mountain population of M. guttatus is primarily outcrossing (Willis 1993b), some selffertilization does occur. Inbreeding should reduce the frequency of deleterious mutations relative to the randommating expectation, μ/(hs). This would reduce the absolute value of the variance components but not their relative values (to a first approximation when deleterious alleles are rare). Because the relative values of variance components determine the sufficiency of the models in the likelihoodratio tests, inbreeding is not the cause of failures for models I and II. In this vein, it is noteworthy that M. guttatus does exhibit large amounts of inbreeding depression in fitnessrelated traits despite its mating system (Willis 1993a,b; Dudash and Carr 1998). Moreover, most of this inbreeding depression was not purged during the formation of the lines used for this study (Willis 1999b).
Scale of measurement is another important concern for analyses based on patterns of variation. Scale transformations, such as the logarithm or square root of measurements, are routinely used in quantitative genetics to “stabilize” the variance (Wright 1952; Lynch and Walsh 1998, Chap. 11). In the present study, logtransformation substantially improved the fit of male fitness traits to each of the deleterious mutation models (Tables 4, 5, 6). This may mean that loci affecting these traits combine in a way that is more nearly multiplicative than additive. However, it could also mean that logtransformation simply obscures deviations from the models.
The most appropriate scale for the application of the present methods is the one in which loci contribute in the most nearly additive way to trait variation. On this scale, the mean phenotype should decline linearly with the inbreeding coefficient (Wright 1951; Kempthorne 1957). Several studies have attempted to determine the functional relationship between mean trait values and the level of inbreeding in M. guttatus (Willis 1993a; Carr and Dudash 1997). However, in each of these studies, the effects of changing homozygosity are confounded with the effects of genetic purging. As a consequence, it is difficult to determine the best scale of measurement for estimating mutational parameters. More data in this area are clearly necessary.
Concluding comments: More than anything, this study illustrates the need to combine model evaluation (hypothesis testing) with parameter estimation in biometric studies. Three distinct deleterious mutation models were tested. The first two models were rejected by the data for all characters, primarily because of an excess of additive genetic variation relative to other variance components. Thus, while most of the estimates in Tables 4 and 5 are reasonable, they cannot be treated as unbiased. However, model III was not rejected for either of the logtransformed male fitness components (Table 6). This suggests that a more elaborate deleterious mutation model may prove sufficient.
How generally will mutationselection balance prove sufficient as an explanation for variation in fitness components? This is clearly the issue that will confront future applications of the biometric method for estimating mutational parameters. The ubiquity of genetic variation is one of our most general observations of natural populations. The question of how evolutionary processes maintain this variation is one of the oldest in evolutionary biology. It remains one of the most important.
APPENDIX A
Cockerham and Weir (1984) describe the methods necessary to express the observational components (C_{SS}, C_{FS}, C_{OS}, and C_{HS}) in terms of causal components. For our case, where the parents are completely inbred, the appropriate equations are
Each of the causal components (V_{A}, V_{D}, V_{DI}, and C_{AD}) can be written as a function of allele frequencies and genetic effects across loci (Cockerham and Weir 1984; Kelly 1999). Using the parameterization of Table 1, I find that
APPENDIX B
We need to integrate over the probability density function of mutational effects, f(s). Consider β for a particular locus,
Acknowledgments
This article benefited from comments by Norm Slade, Scott Williamson, Miles Stearns, Liza Holeski, Tara Marriage, Bruce Walsh, and two anonymous reviewers. Tyler Ternes, Scott Jablonski, and H. S. Arathi helped to collect many of the measurements. This research was supported by funds from National Science Foundation grant DEB9903758 and the Murphy Scholarship fund.
Footnotes

Communicating editor: J. B. Walsh
 Received November 11, 2002.
 Accepted March 10, 2003.
 Copyright © 2003 by the Genetics Society of America