Detecting the Undetected: Estimating the Total Number of Loci Underlying a Quantitative Trait
 ^{*} Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
 ^{†} Department of Biology, University of Rochester, Rochester, New York 14627
 Corresponding author: S. P. Otto, Department of Zoology, University of British Columbia, Vancouver, BC V6T 124. Email: otto{at}zoology.ubc.ca
Abstract
Recent studies have begun to reveal the genes underlying quantitative trait differences between closely related populations. Not all quantitative trait loci (QTL) are, however, equally likely to be detected. QTL studies involve a limited number of crosses, individuals, and genetic markers and, as a result, often have little power to detect genetic factors of small to moderate effects. In this article, we develop an estimator for the total number of fixed genetic differences between two parental lines. Like the CastleWright estimator, which is based on the observed segregation variance in classical crossbreeding experiments, our QTLbased estimator requires that a distribution be specified for the expected effect sizes of the underlying loci. We use this expected distribution and the observed mean and minimum effect size of the detected QTL in a likelihood model to estimate the total number of loci underlying the trait difference. We then test the QTLbased estimator and the CastleWright estimator in Monte Carlo simulations. When the assumptions of the simulations match those of the model, both estimators perform well on average. The 95% confidence limits of the CastleWright estimator, however, often excluded the true number of underlying loci, while the confidence limits for the QTLbased estimator typically included the true value ∼95% of the time. Furthermore, we found that the QTLbased estimator was less sensitive to dominance and to allelic effects of opposite sign than the CastleWright estimator. We therefore suggest that the QTLbased estimator be used to assess how many loci may have been missed in QTL studies.
GENETIC studies of quantitative trait loci (QTL) are beginning to reveal the genetic basis of phenotypic differences. In several cases, researchers have pinpointed the genetic changes that have occurred during the processes of natural selection, artificial selection, and speciation (for example, Doebley and Stec 1991; Patersonet al. 1991; Mackay 1996; Laurieet al. 1997; Bradshawet al. 1998). All QTL studies, however, involve a limited number of individuals and markers. Consequently, although QTL studies can detect loci that explain a large fraction of the phenotypic difference between two divergent lines, they have trouble identifying loci of more subtle effect. Thus, QTL studies are known to underestimate the total number of loci involved.
Estimating the total number of underlying loci is valuable for several reasons. First, given that the detected QTL generally represent only a fraction of the total set of QTL, it is worth obtaining accurate estimates of the number and average effect size of the undetected QTL before embarking upon further studies. Although it is tempting to consider simply the fraction of the total parental difference (or segregation variance) accounted for by the detected QTL, this procedure overestimates the importance of what has been detected because of an inherent bias in QTL analyses where the same data are used to detect QTL and to determine their effect sizes [a bias known as the Beavis (Beavis 1994, 1998) effect]. Second, most quantitative genetics theory assumes that many loci contribute to the observed phenotypic variation for a trait. Therefore, to determine the applicability of quantitative genetics theory, it is important to know whether there really are a large number of underlying loci, even if a QTL study detects only a small fraction of them. Third, minor effect QTL may have a much larger effect when different parental lines are involved or under different environmental conditions (Leips and Mackay 2000). Given that the magnitude of a QTL’s effect may depend on the experimental crosses and conditions, we may wish to have more general information about how many genes make some contribution to the observed phenotypic difference. For example, we predict that different studies of the same populations would be more likely to detect a different set of QTL when there are many undetected QTL than when there are few.
In this study, we investigate the relationship between the detected number of QTL (n_{d}) and the total number of genetic differences underlying a trait difference between parental lines (n). Based on the number and magnitude of the detected QTL, we develop a QTLbased estimator (n_{QTL}) for the total number of loci underlying an observed trait difference between parental lines.
To evaluate our estimator, we performed hundreds of simulated QTL experiments. For each “experiment,” we generated parental genomes carrying a specified number of QTL (n) with effects drawn at random from a gamma distribution. We then simulated crosses to generate an F_{1} and an F_{2} population. A QTL analysis was then conducted on the basis of the marker genotypes and phenotypes of these F_{2} individuals. We could thus compare our estimated number of underlying loci (n_{QTL}) to the true number (n). We also assessed the performance of the classical CastleWright estimator (n_{CW}), which is based on the segregation variance observed among hybrids (reviewed by Lynch and Walsh 1998, pp. 233243).
For both our QTLbased estimator and the CastleWright estimator, the expected distribution of allelic effects must be specified. We begin, therefore, with a discussion of this distribution and its shape. Second, we discuss the CastleWright and related estimators. Third, we develop our QTLbased estimator under the assumption of an exponential and a gamma distribution of allelic effect sizes. Finally, we present results from our simulated QTL experiments.
DISTRIBUTION OF EFFECT SIZES
Surprisingly, there is little theory about the suite of genetic differences that are likely to be observed among recently diverged taxa. While we have long known how to calculate the probability of fixation of a single mutation and how this depends on its effect on fitness, few studies have examined the entire set of substitutions likely to result from a period of adaptation.
Using an argument presented by Gillespie (1991, p. 266), we might expect an exponential distribution of selection coefficients among alleles that have fixed within a population. Gillespie noted that if all possible alleles at a locus were ordered in terms of absolute fitness from lowest to highest, then the selection coefficient of the second most fit allele relative to the most fit allele would follow an exponential distribution when measured across all loci. (His proof is based on extreme value theory.) Therefore, if the previously most fit allele at a locus falls only to second place in the fitness ordering when the environment (internal or external) changes, then the selection coefficients of substituted alleles should follow an exponential distribution. That is, we might expect an exponential distribution of selection coefficients among alleles that have fixed within a population if evolutionary change is accomplished primarily by the successive replacement of the two most fit alleles. The proof assumes, however, that genetic interactions are absent and that allelic fitnesses follow the same distribution at each locus, which is unlikely given functional differences among genes.
More recently, Orr (1998) studied the process of adaptation in a geometrical model first proposed by Fisher (1958). In this model, a fitness landscape in multiple dimensions has a single fitness optimum. Initially, the population is displaced from this optimum, perhaps because of a recent shift in the environment. Each new mutation that occurs affects a number of traits but does so to varying degrees. Whether the mutation is beneficial or deleterious depends on the current state of the population and on whether the mutation brings the population closer to the fitness optimum. Orr tracked the series of mutations that were incorporated into the population as it approached the adaptive optimum. In general, adaptation toward a peak in the fitness landscape produced a series of genetic changes whose effects on phenotype followed an approximately exponential distribution, a result that held under a broad variety of plausible distributions for the effects of mutations (Orr 1998, 1999).
Finally, an exponential distribution of fitness effects may also be a reasonable outcome of selection with a moving optimum. To show this, we assume that the fitness effects of new beneficial mutations follow a gamma distribution with mean, μ, and coefficient of variation, C. The shape of the gamma distribution depends on the coefficient of variation (Figure 1). When C = 1, the gamma is equivalent to an exponential distribution. For C > 1, the distribution becomes more Lshaped, while for C < 1, the distribution approaches a bell shape. We also assume that the distribution of newly arising beneficial mutations remains approximately constant throughout the adaptive process, which is plausible if a population lags steadily behind a moving optimum or fitness threshold. The distribution of fitness effects among those beneficial mutations that fix within a population can be calculated by weighting the original gamma distribution by each allele’s probability of fixation, which, in populations of large size, is approximately twice the allele’s selective advantage. It can be shown that the distribution of beneficial alleles, conditioned on their fixation, is also gamma, but with mean μ(1 + C^{2}) and coefficient of variation
The above arguments suggest that an exponential distribution might often describe the distribution of effect sizes among alleles that arise and fix during adaptive divergence. There may, however, be circumstances under which alternative distributions are plausible. For example, one might be examining a trait that has diverged as a pleiotropic response to selection on other traits or through neutral drift. In these cases, the distribution should more closely reflect the distribution of effect sizes among new mutations. On the other hand, if one of the parental lines has been subject to rapid and strong selection over a very short period of time, then only largeeffect mutations will have had enough time to fix. In this case the distribution of fixed effects may be more normal in shape with a mean offset from zero. Similarly, the fact that researchers choose traits and parental lines that are particularly divergent may bias the distribution of effect sizes such that a greater proportion of allelic effects are large. The gamma distribution is a natural choice to describe these various alternatives, because its shape is so flexible. Here, we focus on the exponential distribution, both because it has theoretical support and because the results are simpler to understand. We also provide a more general derivation that uses a gamma distribution to describe the underlying effects of alleles on the trait of interest and that may allow a more flexible approach to fitting real data.
THE CASTLEWRIGHT ESTIMATOR
Historically, one of the most widely used methods for estimating the number of loci underlying a trait difference between two lines was developed by Castle (1921) and Wright (1968). Their method is based on the amount of segregation variance observed in controlled crosses. Intuitively, if there is one or very few Mendelian factors underlying a trait, then some F_{2} individuals will have the same genotypes as the parents, and the phenotypic range observed among the F_{2}’s should span the entire range of the parental lines, P_{1} and P_{2}. As the number of loci contributing to the trait increases, however, F_{2} individuals will more often fall near the average of the parental lines as a result of the central limit theorem. Thus the amount of segregation variance, Var(S), estimated from hybrid (F_{2}, F_{3}, etc.) and backcross generations, contains information regarding the number of alleles underlying the phenotypic difference between two parental lines. If Δz is the difference between parental lines in the trait of interest, the CastleWright estimator for the number of underlying loci is
Numerous improvements and extensions have been made to the CastleWright estimator (summarized in Lynch and Walsh 1998, chapter 9). In particular, Zeng (1992) analyzed a generalized model that allowed for linkage among loci and variation in their effects. He estimated the number of underlying loci to be
Sampling variances for nˆ_{CW} and nˆ_{CWZ} have been determined by Lande (1981) and Zeng (1992). Rather than reiterate their general findings, we focus here on the sampling variance relevant to our simulation study. We assume that the parentalline means are known with negligible error and that the trait has been measured under controlled environmental conditions (i.e., the parental difference is attributable to genetic differences). In this case, the segregation variance reduces to the variance among F_{2} individuals, and the variance of the CastleWrightZeng estimator becomes
A QTLBASED ESTIMATOR
In light of the growing number of QTL studies, we develop an alternate method, based on data generated in a QTL mapping study, to estimate the total number of loci underlying a trait difference. In a typical QTL analysis, a handful of the loci that contribute to the trait difference are detected, each with an estimated additive and dominance effect and position. Our method takes advantage of the fact that the power to detect a QTL depends on the size of its effect. Given an expected distribution of effects, we can estimate the number of loci whose effects were too small to be detected (see Figure 1).
We first need to determine how the probability of detecting a QTL depends on its effect size. This power curve is approximately logistic in shape for a simple QTL study with one marker linked to one QTL (Figure 2). With N_{F2} individuals scored in the F_{2} generation, the probability of detecting a QTL rises at some point from near 0 to near 1. The more F_{2} individuals are examined, the more steeply the curve rises. With multiple markers, however, significance is usually assessed using a permutation test (Churchill and Doerge 1994). Without knowledge of the form of the power curve in a typical QTL study with multiple markers, we approximate the power curve with a threshold function, rising from 0 to 1 at a threshold θ. This approximation should be most accurate when many F_{2} individuals are scored. Other power functions can be explored, at least numerically, using the approach developed below, although simulations indicate that the estimator developed using this approximation is reasonably accurate as long as there are enough detected QTL (more than two) to provide a good estimate for the threshold.
We assume that the phenotypic difference between the parental lines, Δz, is primarily caused by fixed allelic differences at underlying QTL. We define the additive (a) and dominance (d) effects of each allele according to Zeng (1992),
 Population:
 P_{1} F_{1} P_{2}
 Genotype:
 A_{1} A_{1} A_{1} A_{2} A_{2} A_{2}
 Average phenotype:
 a d a,
where we refer to a as the effect size or additive effect of an allele. Here, we assume that each allele contributes to the parental difference in the same direction, that is, a is always positive or always negative. This assumption is reasonable if divergence is primarily a product of selection acting in different directions within the two populations or in a novel direction in one population. (In a later section, we explore the effect of relaxing this assumption through simulations.) We let D denote the sum of the additive effects across all QTL. Under the above assumptions, D also equals half the phenotypic difference between parental lines (Δz/2). We first derive an estimator for the number of loci underlying the trait difference assuming an exponential distribution of allelic effects and given an estimate of θ (the minimum threshold of detection). We then discuss how to obtain confidence limits for this estimator and how to estimate θ from QTL data. Finally, we repeat these steps assuming allelic effects follow the more general gamma distribution.
Exponentially distributed effect sizes: As argued above, we might expect alleles underlying phenotypic divergence to be exponentially distributed. To begin, we proceed under the assumption that the additive effect sizes (a) represent draws from an exponential distribution with mean, μ, and probability density function
Equation 6 can also be derived using a likelihood approach, which will enable us to calculate confidence limits for the number of loci underlying a trait difference. From the probability density function for observable QTL, (5), the likelihood of observing a set of QTL whose additive effect sizes are a_{i} (i = 1... n_{d}) is proportional to
Confidence limits: Confidence limits for nˆ_{QTL} can be obtained using the likelihoodratio test, which holds that the values of n whose log likelihoods lie within
Average effect of the undetected QTL: Because of the Beavis effect, the effects of the detected QTL are inflated and may appear to explain more of the trait difference between two parental lines than is the case. Here, we use the exponential distribution to estimate the relative importance of the undetected QTL. Specifically, we estimate the expected effect size of the undetected QTL, M_{undetected}, by averaging the exponential distribution within the range, 0 to θ. Recalling that the mean effect of all QTL (μ) is estimated by M  θ, we obtain the estimate
As an example, in the simulation study described below with 20 underlying QTL and 500 F_{2}’s (see Table 1), the detected QTL appeared to explain just over 100% of the parental difference, on average, even though less than half of the underlying QTL were detected. In fact, the undetected QTL accounted for ∼20% of the parental difference. Thus, the effects of undetected QTL can be substantial even when a QTL study indicates that the entire difference between parental lines has been explained.
One can also determine the expected fraction of segregation variance accounted for by the undetected QTL. From Lynch and Walsh (1998) Equation 9.26a, the segregation variance for unlinked loci equals ^{1}/_{2} times the sum of the squared average effects of all the QTL. The expected fraction of the segregation variance contributed by loci whose effects lie below the threshold of detection, θ, can be shown to equal
Estimating the threshold of detection: We have not yet addressed how to estimate the unknown threshold, θ. Ideally, one would compute the power curve for the probability of detecting a QTL. For example, one could perform Monte Carlo simulations, placing QTL of known effects on simulated genomes using the same number of markers and individuals as in the planned experiment. Alternatively, one can use the observed QTL data to estimate θ, as follows. If there are several detected QTL, then the magnitude of the smallest observed additive effect size (a_{min}) will often be near the threshold and can be used as an approximation for θ. Indeed, a_{min} is the maximumlikelihood estimator for θ because the shape of the exponential distribution is such that the most probable value for the smallest observed QTL is at the origin of the distribution (i.e., at θ). This is clearly a biased estimate for the threshold, because the smallest observed QTL will not lie below the minimum detectable size and will generally lie above θ. To obtain a less biased estimate for θ, we use a Bayesian approach. We assume that the threshold lies somewhere between 0 and a_{min} but that every point within this range is equally plausible (θ has a uniform prior distribution). We then weight the prior distribution for θ by the probability that the smallest detected QTL has additive effect size a_{min}. Noting that the minimum of n_{d} draws from an exponential distribution with mean μ is itself exponential with mean μ/n_{d} (Feller 1971, p. 18) and recalling that detectable loci follow an exponential distribution shifted to the right by θ, the probability density function for a_{min}  θ is exponential with mean μ/n_{d}. This gives us a posterior distribution for θ, whose average value is our estimated threshold
One potential problem with using a_{min} and M in our estimators is that they will be overestimated in QTL studies as a result of the Beavis effect. This occurs because the effect sizes of the QTL are estimated from the same data used to detect the QTL. Those QTL that, by chance, happen to have a larger effect than expected are more likely to be detected and so lead to inflated estimates of a_{i}. There is currently no analytical method to correct for the Beavis effect, so its impact was tested through simulation (described below). The simulations indicate that the Beavis effect did not cause a large bias in our QTLbased estimators. We suspect that nˆ_{QTL} is not extremely sensitive to the Beavis effect because both M and θ tend to be inflated, but the estimator depends primarily on their difference [see (6) and (9)], which may not be as strongly biased.
Gammadistributed effect sizes: We now describe more general results that apply when allelic effects follow a gamma distribution with mean, μ, and coefficient of variation, C. The probability density function of a gamma distribution is
Again, a_{min} can be used as an upwardly biased estimator for θ. Alternatively, one can correct θ using a Bayesian approach that takes into account the probability that a_{min} is the smallest detectable QTL when effect sizes follow a gamma distribution. This leads to an expression involving θ that must be numerically evaluated simultaneously with (14):
SIMULATIONS
To assess the accuracy of our estimator and the CastleWrightZeng estimator, QTL analyses were simulated using the QTLcartographer package (version 1.13a; Bastenet al. 1996). Using the Rmap program, we created a map with 20 chromosomes, each carrying five markers spaced 10 cM apart, with the first and last markers placed at the ends of the chromosome. This map was used for all simulations. On this map, we randomly placed 2, 5, 10, 20, or 100 QTL using Rqtl. For a given number of QTL, 30 different QTL maps were generated (i.e., we constructed 30 different “experimental genomes”). In the basic set of simulations, the following assumptions were made: (1) alleles were additive; (2) all allelic effects, a, had the same sign (i.e., each QTL acted in the direction of the observed parental difference); and (3) the QTL effects were drawn from an exponential distribution with a mean of 1/4. These assumptions were then relaxed in turn. To assess the effects of dominance, we used Rqtl to produce QTL with additive effects drawn from an exponential distribution with mean 1/4 and dominance effects drawn from a Beta distribution with shape parameter set to 1 (see Bastenet al. 1996). This routine produced a dominance term that ranged from a to a, with mean equal to 0. That is, alleles from P_{1} ranged from recessive to dominant. To assess the effects of having some QTL act in the opposite direction to the observed parental difference, we let the additive effect of a QTL be positive with probability 0.85 and negative with probability 0.15. Finally, to assess the effect of the underlying distribution of effect sizes, we drew effect sizes from a gamma distribution with a coefficient of variation of 0.5 or 2.0 instead of an exponential distribution (see Figure 1). For these extensions, the number of underlying QTL was set to 20, but otherwise the experimental design was the same as for the basic simulations. In all cases, environmental variation was assumed to be negligible.
For each experimental genome generated, Rcross was used to simulate the production of 200 and then 500 F_{2} individuals. This F_{2} sampling procedure was repeated 10 times. This design allowed us to assess whether the estimators were more sensitive to the realized distribution of QTL within the genome (variation among “experimental genomes”) or to the specific set of F_{2} individuals analyzed (variation among sampled individuals). For each set of F_{2} individuals, the QTL were mapped using Zmapqtl’s interval mapping routine (model 3). A single permutation test was used to determine an approximate significance threshold for each number of QTL and F_{2} sample size (5% genomewide significance level). In general, there was not much variation in significance thresholds (data not shown). This threshold was then used in conjunction with Eqtl to estimate the location and effects of the QTL (Bastenet al. 1996). Eqtl identifies putative QTL by scanning through the output of Zmapqtl and noting all intervals with a likelihood peak that exceeds the significance threshold.
Unfortunately, interval mapping methods systematically bias the effects of chromosome regions linked to QTL (Lynch and Walsh 1998, pp. 457458). Put simply, an interval not containing a QTL but adjacent to one containing a QTL may exceed the significance threshold because of its linkage to the QTL. This problem is endemic to all QTL analyses and no adequate solution has yet been developed (Whittakeret al. 1996; Goffinet and Mangin 1998). To reduce this problem, we screened our data set for obvious “ghost” QTL. That is, we eliminated intervals containing putative QTL whose position was within another QTL’s approximate support interval (the chromosome interval within which the likelihood of a QTL was within a factor of 10 of the peak likelihood for that QTL; see Lynch and Walsh 1998, pp. 448449). In each case, the QTL with the weaker support was eliminated. Visscher et al. (1996) have shown that a bootstrap approach is much more accurate, but it would have been too time consuming for our analysis.
We chose interval mapping for two reasons. First, interval mapping is an established method that has been repeatedly used in empirical studies and is well explored theoretically. Second, interval mapping is much faster, allowing us to perform more extensive tests. We also suggest that interval mapping provides a conservative test of nˆ_{QTL}. Interval mapping produces data that are less precise than more advanced methods of QTL analysis, such as composite interval mapping and multiple interval mapping (Zeng 1994; Kaoet al. 1999); this reduced precision should result in less accurate estimates for nˆ_{QTL}. Furthermore, interval mapping corresponds less well to our model, which assumes that QTL above the threshold of detection are always detected. Consequently, it seems reasonable that our estimator would perform even better with data from more sophisticated methods for detecting QTL.
Our simulation data were imported into Mathematica 3.0 (Wolfram 1996; package available at www.zoology.ubc.ca/~otto/Research) for analysis. In our calculations, we used the estimators appropriate for an exponential distribution of underlying QTL effect sizes. Specifically, we used Equation 6 for the QTLbased estimator and Equation 2 for the CastleWrightZeng estimator, with C = 1. For nˆ_{CWZ}, we initially used the average recombination rate between randomly chosen pairs of loci, c, which equals 0.48 for a genome with 20 chromosomes, each of length 40 cM (Lynch and Walsh 1998, Equation 9.3). Unfortunately, this often led to nonsensical answers, especially when the number of underlying loci was large (>5). The problem lies in the fact that the denominator in (2) is subject to substantial sampling error; relatively often, the denominator approached zero (causing severe overestimates) or became negative. With 20 underlying QTL, the nˆ_{CWZ} estimate for the number of underlying loci averaged 49.3 (with 500 F_{2}’s) and 21.8 (with 200 F_{2}’s) over the 300 simulations! To avoid these problems, we set c to ^{1}/_{2} in all of our analyses, which either made little difference (for small n) or improved the estimates.
Because the QTL estimator relies on the difference between the mean effect of factors found and the minimum effect found, it can only work if at least two QTL have been identified. Therefore, we excluded QTL analyses with only one detected QTL. In experimental genomes with more than two true QTL, very few analyses were excluded. With only two true QTL, however, approximately half of the analyses had to be excluded. Furthermore, in a very small number of cases with two true QTL (6/600), two QTL were detected whose estimated effects were, by chance, equal. To avoid divisionbyzero errors, the mean was increased by 5% over the minimum in these cases. (If, instead, these cases were eliminated, the performance of the estimators improved slightly over the results shown in Table 1.)
RESULTS
The results of the simulations are presented in Table 1. As expected, our QTLbased estimator was more accurate when (11) was used to estimate the threshold of detection (θ) than when the smallest detected QTL was used (a_{min}). We therefore focus our discussion on estimates based on (11). Both our estimator and the CastleWrightZeng estimator were fairly accurate on average when there was an intermediate number of underlying QTL (5 ≤ n ≤ 20). With 100 QTL, however, both methods underestimated the true number of QTL but for different reasons. The QTLbased estimator will be biased downward whenever the density of QTL is high, because tightly linked QTL are then rarely separated by recombination. Furthermore, in interval mapping, the number of detected QTL must be less than the number of marker intervals, which was only 80 in our study. This bias could be eliminated by following the lines for more generations (increasing the opportunity for recombination) and by adding more markers to the study (increasing the number of marker intervals). The CastleWrightZeng estimator, on the other hand, becomes less and less accurate as the number of loci increases because the segregation variance approaches zero and becomes harder to estimate precisely. Although increasing the number of F_{2} progeny tested and following the lines for more generations may improve the accuracy of the CastleWrightZeng estimator, our simulations indicate no substantial improvement between N_{F2} = 200 and N_{F2} = 500.
With only two true QTL, our estimator performed poorly, but the CastleWrightZeng estimator continued to perform well, on average. Because we often had to exclude cases where only one QTL was detected when there were only two true QTL, it is not surprising that the estimators overestimated the number of underlying QTL. Note that if we also excluded cases where only two QTL were detected, the average of our estimator improved (Table 1, last two rows), which suggests that nˆ_{QTL} is biased upward by sampling error when there are few detected QTL. More generally, Table 1 suggests that, unless the number of QTL is very large, the QTLbased estimator tends to overestimate the true number of underlying loci and that this bias is stronger with fewer QTL. We expect an upward bias in our estimator for two reasons: (1) when there are few detected QTL, there will be substantial variation in the denominator (M  θ) of Equation 6, which can approach zero and generate an overestimate, and (2) the Beavis effect will generate a greater upward bias in θ than in M, which also leads (6) to overestimate the number of underlying loci. We therefore recommend the use of the QTLbased estimator only when three or more QTL have been detected and when the mean detected QTL is substantially above (say >25% above) the minimum threshold of detection.
Interestingly, the largest difference between the two estimators is not in their average performance but in their confidence limits. Appropriate 95% confidence limits should include the true value 95% of the time. The confidence limits based on (9) for our estimator have this property (see numbers in square brackets in Table 1). In those cases where our estimator had little bias (5 < n < 20), the confidence limits included the true value 95.1% of the time. On the other hand, the confidence limits for the CastleWrightZeng estimator included the true value only 47.6% of the time. The confidence limits for nˆ_{CWZ} often excluded the true number of underlying loci because these limits only account for error caused by sampling a limited number of F_{2} individuals. They do not account for the sampling error inherent in having a limited number of QTL, whose effects represent particular draws from an underlying distribution. For example, if the QTL with the largest effect has, by chance, a magnitude that is greater than expected, there will be more segregation variance than expected, and nˆ_{CWZ} will underestimate n. Conversely, if the major QTL have, by chance, roughly equal influence, there will be less segregation variance than expected, and nˆ_{CWZ} will overestimate n.
In fact, the sampling of different sets of QTL in different experimental genomes accounts for a large fraction of the total variance in estimates of n (Figure 4). This is especially true for nˆ_{CWZ}, where almost all of the observed variation was among genomes with different sets of QTL (dark gray bars) rather than among the different sets of F_{2} individuals sampled from each experimental genome (light gray bars). In other words, nˆ_{CWZ} depended little on the exact set of F_{2} individuals, but it varied greatly each time a new set of QTL was generated. Figure 4 suggests that, with at least 200 F_{2} individuals, the confidence limits for the CastleWrightZeng estimator are based on a minor source of error (F_{2} sampling variance) rather than the bulk of the error (QTL sampling variance). In practice, this means that if a researcher is interested in the number of underlying loci that are responsible for a trait difference between two parental lines, the CastleWrightZeng estimator will too often indicate a high degree of confidence in the wrong number and exclude the right number. Furthermore, as indicated by the simulations, reestimating nˆ_{CWZ} using a different set of F_{2} individuals is unlikely to help because the sampling of F_{2}’s was not the major source of error (Figure 4).
Dominance: In the above results, the simulated QTL had additive effects. To test the impact of dominance on our estimator, we simulated experimental genomes that included 20 QTL with nonadditive effects ranging from fully recessive to fully dominant. Although models have been developed to incorporate dominance into the CastleWright estimator, these generally assume that the mean and distribution of dominance coefficients are known. Because most QTL studies lack such information, we continue to use (2) and (6) to estimate the number of underlying loci. That is, we ask, how inaccurate are the two estimators if we assume no dominance when dominance is actually present? The inclusion of dominance did not noticeably affect the performance of the QTLbased estimator, but it caused nˆ_{CWZ} to underestimate n by ∼20% (Table 2). Dominance tends to inflate the segregation variance inferred from F_{2} individuals, because heterozygotes at a locus have genotypic values further from the mean. This effect is even more exaggerated with overdominance or underdominance, in which case the phenotype of the F_{2}’s may lie outside of the range defined by the two parental lines (transgressive segregation). This explains the sensitivity of nˆ_{CWZ} to the inclusion of dominance but does not address why nˆ_{QTL} was little affected by dominance. We believe that, because the methods used to detect QTL explicitly allow dominance levels to vary among loci, the estimated additive effects of the detected QTL (and hence nˆ_{CWZ}) are not strongly biased by dominance. Interestingly, the average power of the QTL experiments was also little affected by dominance (compare τ values in Tables 1 and 2). Because nˆ_{QTL} is less sensitive to dominance interactions, it is a more appropriate estimator to use than nˆ_{CWZ} whenever the nature of dominance is unknown.
QTL of opposite effect: Table 3 presents results for experiments where the additive effect of a QTL had an 85% chance of being positive and a 15% chance of being negative. Although both estimators underestimated the true number of underlying QTL, nˆ_{QTL} was much less sensitive than nˆ_{CWZ} to the inclusion of loci affecting the trait of interest in the opposite direction. On average, ∼16 underlying QTL were estimated with nˆ_{QTL}, while ∼9 were estimated with nˆ_{CWZ}, whereas the true number was 20. When the effects of QTL oppose one another, the trait values of the F_{2}’s are no longer expected to lie strictly between the parental lines, providing another explanation for transgressive segregation. As with dominance, the segregation variance measured among the F_{2}’s is thus inflated, and the CastleWrightZeng estimator underestimates the number of underlying loci. On the other hand, the inclusion of QTL with opposite effects reduced the power of the QTL experiments (compare τ values in Tables 1 and 3), but the additive effects of the detected QTL (and hence nˆ_{QTL}) were not strongly biased.
Gamma distribution of effect sizes: Finally, we tested the extent to which the estimators were sensitive to the underlying distribution of effect sizes. Table 4 provides results from simulations where the underlying distribution was gamma (with C = 0.5 or 2.0) but where the analyses assumed an exponential distribution (C = 1.0). As expected, nˆ_{QTL} and nˆ_{CWZ} overestimated the true number of QTL when the underlying distribution had a lower coefficient of variation (C = 0.5). The extent of the bias was not severe for nˆ_{CWZ} in this case; this is consistent with the form of Equation 2, which changes less when C is lowered by a fraction than when it is increased. Conversely, both nˆ_{QTL} and nˆ_{CWZ} underestimated the true number of QTL by ∼50% when the underlying distribution had a higher coefficient of variation (C = 2.0), with nˆ_{CWZ} performing slightly worse than nˆ_{QTL}. One can correct these estimators using Equation 2 for nˆ_{CWZ}, and 14 and 15 for nˆ_{QTL}. These corrections brought the estimates toward the true value of 20 but tended to overcorrect, especially for nˆ_{QTL} when C = 2.0, perhaps because the observed coefficient of variation of the true effects of the QTL was highly variable in this case. Of course, making these corrections requires external knowledge about the underlying distribution, which will often be lacking. When many QTL have been detected, the number of underlying loci, the threshold of detection, and the shape of the distribution could be simultaneously estimated using a maximumlikelihood approach. One potential problem is that the Beavis effect will bias the shape parameter (lowering C) by making QTL of small effect seem larger.
CONCLUSIONS
Historically, the number of genetic factors, n, underlying an observed difference between two parental lines has been estimated using methods developed by Castle (1921) and Wright (1968). QTL analyses have, to some extent, supplanted the CastleWright estimator. QTL analyses localize the genetic factors and estimate their additive and dominance effects without assuming that these effects follow a particular distribution. Unfortunately, QTL analyses are best at finding factors with profound phenotypic effects and often miss factors of moderate to small effect. As a result, the number of observed QTL is a poor indicator of the number of loci contributing to a difference between two parental lines.
This problem is illustrated in Figure 5, which shows the expected number of detectable QTL as a function of the number of underlying QTL. Figure 5 is based on the assumption that there is a threshold below which a QTL is unlikely to be detected and above which it is. Two threshold levels of detection are illustrated, with θ set to 5 or 10% of D, where D is the total additive effect size (i.e., half the parental difference). The first threshold was typical in our simulation studies with 500 F_{2}’s, while the second was typical in simulations with 200 F_{2}’s. The most striking feature of these curves is that they do not increase monotonically with the number of underlying loci. Instead, the expected number of detected loci initially rises, reaches a maximum, and then falls back toward zero as the number of underlying loci increases. These curves suggest two reasons why a certain number of QTL may be observed: (1) there are few underlying QTL, but their average effect is relatively large such that most are above the threshold, or (2) there are several QTL, but their average effect is relatively small such that few are above the threshold of detection. Furthermore, the maxima of these curves is fairly low, indicating that only a handful of QTL will be detected regardless of the true number of loci contributing to the trait difference. These conclusions are the same whether the effects of the underlying loci follow an exponential distribution, an Lshaped gamma distribution, or a bellshaped gamma distribution. In short, because QTL studies predominantly detect loci of large effect, the number of loci detected in a QTL study is not linearly related to the number of underlying loci.
Here, we present a new estimator of gene number, nˆ_{QTL}, that takes into account the bias of QTL analyses toward detecting loci of large effect. By noting the average size and the minimum size of the detected QTL, we can estimate the number and magnitude of the loci whose effects were too small to be detected. As with the CastleWright estimator, this technique requires us to specify the expected distribution of effect sizes. We develop a QTLbased estimator for an exponential distribution and a gamma distribution of effect sizes. Although our method assumes that QTL analyses have a negligible probability of detecting a QTL below a threshold (θ) and a 100% probability of detecting QTL above θ, simulations indicate that this simplifying assumption does not generate a substantial bias in the average number of estimated loci (nˆ_{QTL}).
As Table 1 shows, our QTLbased estimator provides a good approximation for the number of underlying loci unless few QTL were detected (n_{d} < 3) or the genetic map was saturated with QTL (more QTL than marker intervals; n_{d} > 80). Furthermore, in those cases where the average value of the estimator approximately equals the true number of underlying loci (i.e., when 5 ≤ n ≤ 20), the 95% confidence limits based on nˆ_{QTL} contain n ∼ 95% of the time (Table 1). In contrast, the 95% confidence limits for the CastleWrightZeng estimator often miss the true value for n, despite the fact that the simulations and the estimator both assume an exponential distribution of effect sizes. Essentially, the confidence limits for the CastleWrightZeng estimator do not account for the variance inherent in the sampling of mutations that arise and fix within a population. Because the confidence limits for nˆ_{QTL} take into account the sampling error inherent in drawing allelic effects from an underlying distribution, they more often include the true value for the number of underlying loci. An additional benefit of our estimator is that it is less sensitive to dominance (compare Tables 1 and 2) and to violations of the assumption that the additive effects of all alleles have the same sign (compare Tables 1 and 3).
To demonstrate the application of our estimator to real data, we consider two examples. The first, a QTL study by Bradshaw et al. (1998), examined floral morphology in monkeyflowers (Table 5). We focus on two of their morphological characters, stamen and pistil length. In both cases, 5 QTL were detected, each acting in the same direction. Assuming an exponential distribution, our QTLbased estimator predicted that 12.0 loci underlie the stamen length differences and that 8.2 loci underlie pistil length differences, suggesting that about half of the QTL have been detected. Using 1/2 as the average rate of recombination among pairs of loci, the CastleWrightZeng estimator inferred 4.1 factors affecting stamen length and 3.9 factors affecting pistil length, both of which fall short of the observed number of QTL. Using the average recombination rate based on the genome map of monkeyflowers (0.413) increased these estimates only slightly (to 6.5 and 6.1, respectively). The CastleWrightZeng estimators are particularly suspect for these traits, however, because both exhibit transgressive variation, where the F_{2} distribution is broader than the parental difference. The presence of transgressive variation suggests that there are either strong interactions among alleles (e.g., overdominance) or QTL of opposite effects. In either case, the CastleWrightZeng estimator underestimates the number of underlying loci and is less reliable than the QTLbased estimator (see Tables 2 and 3).
The above example highlights the difference between nˆ_{QTL} and nˆ_{CWZ}. Our next example demonstrates that our estimator could be used to predict the number of loci that may be uncovered in a more powerful QTL study. Liu et al. (1996) mapped factors affecting genital arch shape in Drosophila mauritiana and D. simulans hybrids, employing two backcrosses and <200 individuals. Zeng et al. (2000) extended this analysis by increasing the sample size (∼500 individuals per backcross) and reanalyzing the old and new data sets using multiple interval mapping. We concentrate on the results presented by Zeng et al., as they were obtained by applying the same mapping methodology and criteria to both data sets.
Liu et al.’s backcross analysis suggested that 1113 QTL are involved in the genitalia difference. [The first backcross to D. mauritiana (BM) identified 11 QTL; the backcross to D. simulans (BS) identified 13 QTL.] Based on Equation 11, our estimator suggests that the true number is closer to 21.4 for BM and 23.2 for BS. The number of QTL found in the BS analysis is within our 95% confidence intervals (CIs), albeit just barely (95% CI, 12.738.1). The number of QTL found in the BM analysis, however, is not (95% CI, 11.136.7). This predicts that a fair number of QTL were probably missed in these analyses, a fact that was confirmed by Zeng et al. (2000). After more than doubling the number of individuals and markers genotyped, Zeng et al. found 19 QTL involved in genitalia differences in both backcrosses. With this data, nˆ_{QTL} = 16.7 (95% CI, 10.225.3) for the BS lines and 20.0 (95% CI, 12.330.5) for the BM lines. The value of 19 QTL found by Zeng et al. is well within our 95% confidence intervals and is near the mean of the two estimates. This, plus the fact that the power of the second analysis is quite high (τ = 0.97 for the BS analysis and τ = 0.92 for the BM analysis), indicates that nearly all of the QTL have now been identified. Thus, our estimator accurately estimated the number of QTL that could be found in a powerful experiment given data from a less powerful experiment.
Improving estimates of gene number could have an important impact in both quantitative and evolutionary genetics. First, better estimates would help researchers know how many genes affecting a trait of interest go undetected. They could then make more informed decisions about whether to refine a QTL analysis to uncover missing factors. Second, better estimates of the actual number of genes underlying divergent traits can help us evaluate the applicability of quantitative genetic models, which assume a large number of underlying loci. Third, improved estimates of gene number provide interesting information about the genetic architecture underlying evolutionary change. For example, they can help us identify the sorts of phenotypic changes that are typically accomplished by few allelic substitutions. Although we have taken steps to incorporate data being generated in QTL studies, further work is warranted. In particular, it would be valuable to know how best to use the information contained in both the CastleWrightZeng and the QTLbased estimators to obtain an even more powerful estimator of the number of loci underlying phenotypic divergence.
Acknowledgments
We owe special thanks to H. Allen Orr for encouraging this project along, and for generously sharing his time and ideas, and to John Huelsenbeck for kindly providing access to his computer facilities. We also thank Andrea Betancourt, Thomas Lenormand, J. P. Masly, Allen Orr, Art Poon, Daven Presgraves, Dolph Schluter, Peter Visscher, Bruce Walsh, Michael Whitlock, ZhaoBang Zeng, and an anonymous reviewer for their helpful comments on the project and manuscript. This work was inspired by a discussion of the Vancouver Evolutionary Group and was sponsored by grants from the Natural Sciences and Engineering Research Council of Canada (S.P.O.), the Peter Wall Institute for Advanced Studies (S.P.O.), the David and Lucile Packard Foundation (H. A. Orr), the National Institutes of Health (GM51932 to H. A. Orr), and a Caspari Fellowship from the University of Rochester (C.D.J.).
Footnotes

Communicating editor: J. B. Walsh
 Received December 17, 1999.
 Accepted July 28, 2000.
 Copyright © 2000 by the Genetics Society of America