Detecting the Undetected: Estimating the Total Number of Loci Underlying a Quantitative Trait
Sarah P. Otto^a and Corbin D. Jones^b
a Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
b Department of Biology, University of Rochester, Rochester, New York 14627
Corresponding author: Sarah P. Otto, Department of Zoology, University of British Columbia, Vancouver, BC V6T 1Z4. E-mail: otto{at}zoology.ubc.ca
Communicating editor: J. B. WALSH
| ABSTRACT |
|---|
Recent studies have begun to reveal the genes underlying quantitative trait differences between closely related populations. Not all quantitative trait loci (QTL) are, however, equally likely to be detected. QTL studies involve a limited number of crosses, individuals, and genetic markers and, as a result, often have little power to detect genetic factors of small to moderate effects. In this article, we develop an estimator for the total number of fixed genetic differences between two parental lines. Like the Castle-Wright estimator, which is based on the observed segregation variance in classical crossbreeding experiments, our QTL-based estimator requires that a distribution be specified for the expected effect sizes of the underlying loci. We use this expected distribution and the observed mean and minimum effect size of the detected QTL in a likelihood model to estimate the total number of loci underlying the trait difference. We then test the QTL-based estimator and the Castle-Wright estimator in Monte Carlo simulations. When the assumptions of the simulations match those of the model, both estimators perform well on average. The 95% confidence limits of the Castle-Wright estimator, however, often excluded the true number of underlying loci, while the confidence limits for the QTL-based estimator typically included the true value 95% of the time. Furthermore, we found that the QTL-based estimator was less sensitive to dominance and to allelic effects of opposite sign than the Castle-Wright estimator. We therefore suggest that the QTL-based estimator be used to assess how many loci may have been missed in QTL studies.
GENETIC studies of quantitative trait loci (QTL) are beginning to reveal the genetic basis of phenotypic differences. In several cases, researchers have pinpointed the genetic changes that have occurred during the processes of natural selection, artificial selection, and speciation.
Estimating the total number of underlying loci is valuable for several reasons. First, given that the detected QTL generally represent only a fraction of the total set of QTL, it is worth obtaining accurate estimates of the number and average effect size of the undetected QTL before embarking upon further studies. Although it is tempting to consider simply the fraction of the total parental difference (or segregation variance) accounted for by the detected QTL, this procedure overestimates the importance of what has been detected because of an inherent bias in QTL analyses where the same data are used to detect QTL and to determine their effect sizes (a bias known as the Beavis effect).
In this study, we investigate the relationship between the detected number of QTL (nd) and the total number of genetic differences underlying a trait difference between parental lines (n). Based on the number and magnitude of the detected QTL, we develop a QTL-based estimator (nQTL) for the total number of loci underlying an observed trait difference between parental lines.
To evaluate our estimator, we performed hundreds of simulated QTL experiments. For each "experiment," we generated parental genomes carrying a specified number of QTL (n) with effects drawn at random from a gamma distribution. We then simulated crosses to generate an F1 and an F2 population. A QTL analysis was then conducted on the basis of the marker genotypes and phenotypes of these F2 individuals. We could thus compare our estimated number of underlying loci (nQTL) to the true number (n). We also assessed the performance of the classical Castle-Wright estimator (nCW), which is based on the segregation variance observed among hybrids.
For both our QTL-based estimator and the Castle-Wright estimator, the expected distribution of allelic effects must be specified. We begin, therefore, with a discussion of this distribution and its shape. Second, we discuss the Castle-Wright and related estimators. Third, we develop our QTL-based estimator under the assumption of an exponential and a gamma distribution of allelic effect sizes. Finally, we present results from our simulated QTL experiments.
| DISTRIBUTION OF EFFECT SIZES |
|---|
Surprisingly, there is little theory about the suite of genetic differences that are likely to be observed among recently diverged taxa. While we have long known how to calculate the probability of fixation of a single mutation and how this depends on its effect on fitness, few studies have examined the entire set of substitutions likely to result from a period of adaptation.
Using an argument presented by ![]()
More recently, ![]()
Finally, an exponential distribution of fitness effects may also be a reasonable outcome of selection with a moving optimum. To show this, we assume that the fitness effects of new beneficial mutations follow a gamma distribution with mean, µ, and coefficient of variation, C. The shape of the gamma distribution depends on the coefficient of variation (Fig 1). When C = 1, the gamma is equivalent to an exponential distribution. For C > 1, the distribution becomes more L-shaped, while for C < 1, the distribution approaches a bell shape. We also assume that the distribution of newly arising beneficial mutations remains approximately constant throughout the adaptive process, which is plausible if a population lags steadily behind a moving optimum or fitness threshold. The distribution of fitness effects among those beneficial mutations that fix within a population can be calculated by weighting the original gamma distribution by each allele's probability of fixation, which, in populations of large size, is approximately twice the allele's selective advantage. It can be shown that the distribution of beneficial alleles, conditioned on their fixation, is also gamma, but with mean $\mu(1 + C^2)$ and coefficient of variation $C/\sqrt{1 + C^2}$. Interestingly, this conditional distribution always has a coefficient of variation <1. Hence, it is either exponential or bell shaped, even if newly arising beneficial mutations have a very L-shaped distribution. Unfortunately, we know little about the distribution of fitness effects of mutations in general and of beneficial mutations in particular. The distribution of spontaneous deleterious mutations is thought to be roughly L-shaped, with estimates of C ranging from 2 to 5.
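The fixation-weighting argument can be checked numerically. The following is a minimal sketch, not taken from the article: it assumes that weighting a gamma density with shape 1/C² and scale µC² by the effect size raises the shape by one and leaves the scale unchanged, which gives a mean of µ(1 + C²) and a coefficient of variation of C/√(1 + C²). The function names are hypothetical.

```python
import math
import random

def fixed_allele_moments(mu, C):
    """Mean and CV of effects among fixed alleles, for gamma(mean mu, CV C) new mutations."""
    shape = 1.0 / C**2          # gamma shape parameter of new mutations
    scale = mu * C**2           # gamma scale parameter of new mutations
    fixed_shape = shape + 1.0   # weighting the density by s raises the shape by one
    return fixed_shape * scale, 1.0 / math.sqrt(fixed_shape)

def simulated_moments(mu, C, draws=200_000, seed=1):
    """Monte Carlo check: weight each simulated mutation by its effect s."""
    rng = random.Random(seed)
    s = [rng.gammavariate(1.0 / C**2, mu * C**2) for _ in range(draws)]
    total = sum(s)
    mean = sum(x * x for x in s) / total      # s-weighted mean of s
    second = sum(x ** 3 for x in s) / total   # s-weighted mean of s^2
    return mean, math.sqrt(second - mean**2) / mean
```

For any C, the second value returned by `fixed_allele_moments` is below 1, in line with the statement that the conditional distribution is never L-shaped.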
The above arguments suggest that an exponential distribution might often describe the distribution of effect sizes among alleles that arise and fix during adaptive divergence. There may, however, be circumstances under which alternative distributions are plausible. For example, one might be examining a trait that has diverged as a pleiotropic response to selection on other traits or through neutral drift. In these cases, the distribution should more closely reflect the distribution of effect sizes among new mutations. On the other hand, if one of the parental lines has been subject to rapid and strong selection over a very short period of time, then only large-effect mutations will have had enough time to fix. In this case the distribution of fixed effects may be more normal in shape with a mean offset from zero. Similarly, the fact that researchers choose traits and parental lines that are particularly divergent may bias the distribution of effect sizes such that a greater proportion of allelic effects are large. The gamma distribution is a natural choice to describe these various alternatives, because its shape is so flexible. Here, we focus on the exponential distribution, both because it has theoretical support and because the results are simpler to understand. We also provide a more general derivation that uses a gamma distribution to describe the underlying effects of alleles on the trait of interest and that may allow a more flexible approach to fitting real data.
| THE CASTLE-WRIGHT ESTIMATOR |
|---|
Historically, one of the most widely used methods for estimating the number of loci underlying a trait difference between two lines has been the Castle-Wright estimator. If $\mathrm{Var}(S)$ is the segregation variance and $\Delta z$ is the difference between parental lines in the trait of interest, the Castle-Wright estimator for the number of underlying loci is

$$\hat{n}_{CW} = \frac{(\Delta z)^2}{8\,\mathrm{Var}(S)} \tag{1}$$

Equation 1 assumes that the underlying loci are unlinked and that they have equal effects. Consequently, (1) is said to estimate the "effective" number of underlying loci, equivalent to the number that there would be if all QTL were unlinked and had effects of equal magnitude.
Numerous improvements and extensions have been made to the Castle-Wright estimator. The extension used here, which we refer to as the Castle-Wright-Zeng estimator, allows for linkage and for variation in effects among loci:

$$\hat{n}_{CWZ} = \frac{2\bar{c}\,\hat{n}_{CW} + C^2(\hat{n}_{CW} - 1)}{1 - \hat{n}_{CW}(1 - 2\bar{c})} \tag{2}$$

where $\bar{c}$ is the average recombination rate between loci, and C is the coefficient of variation for the distribution describing the additive effects. Given that allelic effects are likely to vary from locus to locus, Equation 2 should more closely estimate the true number of underlying loci than does the effective number presented in (1). Equation 2 requires, however, that the distribution of effects be specified. Fortunately, it can be used with a variety of distributions describing the additive effects of the QTL, including an exponential and a gamma distribution.
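As a rough illustration of the two estimators, the sketch below assumes the standard textbook forms: the Castle-Wright estimate is the squared parental difference divided by eight times the segregation variance, and the Zeng correction is (2c̄·n_CW + C²(n_CW − 1)) / (1 − n_CW(1 − 2c̄)). The function and argument names are hypothetical.

```python
def castle_wright(delta_z, var_seg):
    """Effective number of loci from the parental difference and segregation variance."""
    return delta_z**2 / (8.0 * var_seg)

def castle_wright_zeng(n_cw, C, c_bar=0.5):
    """Correct the effective number for linkage (c_bar) and unequal effects (CV = C)."""
    denominator = 1.0 - n_cw * (1.0 - 2.0 * c_bar)
    return (2.0 * c_bar * n_cw + C**2 * (n_cw - 1.0)) / denominator

# Ten unlinked loci of equal additive effect a = 0.5:
# parental difference = 2 * 10 * 0.5 = 10, F2 segregation variance = 10 * 0.5**2 / 2 = 1.25
n_equal = castle_wright(10.0, 1.25)            # 10.0

# With exponential effects (C = 1) and unlinked loci, an effective number of 5.5
# corresponds to 10 underlying loci.
n_exponential = castle_wright_zeng(5.5, 1.0)   # 10.0
```

Under these assumed forms, the denominator changes sign once n_CW exceeds 1/(1 − 2c̄), so with c̄ slightly below 1/2 a large Castle-Wright estimate produces a negative corrected estimate.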
Sampling variances for $\hat{n}_{CW}$ and $\hat{n}_{CWZ}$ have been determined by ![]()
![]() |
(3) |
where NF2 is the number of F2 individuals measured. Equation 3 can be used to construct confidence limits for $\hat{n}_{CWZ}$ under the assumption that the error in the estimator is normally distributed.
| A QTL-BASED ESTIMATOR |
|---|
In light of the growing number of QTL studies, we develop an alternate method, based on data generated in a QTL mapping study, to estimate the total number of loci underlying a trait difference. In a typical QTL analysis, a handful of the loci that contribute to the trait difference are detected, each with an estimated additive and dominance effect and position. Our method takes advantage of the fact that the power to detect a QTL depends on the size of its effect. Given an expected distribution of effects, we can estimate the number of loci whose effects were too small to be detected (see Fig 1).
We first need to determine how the probability of detecting a QTL depends on its effect size. This power curve is approximately logistic in shape for a simple QTL study with one marker linked to one QTL (Fig 2). With NF2 individuals scored in the F2 generation, the probability of detecting a QTL rises at some point from near 0 to near 1. The more F2 individuals are examined, the more steeply the curve rises. With multiple markers, however, significance is usually assessed using a permutation test. We therefore approximate the power curve by a step function: QTL whose additive effects exceed a threshold of detection, $\theta$, are detected, and those below it are not. This approximation should be most accurate when many F2 individuals are scored. Other power functions can be explored, at least numerically, using the approach developed below, although simulations indicate that the estimator developed using this approximation is reasonably accurate as long as there are enough detected QTL (more than two) to provide a good estimate for the threshold.
We assume that the phenotypic difference between the parental lines, $\Delta z$, is primarily caused by fixed allelic differences at underlying QTL. We define additive (a) and dominance (d) effects for each allele, and we refer to a as the effect size or additive effect of an allele. Here, we assume that each allele contributes to the parental difference in the same direction, that is, a is always positive or always negative. This assumption is reasonable if divergence is primarily a product of selection acting in different directions within the two populations or in a novel direction in one population. (In a later section, we explore the effect of relaxing this assumption through simulations.) We let D denote the sum of the additive effects across all QTL. Under the above assumptions, D also equals half the phenotypic difference between parental lines ($\Delta z/2$). We first derive an estimator for the number of loci underlying the trait difference assuming an exponential distribution of allelic effects and given an estimate of $\theta$ (the minimum threshold of detection). We then discuss how to obtain confidence limits for this estimator and how to estimate $\theta$ from QTL data. Finally, we repeat these steps assuming allelic effects follow the more general gamma distribution.
Exponentially distributed effect sizes:
As argued above, we might expect the effects of alleles underlying phenotypic divergence to be exponentially distributed. To begin, we proceed under the assumption that the additive effect sizes (a) represent draws from an exponential distribution with mean, µ, and probability density function

$$f(a) = \frac{1}{\mu}\, e^{-a/\mu} \tag{4}$$

If there are a total of n underlying QTL, then the mean effect size will be $\mu = D/n$. Using (4), we can determine the probability density function for detectable QTL, i.e., those that lie above the threshold $\theta$,

$$f(a \mid a > \theta) = \frac{1}{\mu}\, e^{-(a - \theta)/\mu} \tag{5}$$

which is simply the same exponential distribution with an origin shifted to the right by an amount, $\theta$. The mean additive effect size among detected QTL is therefore expected to equal $\mu + \theta = D/n + \theta$. If the number of QTL actually detected is nd and their average effect size is M, we can estimate the total number of loci by setting M to its expectation, $D/n + \theta$. Rearranging this equation, the estimated number of underlying loci is

$$\hat{n}_{QTL} = \frac{D}{M - \theta} \tag{6}$$

Equation 6 can also be derived using a likelihood approach, which will enable us to calculate confidence limits for the number of loci underlying a trait difference. From the probability density function for observable QTL, (5), the likelihood of observing a set of QTL whose additive effect sizes are ai (i = 1 ... nd) is proportional to

$$L \propto \prod_{i=1}^{n_d} \frac{1}{\mu}\, e^{-(a_i - \theta)/\mu} = \mu^{-n_d}\, e^{-n_d (M - \theta)/\mu} \tag{7}$$

Substituting in $\mu = D/n$ and solving for the maximum, the most likely value for the true number of underlying loci is again given by (6).
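The estimator and its likelihood can be written in a few lines. This is a minimal sketch under the assumptions stated in the text (exponentially distributed effects, a known threshold, and a mean effect equal to D divided by n); the function names and the example numbers are hypothetical.

```python
import math

def n_qtl(D, detected_effects, theta):
    """Estimated total number of loci: D / (M - theta), with M the mean detected effect."""
    M = sum(detected_effects) / len(detected_effects)
    if M <= theta:
        raise ValueError("mean detected effect must exceed the threshold")
    return D / (M - theta)

def log_likelihood(n, D, detected_effects, theta):
    """Log likelihood of the detected effects under a shifted exponential with mean D / n."""
    mu = D / n
    return sum(-math.log(mu) - (a - theta) / mu for a in detected_effects)

# Four detected QTL, a threshold of 0.4, and half the parental difference D = 5
effects = [0.6, 0.8, 1.0, 1.2]
estimate = n_qtl(5.0, effects, 0.4)   # M = 0.9, so the estimate is close to 5 / 0.5 = 10
```

Evaluating `log_likelihood` on either side of `estimate` shows that the likelihood peaks at the same value, which is the equivalence described above.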
Confidence limits:
Confidence limits for $\hat{n}_{QTL}$ can be obtained using the likelihood-ratio test, which holds that the values of n whose log likelihoods lie within $\chi^2_1[\alpha]/2$ of the maximum log likelihood comprise an approximate $1 - \alpha$ confidence region. This amounts to solving the equation

$$\ln L(\hat{n}_{QTL}) - \ln L(n) = \tfrac{1}{2}\,\chi^2_1[\alpha] \tag{8}$$

for n. From (7), the confidence limits for n equal the two roots (found by numerical solution or plotting) to the equation

$$n_d\left[\frac{n}{\hat{n}_{QTL}} - 1 - \ln\!\left(\frac{n}{\hat{n}_{QTL}}\right)\right] = \tfrac{1}{2}\,\chi^2_1[\alpha] \tag{9}$$

where $\chi^2_1[0.05] = 3.841$ is used to obtain 95% confidence limits. More exact confidence limits can be found using the methods described for exponential distributions in ![]()
nd), and the confidence limits can be adjusted accordingly.
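The two roots can be found numerically. The sketch below assumes that the likelihood-ratio condition for the shifted-exponential model reduces to nd·[n/n̂ − 1 − ln(n/n̂)] = χ²/2, where n̂ is the point estimate, and it locates one root on each side of n̂ by bisection. The function name is hypothetical.

```python
import math

def confidence_limits(n_hat, n_d, chi2=3.841):
    """Likelihood-ratio confidence limits for n, given the point estimate n_hat."""
    def g(n):
        r = n / n_hat
        return n_d * (r - 1.0 - math.log(r)) - chi2 / 2.0

    def bisect(lo, hi):
        # g changes sign exactly once between lo and hi
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if (g(lo) > 0) == (g(mid) > 0):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lower = bisect(n_hat * 1e-12, n_hat)   # g > 0 near zero, g < 0 at n_hat
    hi = 2.0 * n_hat
    while g(hi) < 0:                       # expand until the upper root is bracketed
        hi *= 2.0
    upper = bisect(n_hat, hi)
    return lower, upper
```

With an estimate of 10 loci based on four detected QTL, this sketch returns limits of roughly 3.1 and 23.2, which illustrates how wide and asymmetric the interval is when few QTL are detected.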
Average effect of the undetected QTL:
Because of the Beavis effect, the effects of the detected QTL are inflated and may appear to explain more of the trait difference between two parental lines than is the case. Here, we use the exponential distribution to estimate the relative importance of the undetected QTL. Specifically, we estimate the expected effect size of the undetected QTL, Mundetected, by averaging the exponential distribution within the range, 0 to $\theta$. Recalling that the mean effect of all QTL (µ) is estimated by $M - \theta$, we obtain the estimate

$$M_{undetected} = M\left[\psi - \frac{1 - \psi}{e^{(1 - \psi)/\psi} - 1}\right] \tag{10}$$

where $\psi = (M - \theta)/M$. $\psi$ is a measure of the power of a QTL experiment. It ranges from 0 for very weak experiments with a threshold near the mean detected effect to 1 for a very powerful experiment with a threshold near 0. Equation 10 indicates that the average effect among the undetected QTL reaches a maximum of 23% of M when $\psi$ = 0.36. For experiments with little power ($\psi$ near 0), the relative size of the undetected QTL decreases toward zero because any QTL that are detected are likely to have effects well above the average. Similarly, for powerful experiments ($\psi$ near 1), most QTL will have been detected, and the remaining ones will have a relatively small effect. For the simulation experiments described below, the average power ranged from 0.27 to 0.61 (see Table 1), where the experiments with 500 F2's were more powerful. Over this range, the expected effect of the undetected QTL is 17 to 23% of the average effect of the detected QTL (M).
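The quoted percentages can be reproduced numerically. This sketch assumes that the mean of an exponential distribution truncated to the range 0 to the threshold, divided by the mean detected effect M, equals ψ − (1 − ψ)/(exp((1 − ψ)/ψ) − 1), where ψ = (M − threshold)/M. The function name is hypothetical.

```python
import math

def undetected_mean_ratio(psi):
    """Expected effect of undetected QTL relative to the mean detected effect M."""
    if not 0.0 < psi < 1.0:
        raise ValueError("psi must lie strictly between 0 and 1")
    x = (1.0 - psi) / psi       # threshold divided by the mean effect of all QTL
    if x > 700.0:               # exp(x) would overflow; the second term is negligible
        return psi
    return psi - (1.0 - psi) / math.expm1(x)

# Power values quoted in the text
ratios = {psi: undetected_mean_ratio(psi) for psi in (0.27, 0.36, 0.61)}
```

Under this assumption the ratio is about 0.22 at ψ = 0.27, peaks near 0.23 at ψ = 0.36, and falls to about 0.17 at ψ = 0.61.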
As an example, in the simulation study described below with 20 underlying QTL and 500 F2's (see Table 1), the detected QTL appeared to explain just over 100% of the parental difference, on average, even though less than half of the underlying QTL were detected. In fact, the undetected QTL accounted for 20% of the parental difference. Thus, the effects of undetected QTL can be substantial even when a QTL study indicates that the entire difference between parental lines has been explained.
One can also determine the expected fraction of segregation variance accounted for by the undetected QTL. Because, under additivity, each locus contributes to the segregation variance in proportion to the square of its additive effect, this fraction can be shown to equal

$$1 - e^{-(1 - \psi)/\psi}\left[1 + \frac{1 - \psi}{\psi} + \frac{(1 - \psi)^2}{2\psi^2}\right]$$

where $\psi = (M - \theta)/M$. For example, in the simulation experiments, where the average power ($\psi$) ranged from 0.27 to 0.61 (see Table 1), the fraction of the segregation variance that we expect to be accounted for by undetected QTL ranges from 51 to 3%, decreasing rapidly as the power of the experiment increases. Although one could simply calculate the total segregation variance contributed by the observed QTL, this procedure would overestimate the importance of the detected QTL because of the Beavis effect, which causes an especially large bias in variance calculations where the effect sizes of detected QTL are squared.
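The 51% and 3% figures can be checked with a short calculation. This sketch assumes exponentially distributed effects and a variance contribution proportional to the squared additive effect, so that the undetected fraction is 1 − exp(−x)(1 + x + x²/2), where x is the threshold divided by the mean effect, (1 − ψ)/ψ. The function name is hypothetical.

```python
import math

def undetected_variance_fraction(psi):
    """Expected share of segregation variance due to QTL below the detection threshold."""
    if not 0.0 < psi <= 1.0:
        raise ValueError("psi must lie in (0, 1]")
    x = (1.0 - psi) / psi       # threshold divided by the mean effect of all QTL
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)

low_power = undetected_variance_fraction(0.27)    # roughly 0.51
high_power = undetected_variance_fraction(0.61)   # roughly 0.03
```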
Estimating the threshold of detection:
We have not yet addressed how to estimate the unknown threshold, $\theta$. Ideally, one would compute the power curve for the probability of detecting a QTL. For example, one could perform Monte Carlo simulations, placing QTL of known effects on simulated genomes using the same number of markers and individuals as in the planned experiment. Alternatively, one can use the observed QTL data to estimate $\theta$, as follows. If there are several detected QTL, then the magnitude of the smallest observed additive effect size (amin) will often be near the threshold and can be used as an approximation for $\theta$. Indeed, amin is the maximum-likelihood estimator for $\theta$ because the shape of the exponential distribution is such that the most probable value for the smallest observed QTL is at the origin of the distribution (i.e., at $\theta$). This is clearly a biased estimate for the threshold, because the smallest observed QTL will not lie below the minimum detectable size and will generally lie above $\theta$. To obtain a less biased estimate for $\theta$, we use a Bayesian approach. We assume that the threshold lies somewhere between 0 and amin but that every point within this range is equally plausible ($\theta$ has a uniform prior distribution). We then weight the prior distribution for $\theta$ by the probability that the smallest detected QTL has additive effect size amin. Noting that the minimum of nd draws from an exponential distribution with mean µ is itself exponential with mean µ/nd, the probability density function for $a_{min} - \theta$ is exponential with mean µ/nd. This gives us a posterior distribution for $\theta$, whose average value is our estimated threshold ($\hat{\theta}$), which can be found to solve

$$\hat{\theta} = a_{min} - \frac{M - \hat{\theta}}{n_d} + \frac{a_{min}}{\exp\!\left[n_d\, a_{min}/(M - \hat{\theta})\right] - 1} \tag{11}$$

If several QTL are detected, the last term becomes negligible, and the estimate for $\theta$ from (11) approaches

$$\hat{\theta} = \frac{n_d\, a_{min} - M}{n_d - 1} \tag{12}$$

Equation 12 makes sense: we would expect the smallest observed QTL to lie above $\theta$ by µ/nd (because the distribution of the smallest QTL is an exponential with parameter µ/nd shifted to the right by $\theta$). Hence, $E[a_{min}] = \theta + \mu/n_d = \theta + (M - \theta)/n_d$, which rearranges to give (12). With few detected QTL, however, Equation 12 is subject to substantial sampling error and can become negative, which is biologically unrealistic. Equation 11, however, is always bounded between 0 and amin.
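Both threshold estimates are easy to compute. The sketch below assumes that the posterior mean under a uniform prior on the range 0 to amin satisfies threshold = amin − µ/nd + amin/(exp(nd·amin/µ) − 1), with µ = M − threshold, and that the large-sample limit is (nd·amin − M)/(nd − 1). The first equation is solved by bisection. The function names are hypothetical.

```python
import math

def theta_large_sample(a_min, M, n_d):
    """Large-sample approximation to the threshold; may be negative with few QTL."""
    return (n_d * a_min - M) / (n_d - 1)

def theta_posterior_mean(a_min, M, n_d, iterations=200):
    """Posterior-mean threshold, found by bisection on the interval (0, a_min)."""
    if not (0.0 < a_min < M) or n_d < 2:
        raise ValueError("need 0 < a_min < M and at least two detected QTL")

    def residual(theta):
        mu = M - theta                    # estimated mean effect of all QTL
        x = n_d * a_min / mu
        last = a_min / math.expm1(x) if x < 700.0 else 0.0
        return a_min - mu / n_d + last - theta

    lo, hi = 0.0, a_min                   # residual is positive at lo and negative at hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with amin = 0.2, M = 1.0, and three detected QTL, `theta_large_sample` returns −0.2, whereas `theta_posterior_mean` stays between 0 and 0.2.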
One potential problem with using amin and M in our estimators is that they will be overestimated in QTL studies as a result of the Beavis effect. This occurs because the effect sizes of the QTL are estimated from the same data used to detect the QTL. Those QTL that, by chance, happen to have a larger effect than expected are more likely to be detected and so lead to inflated estimates of ai. There is currently no analytical method to correct for the Beavis effect, so its impact was tested through simulation (described below). The simulations indicate that the Beavis effect did not cause a large bias in our QTL-based estimators. We suspect that
$\hat{n}_{QTL}$ is not extremely sensitive to the Beavis effect because both M and $\hat{\theta}$ tend to be inflated, but the estimator depends primarily on their difference [see (6) and (9)], which may not be as strongly biased.
Gamma-distributed effect sizes:
We now describe more general results that apply when allelic effects follow a gamma distribution with mean, µ, and coefficient of variation, C. The probability density function of a gamma distribution is

$$f(a) = \frac{a^{1/C^2 - 1}\, e^{-a/(\mu C^2)}}{(\mu C^2)^{1/C^2}\,\Gamma[1/C^2]} \tag{13}$$

where $\Gamma[x]$ is the gamma function (![]()
$\hat{n}_{QTL}$, equals a correction factor multiplied by $D/(M - \theta)$, where the correction factor solves
![]() |
(14) |
Here, $\psi = (M - \theta)/M$ measures the power of the experiment, and [x, y] is the digamma function (![]()
The correction factor adjusts (6) for the case in which the expected distribution of underlying effect sizes is gamma. If C = 1, the gamma distribution reduces to the exponential distribution, and the correction factor becomes one. For other values of C, the correction term depends only on C and $\psi$ and is illustrated in Fig 3. When C > 1, the gamma distribution is L shaped, and more loci have very small effect. These small-effect loci are missed if one mistakenly assumes an exponential distribution and uses Equation 6. Therefore, (6) underestimates the number of underlying loci (correction factor > 1). The converse is true when C < 1, and the gamma distribution is bell shaped. In this case, fewer loci have small effect, and fewer loci fall below the threshold of detection. Consequently, (6) overestimates the number of underlying loci (correction factor < 1). The sensitivity of the estimator to the shape of the distribution of allelic effects depends strongly on the power of the experiment ($\psi$). When the threshold of detection is high and $\psi$ is near 0, the estimator is very sensitive to the shape of the distribution. Conversely, when the threshold of detection is low and $\psi$ is near 1, the correction term approaches 1, because most alleles are detectable regardless of the shape of the distribution.
Again, amin can be used as an upwardly biased estimator for $\theta$. Alternatively, one can correct this estimate using a Bayesian approach that takes into account the probability that amin is the smallest detectable QTL when effect sizes follow a gamma distribution. This leads to an expression involving $\hat{\theta}$ that must be numerically evaluated simultaneously with (14):
![]() |
(15) |
When C = 1, (15) reduces to (11).
| SIMULATIONS |
|---|
To assess the accuracy of our estimator and the Castle-Wright-Zeng estimator, QTL analyses were simulated using the QTLcartographer package (version 1.13a).
For each experimental genome generated, Rcross was used to simulate the production of 200 and then 500 F2 individuals. This F2 sampling procedure was repeated 10 times. This design allowed us to assess whether the estimators were more sensitive to the realized distribution of QTL within the genome (variation among "experimental genomes") or to the specific set of F2 individuals analyzed (variation among sampled individuals). For each set of F2 individuals, the QTL were mapped using Zmapqtl's interval mapping routine (model 3). A single permutation test was used to determine an approximate significance threshold for each number of QTL and F2 sample size (5% genome-wide significance level). In general, there was not much variation in significance thresholds (data not shown). This threshold was then used in conjunction with Eqtl to estimate the location and effects of the QTL.
Unfortunately, interval mapping methods systematically bias the estimated effects of chromosome regions linked to QTL.
We chose interval mapping for two reasons. First, interval mapping is an established method that has been repeatedly used in empirical studies and is well explored theoretically. Second, interval mapping is much faster, allowing us to perform more extensive tests. We also suggest that interval mapping provides a conservative test of $\hat{n}_{QTL}$. Interval mapping produces data that are less precise than more advanced methods of QTL analysis, such as composite interval mapping and multiple interval mapping. Furthermore, interval mapping corresponds less well to our model, which assumes that QTL above the threshold of detection are always detected. Consequently, it seems reasonable that our estimator would perform even better with data from more sophisticated methods for detecting QTL.
Our simulation data were imported into Mathematica 3.0 (![]()
$\hat{n}_{CWZ}$, we initially used the average recombination rate between randomly chosen pairs of loci, $\bar{c}$, which equals 0.48 for a genome with 20 chromosomes, each of length 40 cM (![]()
$\hat{n}_{CWZ}$ estimate for the number of underlying loci averaged 49.3 (with 500 F2's) and -21.8 (with 200 F2's) over the 300 simulations! To avoid these problems, we set $\bar{c}$ to 1/2 in all of our analyses, which either made little difference (for small n) or improved the estimates.
Because the QTL estimator relies on the difference between the mean effect of factors found and the minimum effect found, it can only work if at least two QTL have been identified. Therefore, we excluded QTL analyses with only one detected QTL. In experimental genomes with more than two true QTL, very few analyses were excluded. With only two true QTL, however, approximately half of the analyses had to be excluded. Furthermore, in a very small number of cases with two true QTL (6/600), two QTL were detected whose estimated effects were, by chance, equal. To avoid division-by-zero errors, the mean was increased by 5% over the minimum in these cases. (If, instead, these cases were eliminated, the performance of the estimators improved slightly over the results shown in Table 1.)
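The exclusion rule and the behavior of the estimator can be explored with a toy Monte Carlo. This sketch is not the QTLcartographer pipeline described above: it assumes the idealized step-function model, in which every locus with an additive effect above a known threshold is detected and its effect is measured without error. All names and parameter values are hypothetical.

```python
import random
import statistics

def simulate_estimate(rng, n=20, mu=1.0, theta=0.7):
    """One idealized experiment; returns D / (M - theta), or None if fewer than two QTL."""
    effects = [rng.expovariate(1.0 / mu) for _ in range(n)]   # exponential with mean mu
    D = sum(effects)                        # half the parental difference
    detected = [a for a in effects if a > theta]
    if len(detected) < 2:                   # mirror the exclusion rule in the text
        return None
    M = sum(detected) / len(detected)
    return D / (M - theta)

def simulate_many(reps=2000, seed=7, **kwargs):
    """Repeat the idealized experiment and keep the analyses that were not excluded."""
    rng = random.Random(seed)
    runs = [simulate_estimate(rng, **kwargs) for _ in range(reps)]
    return [r for r in runs if r is not None]

estimates = simulate_many()
median_estimate = statistics.median(estimates)
```

Comparing the median and the mean of `estimates` with the true value of 20 gives a feel for the sampling error that arises when only a handful of QTL are detected.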
| RESULTS |
|---|
The results of the simulations are presented in Table 1. As expected, our QTL-based estimator was more accurate when (11) was used to estimate the threshold of detection (
$\hat{\theta}$) than when the smallest detected QTL was used (amin). We therefore focus our discussion on estimates based on (11). Both our estimator and the Castle-Wright-Zeng estimator were fairly accurate on average when there was an intermediate number of underlying QTL (5 ≤ n ≤
20). With 100 QTL, however, both methods underestimated the true number of QTL but for different reasons. The QTL-based estimator will be biased downward whenever the density of QTL is high, because tightly linked QTL are then rarely separated by recombination. Furthermore, in interval mapping, the number of detected QTL must be less than the number of marker intervals, which was only 80 in our study. This bias could be eliminated by following the lines for more generations (increasing the opportunity for recombination) and by adding more markers to the study (increasing the number of marker intervals). The Castle-Wright-Zeng estimator, on the other hand, becomes less and less accurate as the number of loci increases because the segregation variance approaches zero and becomes harder to estimate precisely. Although increasing the number of F2 progeny tested and following the lines for more generations may improve the accuracy of the Castle-Wright-Zeng estimator, our simulations indicate no substantial improvement between NF2 = 200 and NF2 = 500.
With only two true QTL, our estimator performed poorly, but the Castle-Wright-Zeng estimator continued to perform well, on average. Because we often had to exclude cases where only one QTL was detected when there were only two true QTL, it is not surprising that the estimators overestimated the number of underlying QTL. Note that if we also excluded cases where only two QTL were detected, the average of our estimator improved (Table 1, last two rows), which suggests that
$\hat{n}_{QTL}$ is biased upward by sampling error when there are few detected QTL. More generally, Table 1 suggests that, unless the number of QTL is very large, the QTL-based estimator tends to overestimate the true number of underlying loci and that this bias is stronger with fewer QTL. We expect an upward bias in our estimator for two reasons: (1) when there are few detected QTL, there will be substantial variation in the denominator ($M - \hat{\theta}$) of Equation 6, which can approach zero and generate an overestimate, and (2) the Beavis effect will generate a greater upward bias in $\hat{\theta}$ than in M, which also leads (6) to overestimate the number of underlying loci. We therefore recommend the use of the QTL-based estimator only when three or more QTL have been detected and when the mean detected QTL is substantially above (say >25% above) the minimum threshold of detection.
Interestingly, the largest difference between the two estimators is not in their average performance but in their confidence limits. Appropriate 95% confidence limits should include the true value 95% of the time. The confidence limits based on (9) for our estimator have this property (see numbers in square brackets in Table 1). In those cases where our estimator had little bias (5 < n < 20), the confidence limits included the true value 95.1% of the time. On the other hand, the confidence limits for the Castle-Wright-Zeng estimator included the true value only 47.6% of the time. The confidence limits for
$\hat{n}_{CWZ}$ often excluded the true number of underlying loci because these limits only account for error caused by sampling a limited number of F2 individuals. They do not account for the sampling error inherent in having a limited number of QTL, whose effects represent particular draws from an underlying distribution. For example, if the QTL with the largest effect has, by chance, a magnitude that is greater than expected, there will be more segregation variance than expected, and $\hat{n}_{CWZ}$ will underestimate n. Conversely, if the major QTL have, by chance, roughly equal influence, there will be less segregation variance than expected, and $\hat{n}_{CWZ}$ will overestimate n.
In fact, the sampling of different sets of QTL in different experimental genomes accounts for a large fraction of the total variance in estimates of n (Fig 4). This is especially true for
CWZ, where almost all of the observed variation was among genomes with different sets of QTL (dark gray bars) rather than among the different sets of F2 individuals sampled from each experimental genome (light gray bars). In other words,
CWZ depended little on the exact set of F2 individuals, but it varied greatly each time a new set of QTL was generated. Fig 4 suggests that, with at least 200 F2 individuals, the confidence limits for the Castle-Wright-Zeng estimator are based on a minor source of error (F2 sampling variance) rather than the bulk of the error (QTL sampling variance). In practice, this means that if a researcher is interested in the number of underlying loci that are responsible for a trait difference between two parental lines, the Castle-Wright-Zeng estimator will too often indicate a high degree of confidence in the wrong number and exclude the right number. Furthermore, as indicated by the simulations, reestimating
CWZ using a different set of F2 individuals is unlikely to help because the sampling of F2's was not the major source of error (Fig 4).
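The distinction between the two sources of error can be illustrated with a small simulation. The sketch below is not the simulation used in this study. It assumes unlinked, additive loci whose effects are independent exponential draws, it uses the exact segregation variance (as if infinitely many F2 individuals had been scored), and it applies a simple moment correction for exponential effects, 2n_e - 1, rather than Equation 2. All function names are hypothetical. Under these assumptions, any spread in the estimates is due solely to the particular set of QTL drawn for each genome.

```python
import random
import statistics


def castle_wright_exact(effects):
    """Castle-Wright effective number of loci, n_e = D^2 / (2 * Var_S).

    effects: additive effects a_i (half the difference between homozygotes).
    For additive, unlinked loci the F2 segregation variance is sum(a_i^2) / 2,
    and D = sum(a_i) is half the parental difference.
    """
    d = sum(effects)
    var_s = sum(a * a for a in effects) / 2.0
    return d * d / (2.0 * var_s)


def corrected_for_exponential(n_e):
    """Moment correction assuming exponential effects (an assumption here)."""
    return 2.0 * n_e - 1.0


def estimates_across_genomes(n_loci=20, n_genomes=2000, seed=1):
    """Estimate n for many genomes, each with a fresh draw of QTL effects."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_genomes):
        effects = [rng.expovariate(1.0) for _ in range(n_loci)]
        out.append(corrected_for_exponential(castle_wright_exact(effects)))
    return out


if __name__ == "__main__":
    est = estimates_across_genomes()
    print("mean estimate:", round(statistics.mean(est), 1))
    print("sd among genomes:", round(statistics.stdev(est), 1))
```

Because no F2 sampling is simulated, resampling F2 individuals could not reduce the spread reported by this sketch, which is the point made in the text.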
Dominance:
In the above results, the simulated QTL had additive effects. To test the impact of dominance on our estimator, we simulated experimental genomes that included 20 QTL with nonadditive effects ranging from fully recessive to fully dominant. Although models have been developed to incorporate dominance into the Castle-Wright estimator, these generally assume that the mean and distribution of dominance coefficients are known. Because most QTL studies lack such information, we continue to use (2) and (6) to estimate the number of underlying loci. That is, we ask, how inaccurate are the two estimators if we assume no dominance when dominance is actually present? The inclusion of dominance did not noticeably affect the performance of the QTL-based estimator, but it caused $\hat{n}_{CWZ}$ to underestimate n by about 20% (Table 2). Dominance tends to inflate the segregation variance inferred from F2 individuals, because heterozygotes at a locus have genotypic values further from the mean. This effect is even more exaggerated with overdominance or underdominance, in which case the phenotype of the F2's may lie outside of the range defined by the two parental lines (transgressive segregation). This explains the sensitivity of $\hat{n}_{CWZ}$ to the inclusion of dominance but does not address why $\hat{n}_{QTL}$ was little affected by dominance. We believe that, because the methods used to detect QTL explicitly allow dominance levels to vary among loci, the estimated additive effects of the detected QTL (and hence $\hat{n}_{QTL}$) are not strongly biased by dominance. Interestingly, the average power of the QTL experiments was also little affected by dominance (compare the power values in Table 1 and Table 2). Because $\hat{n}_{QTL}$ is less sensitive to dominance interactions, it is a more appropriate estimator to use than $\hat{n}_{CWZ}$ whenever the nature of dominance is unknown.
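The inflation of the segregation variance by dominance can be seen from a single-locus calculation. The notation is introduced here for illustration only and is not taken from the numbered equations: the two homozygotes have genotypic values +a and -a, the heterozygote has value d, and the F2 genotype frequencies are 1/4, 1/2, and 1/4.

```latex
\begin{align*}
E[G] &= \tfrac{1}{4}(a) + \tfrac{1}{2}(d) + \tfrac{1}{4}(-a) = \tfrac{d}{2},\\
E[G^2] &= \tfrac{1}{4}a^2 + \tfrac{1}{2}d^2 + \tfrac{1}{4}a^2 = \tfrac{a^2}{2} + \tfrac{d^2}{2},\\
\operatorname{Var}(G) &= E[G^2] - E[G]^2 = \tfrac{a^2}{2} + \tfrac{d^2}{4}.
\end{align*}
```

Under this simple model, any nonzero d adds d^2/4 to the variance contributed by the locus while leaving the parental difference (2a) unchanged, so an estimator that divides the squared parental difference by the segregation variance is pushed downward.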
QTL of opposite effect:
Table 3 presents results for experiments where the additive effect of a QTL had an 85% chance of being positive and a 15% chance of being negative. Although both estimators underestimated the true number of underlying QTL, $\hat{n}_{QTL}$ was much less sensitive than $\hat{n}_{CWZ}$ to the inclusion of loci affecting the trait of interest in the opposite direction. On average, about 16 underlying QTL were estimated with $\hat{n}_{QTL}$ and about 9 with $\hat{n}_{CWZ}$, whereas the true number was 20. When the effects of QTL oppose one another, the trait values of the F2's are no longer expected to lie strictly between the parental lines, providing another explanation for transgressive segregation. As with dominance, the segregation variance measured among the F2's is thus inflated relative to the parental difference, and the Castle-Wright-Zeng estimator underestimates the number of underlying loci. On the other hand, the inclusion of QTL with opposite effects reduced the power of the QTL experiments (compare the power values in Table 1 and Table 3), but the additive effects of the detected QTL (and hence $\hat{n}_{QTL}$) were not strongly biased.
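A rough numerical sketch shows why opposing effects depress a Castle-Wright-type estimate. It is an illustration under simplifying assumptions rather than a reproduction of the simulations in Table 3: loci are unlinked and additive, effect sizes are independent exponential draws, the segregation variance is taken as exact, and the correction 2n_e - 1 stands in for Equation 2. The function names are hypothetical.

```python
import random
import statistics


def effective_number(signed_effects):
    """Castle-Wright effective number: (net effect)^2 / sum of squared effects.

    The parental difference depends on the net sum of the signed effects,
    whereas the F2 segregation variance depends on the squared effects and
    therefore ignores their signs.
    """
    net = sum(signed_effects)
    return net * net / sum(a * a for a in signed_effects)


def mean_estimate(p_negative, n_loci=20, n_genomes=2000, seed=1):
    """Average of 2*n_e - 1 (moment correction assuming exponential effects)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_genomes):
        effects = []
        for _ in range(n_loci):
            size = rng.expovariate(1.0)
            sign = -1.0 if rng.random() < p_negative else 1.0
            effects.append(sign * size)
        estimates.append(2.0 * effective_number(effects) - 1.0)
    return statistics.mean(estimates)


if __name__ == "__main__":
    print("all effects positive:", round(mean_estimate(0.0), 1))
    print("15% negative effects:", round(mean_estimate(0.15), 1))
```

In this sketch the second average is noticeably smaller than the first, because negative effects shrink the net parental difference without shrinking the segregation variance.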
Gamma distribution of effect sizes:
Finally, we tested the extent to which the estimators were sensitive to the underlying distribution of effect sizes. Table 4 provides results from simulations where the underlying distribution was gamma (with C = 0.5 or 2.0) but where the analyses assumed an exponential distribution (C = 1.0). As expected, $\hat{n}_{QTL}$ and $\hat{n}_{CWZ}$ overestimated the true number of QTL when the underlying distribution had a lower coefficient of variation (C = 0.5). The extent of the bias was not severe for $\hat{n}_{CWZ}$ in this case; this is consistent with the form of Equation 2, which changes less when C is lowered by a fraction than when it is increased. Conversely, both $\hat{n}_{QTL}$ and $\hat{n}_{CWZ}$ underestimated the true number of QTL by about 50% when the underlying distribution had a higher coefficient of variation (C = 2.0), with $\hat{n}_{CWZ}$ performing slightly worse than $\hat{n}_{QTL}$. One can correct these estimators using Equation 2 for $\hat{n}_{CWZ}$ and Equations 14 and 15 for $\hat{n}_{QTL}$. These corrections brought the estimates toward the true value of 20 but tended to overcorrect, especially for $\hat{n}_{QTL}$ when C = 2.0, perhaps because the observed coefficient of variation of the true effects of the QTL was highly variable in this case. Of course, making these corrections requires external knowledge about the underlying distribution, which will often be lacking. When many QTL have been detected, the number of underlying loci, the threshold of detection, and the shape of the distribution could be simultaneously estimated using a maximum-likelihood approach. One potential problem is that the Beavis effect will bias the shape parameter (lowering C) by making QTL of small effect seem larger.
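The direction of these biases can be reproduced in a minimal sketch for a Castle-Wright-type estimate. Several assumptions are made that are not taken from the text: the parameter `cv` below is the ratio of the standard deviation to the mean of the effect sizes (whether it corresponds exactly to the parameter C above is not established here), loci are unlinked and additive, the segregation variance is exact, and the correction 2n_e - 1 (appropriate only when `cv` = 1) stands in for Equation 2. The function name is hypothetical.

```python
import random
import statistics


def mean_estimate_assuming_exponential(cv, n_loci=20, n_genomes=2000, seed=1):
    """Average estimate when effects are gamma distributed with the given
    coefficient of variation but the analysis assumes exponential effects.
    """
    shape = 1.0 / (cv * cv)  # gamma shape that yields the requested cv
    scale = 1.0 / shape      # keeps the mean effect equal to 1
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_genomes):
        effects = [rng.gammavariate(shape, scale) for _ in range(n_loci)]
        total = sum(effects)
        n_e = total * total / sum(a * a for a in effects)
        estimates.append(2.0 * n_e - 1.0)  # correction valid only for cv = 1
    return statistics.mean(estimates)


if __name__ == "__main__":
    for cv in (0.5, 1.0, 2.0):
        print("cv =", cv, "mean estimate:",
              round(mean_estimate_assuming_exponential(cv), 1))
```

With 20 true loci, the sketch overestimates when the effects are less variable than assumed and underestimates when they are more variable, matching the direction of the biases described above.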
| CONCLUSIONS |
|---|
Historically, the number of genetic factors, n, underlying an observed difference between two parental lines has been estimated using methods developed by ![]()
![]()
This problem is illustrated in Fig 5, which shows the expected number of detectable QTL as a function of the number of underlying QTL. Fig 5 is based on the assumption that there is a threshold below which a QTL is unlikely to be detected and above which it is. Two threshold levels of detection are illustrated, with the threshold set to 5 or 10% of D, where D is the total additive effect size (i.e., half the parental difference). The first threshold was typical in our simulation studies with 500 F2's, while the second was typical in simulations with 200 F2's. The most striking feature of these curves is that they do not increase monotonically with the number of underlying loci. Instead, the expected number of detected loci initially rises, reaches a maximum, and then falls back toward zero as the number of underlying loci increases. These curves suggest two reasons why a certain number of QTL may be observed: (1) there are few underlying QTL, but their average effect is relatively large such that most are above the threshold, or (2) there are several QTL, but their average effect is relatively small such that few are above the threshold of detection. Furthermore, the maxima of these curves are fairly low, indicating that only a handful of QTL will be detected regardless of the true number of loci contributing to the trait difference. These conclusions are the same whether the effects of the underlying loci follow an exponential distribution, an L-shaped gamma distribution, or a bell-shaped gamma distribution. In short, because QTL studies predominantly detect loci of large effect, the number of loci detected in a QTL study is not linearly related to the number of underlying loci.
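The rise and fall of these curves can be sketched for the exponential case. The code below is an approximation written for illustration, not the calculation behind Fig 5: it treats each effect as an independent exponential draw with mean D/n (ignoring the constraint that the effects sum to D) and assumes a sharp threshold equal to a fixed fraction of D. The function names are hypothetical.

```python
import math


def expected_detected(n_loci, threshold_fraction):
    """Expected number of QTL above the threshold.

    Assumes each effect is an independent exponential draw with mean
    D / n_loci and that the threshold is threshold_fraction * D, so that
    P(effect > threshold) = exp(-threshold_fraction * n_loci).
    """
    return n_loci * math.exp(-threshold_fraction * n_loci)


def peak(threshold_fraction, max_loci=200):
    """Number of loci (1..max_loci) that maximizes the expected detections."""
    best = max(range(1, max_loci + 1),
               key=lambda n: expected_detected(n, threshold_fraction))
    return best, expected_detected(best, threshold_fraction)


if __name__ == "__main__":
    for frac in (0.05, 0.10):
        n_best, n_detected = peak(frac)
        print(f"threshold {frac:.0%} of D: peak at n = {n_best}, "
              f"about {n_detected:.1f} QTL detected")
```

Under these assumptions the expected count is n exp(-fn) for a threshold fraction f, which peaks at n = 1/f with a height of 1/(fe): roughly 7.4 detected QTL for a 5% threshold and 3.7 for a 10% threshold.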
Here, we present a new estimator of gene number, $\hat{n}_{QTL}$, that takes into account the bias of QTL analyses toward detecting loci of large effect. By noting the average size and the minimum size of the detected QTL, we can estimate the number and magnitude of the loci whose effects were too small to be detected. As with the Castle-Wright estimator, this technique requires us to specify the expected distribution of effect sizes. We develop a QTL-based estimator for an exponential distribution and a gamma distribution of effect sizes. Although our method assumes that QTL analyses have a negligible probability of detecting a QTL below a detection threshold and a 100% probability of detecting QTL above it, simulations indicate that this simplifying assumption does not generate a substantial bias in the average number of estimated loci ($\hat{n}_{QTL}$).
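The reasoning behind combining the mean and the minimum of the detected effects can be sketched for the exponential case. Because the exponential distribution is memoryless, effects that exceed a threshold have an expected size equal to the threshold plus the mean of the full distribution. Subtracting the threshold from the mean detected effect therefore estimates the mean effect of all loci, and dividing D by that mean gives a number of loci. The function below is a hypothetical illustration of this logic, with the smallest detected effect used as a stand-in for the threshold by default. It is not claimed to reproduce the estimator or the confidence limits defined by the numbered equations.

```python
def estimate_total_loci(detected_effects, total_effect, threshold=None):
    """Sketch of a threshold-based estimate of the number of loci.

    detected_effects: additive effects of the detected QTL (same units as D).
    total_effect: D, the total additive effect (half the parental difference).
    threshold: detection threshold; if None, the smallest detected effect is
        used as a stand-in (an assumption of this sketch).
    """
    if not detected_effects:
        raise ValueError("at least one detected QTL is required")
    if threshold is None:
        threshold = min(detected_effects)
    mean_detected = sum(detected_effects) / len(detected_effects)
    # Memoryless property: mean of all effects = mean detected - threshold.
    mean_all = mean_detected - threshold
    if mean_all <= 0:
        raise ValueError("mean detected effect must exceed the threshold")
    return total_effect / mean_all


if __name__ == "__main__":
    # Made-up effects, expressed as fractions of D, for illustration only.
    print(estimate_total_loci([0.06, 0.10, 0.14], 1.0, threshold=0.05))
```

With the made-up values shown, the mean detected effect is 0.10 and the threshold is 0.05, so the implied mean effect of all loci is 0.05 and the estimate is approximately 20 loci.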
As Table 1 shows, our QTL-based estimator provides a good approximation for the number of underlying loci unless few QTL were detected ($n_d$ < 3) or the genetic map was saturated with QTL (more QTL than marker intervals; $n_d$ > 80). Furthermore, in those cases where the average value of the estimator approximately equals the true number of underlying loci (i.e., when n lies between 5 and 20), the 95% confidence limits based on $\hat{n}_{QTL}$ contain n approximately 95% of the time (Table 1). In contrast, the 95% confidence limits for the Castle-Wright-Zeng estimator often miss the true value for n, despite the fact that the simulations and the estimator both assume an exponential distribution of effect sizes. Essentially, the confidence limits for the Castle-Wright-Zeng estimator do not account for the variance inherent in the sampling of mutations that arise and fix within a population. Because the confidence limits for $\hat{n}_{QTL}$ take into account the sampling error inherent in drawing allelic effects from an underlying distribution, they more often include the true value for the number of underlying loci. An additional benefit of our estimator is that it is less sensitive to dominance (compare Table 1 and Table 2) and to violations of the assumption that the additive effects of all alleles have the same sign (compare Table 1 and Table 3).
To demonstrate the application of our estimator to real data, we consider two examples. The first, a QTL study by ![]()
The above example highlights the difference between $\hat{n}_{QTL}$ and $\hat{n}_{CWZ}$. Our next example demonstrates that our estimator could be used to predict the number of loci that may be uncovered in a more powerful QTL study. ![]()
![]()
500 individuals per backcross) and reanalyzing the old and new data sets using multiple interval mapping. We concentrate on the results presented by Zeng et al., as they were obtained by applying the same mapping methodology and criteria to both data sets.
Liu et al.'s backcross analysis suggested that 11–13 QTL are involved in the genitalia difference. [The first backcross to D. mauritiana (BM) identified 11 QTL; the backcross to D. simulans (BS) identified 13 QTL.] Based on Equation 11, our estimator suggests that the true number is closer to 21.4 for BM and 23.2 for BS. The number of QTL found in the BS analysis is within our 95% confidence intervals (CIs), albeit just barely (95% CI, 12.7–38.1). The number of QTL found in the BM analysis, however, is not (95% CI, 11.1–36.7). This predicts that a fair number of QTL were probably missed in these analyses, a fact that was confirmed by ![]()
$\hat{n}_{QTL}$ = 16.7 (95% CI, 10.2–25.3) for the BS lines and 20.0 (95% CI, 12.3–30.5) for the BM lines. The value of 19 QTL found by Zeng et al. is well within our 95% confidence intervals and is near the mean of the two estimates. This, plus the fact that the power of the second analysis is quite high (power = 0.97 for the BS analysis and 0.92 for the BM analysis), indicates that nearly all of the QTL have now been identified. Thus, our estimator accurately estimated the number of QTL that could be found in a powerful experiment given data from a less powerful experiment.
Improving estimates of gene number could have an important impact in both quantitative and evolutionary genetics. First, better estimates would help researchers know how many genes affecting a trait of interest go undetected. They could then make more informed decisions about whether to refine a QTL analysis to uncover missing factors. Second, better estimates of the actual number of genes underlying divergent traits can help us evaluate the applicability of quantitative genetic models, which assume a large number of underlying loci. Third, improved estimates of gene number provide interesting information about the genetic architecture underlying evolutionary change. For example, they can help us identify the sorts of phenotypic changes that are typically accomplished by few allelic substitutions. Although we have taken steps to incorporate data being generated in QTL studies, further work is warranted. In particular, it would be valuable to know how best to use the information contained in both the Castle-Wright-Zeng and the QTL-based estimators to obtain an even more powerful estimator of the number of loci underlying phenotypic divergence.
| ACKNOWLEDGMENTS |
|---|
We owe special thanks to H. Allen Orr for encouraging this project along, and for generously sharing his time and ideas, and to John Huelsenbeck for kindly providing access to his computer facilities. We also thank Andrea Betancourt, Thomas Lenormand, J. P. Masly, Allen Orr, Art Poon, Daven Presgraves, Dolph Schluter, Peter Visscher, Bruce Walsh, Michael Whitlock, Zhao-Bang Zeng, and an anonymous reviewer for their helpful comments on the project and manuscript. This work was inspired by a discussion of the Vancouver Evolutionary Group and was sponsored by grants from the Natural Sciences and Engineering Research Council of Canada (S.P.O.), the Peter Wall Institute for Advanced Studies (S.P.O.), the David and Lucile Packard Foundation (H. A. Orr), the National Institutes of Health (GM51932 to H. A. Orr), and a Caspari Fellowship from the University of Rochester (C.D.J.).
Manuscript received December 17, 1999; Accepted for publication July 28, 2000.
| LITERATURE CITED |
|---|
ABRAMOWITZ, M., and I. A. STEGUN, 1972 Handbook of Mathematical Functions. Dover, New York.
BASTEN, C. J., B. S. WEIR and Z.-B. ZENG, 1996 A Reference Manual and Tutorial for QTL Mapping. Department of Statistics, North Carolina State University, Raleigh, NC.
BEAVIS, W. D., 1994 The power and deceit of QTL experiments: lessons from comparative QTL studies. Proceedings of the Corn and Sorghum Industry Research Conference, American Seed Trade Association, Washington DC, pp. 250-266.
BEAVIS, W. D., 1998 QTL analyses: power, precision, and accuracy, pp. 145-162 in Molecular Dissection of Complex Traits, edited by A. H. PATERSON. CRC Press, Boca Raton, FL.
BRADSHAW, H. D., JR., K. G. OTTO, B. E. FREWEN, J. K. MCKAY, and D. W. SCHEMSKE, 1998 Quantitative trait loci affecting differences in floral morphology between two species of monkeyflower (Mimulus). Genetics 149:367-382.
CASTLE, W. E., 1921 An improved method of estimating the number of genetic factors concerned in cases of blending inheritance. Science 54:223.
CHURCHILL, G. A. and R. W. DOERGE, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138:963-971.
DOEBLEY, J. and A. STEC, 1991 Genetic analysis of the morphological differences between maize and teosinte. Genetics 129:285-295.
FELLER, W., 1971 An Introduction to Probability Theory and Its Applications, Vol. II. John Wiley, New York.
FISHER, R. A., 1958 The Genetical Theory of Natural Selection, Ed. 2. Dover, New York.
GILLESPIE, J. H., 1991 The Causes of Molecular Evolution. Oxford University Press, New York.
GOFFINET, B. and B. MANGIN, 1998 Comparing methods to detect more than one QTL on a chromosome. Theor. Appl. Genet. 96:628-633.
KAO, C. H., Z. B. ZENG, and R. D. TEASDALE, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216.
KEIGHTLEY, P. D., 1994 The distribution of mutation effects on viability in Drosophila melanogaster. Genetics 138:1315-1322.
LANDE, R., 1981 The minimum number of genes contributing to quantitative variation between and within populations. Genetics 99:541-553.
LARSEN, R. J., and M. L. MARX, 1985 An Introduction to Probability and Its Applications. Prentice-Hall, Englewood Cliffs, NJ.
LAURIE, C., J. R. TRUE, J. LIU, and J. M. MERCER, 1997 An introgression analysis of quantitative trait loci that contribute to a morphological difference between Drosophila simulans and D. mauritiana. Genetics 145:339-348.
LEIPS, J. and T. F. C. MACKAY, 2000 Quantitative trait loci for life span in Drosophila melanogaster: interactions with genetic background and larval density. Genetics 155:1773-1788.
LIU, J., J. M. MERCER, L. F. STAM, G. C. GIBSON, Z. B. ZENG, and C. C. LAURIE, 1996 Genetic analysis of a morphological shape difference in the male genitalia of Drosophila simulans and D. mauritiana. Genetics 142:1129-1145.
LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
LYNCH, M., J. BLANCHARD, T. KIBOTA, S. SCHULTZ, and L. VASSILIEVA et al., 1999 Perspective: spontaneous deleterious mutation. Evolution 53:645-663.
MACKAY, T. F. C., 1996 The nature of quantitative genetic variation revisited: lessons from Drosophila bristles. Bioessays 18:113-121.
ORR, H. A., 1998 The population genetics of adaptation: the distribution of factors fixed during adaptive evolution. Evolution 52:935-949.
ORR, H. A., 1999 The evolutionary genetics of adaptation: a simulation study. Genet. Res. 74:207-214.
PATERSON, A. H., S. DAMON, J. D. HEWITT, D. ZAMIR, and H. D. RABINOWITCH et al., 1991 Mendelian factors underlying quantitative traits in tomato: comparison across species, generations, and environments. Genetics 127:181-197.
VISSCHER, P. M., R. THOMPSON, and C. S. HALEY, 1996 Confidence intervals in QTL mapping by bootstrapping. Genetics 143:1013-1020.
WHITTAKER, J. C., R. THOMPSON, and P. M. VISSCHER, 1996 On the mapping of QTL by regression of phenotype on marker-type. Heredity 77:23-32.
WOLFRAM, S., 1996 The Mathematica Book, Ed. 3. Wolfram Media/Cambridge University Press, Cambridge, MA.
WRIGHT, S., 1968 Evolution and the Genetics of Populations. University of Chicago Press, Chicago.
ZENG, Z. B., 1992 Correcting the bias of Wright estimates of the number of genes affecting a quantitative character: a further improved method. Genetics 131:987-1001.
ZENG, Z. B., 1994 Precision mapping of quantitative trait loci. Genetics 136:1457-1468.
ZENG, Z. B., J. LIU, L. F. STAM, C. H. KAO, J. M. MERCER, and C. C. LAURIE, 2000 Genetic architecture of a morphological shape difference between two Drosophila species. Genetics 154:299-310.