The core of statistical inference is based on both hypothesis testing and estimation. The use of inferential statistics for QTL identification thus includes estimation of genetic effects and statistical tests. Typically, QTL are reported only when the test statistics reach a predetermined critical value. Therefore, the estimated effects of detected QTL are actually sampled from a truncated distribution. As a result, the expectations of detected QTL effects are biased upward. In a simulation study, William D. Beavis showed that the average estimates of phenotypic variances associated with correctly identified QTL were greatly overestimated if only 100 progeny were evaluated, slightly overestimated if 500 progeny were evaluated, and fairly close to the actual magnitude when 1000 progeny were evaluated. This phenomenon has subsequently been called the Beavis effect. Understanding the theoretical basis of the Beavis effect will help interpret QTL mapping results and improve success of marker-assisted selection. This study provides a statistical explanation for the Beavis effect. The theoretical prediction agrees well with the observations reported in Beavis's original simulation study. Application of the theory to meta-analysis of QTL mapping is discussed.
THE primary goal of genetic mapping experiments is to identify the locations of genes that affect variable expression of a trait among individuals. Most researchers also use the data from a sampled population to estimate the genetic effects of quantitative trait loci (QTL). Information about the magnitude of the genetic effects of significant QTL is useful in prioritizing subsequent uses of the loci as candidate genes for consideration in genetic engineering and marker-assisted selection (Lande and Thompson 1990). Many statistical methods produce unbiased or asymptotically unbiased estimates of parameters. However, this property does not apply to QTL effects estimated from genome scans. This phenomenon was first discussed by Lande and Thompson (1990).
Almost all QTL mapping procedures can detect QTL with large effects, but not all can detect QTL with intermediate and small effects. Quantitative traits are defined as traits controlled primarily by intermediate and small QTL. Beavis (1994, 1998) designed a large-scale simulation experiment to evaluate the efficiency of interval mapping for detecting and estimating polygenes. Here-after, this simulation experiment is called the Beavis experiment. Beavis (1998) simulated either 10 or 40 independent QTL, examining situations where each QTL explained a proportion of the phenotypic variance ranging from 0.75 to 9.5% with sample sizes ranging from 100 to 1000. The Beavis experiment showed that the average estimates of phenotypic variances associated with correctly identified QTL were greatly overestimated if only 100 progeny were evaluated, slightly overestimated if 500 progeny were evaluated, and fairly close to the actual magnitude when 1000 progeny were evaluated. When the sample size was small, say 100, the statistical power of detecting a small QTL was as low as 3% and the estimated effects were typically inflated 10-fold. This phenomenon has since been referred to as the Beavis effect and has formed the basis of a number of subsequent analyses (e.g., Otto and Jones 2000; Hayes and Goddard 2001).
Estimating the number of quantitative trait loci is another goal of QTL mapping experiments (Sillanpaa and Arjas 1998). Otto and Jones (2000) acknowledged that the estimated number of QTL using molecular markers (interval mapping) can be more informative than that obtained from analyses using only phenotype (Lande 1981; Zeng 1992). However, a QTL is reported from the analysis only if it is detected. Therefore, the distribution of the detected QTL is actually inferred from censored data due to missing small-effect QTL. Otto and Jones (2000) incorporated the Beavis effect into their maximum-likelihood analysis to recover the potential number of QTL using the number of detected QTL, so called “detecting the undetected.” It should be emphasized that the purpose of Otto and Jones (2000) was to infer the number of QTL for a quantitative trait within a single experiment. Some QTL, however, may have large effects and some may have small effects, and estimating the total number depends on the distribution of QTL effects. If the overall distribution of the effects is described by an exponential distribution, the distribution of detected QTL effects becomes a truncated exponential distribution after incorporating the Beavis effect. This should not be confused with the original Beavis experiment where all simulated QTL had an equal genetic effect.
Hayes and Goddard (2001) conducted a meta-analysis for QTL mapping in livestock to infer the distribution of the effects of genes. They fit the estimated gene effects surveyed from all QTL mapping experiments in pigs and cattle to a gamma distribution. The difference between this meta-analysis and the usual analysis is that estimated genetic effects from multiple experiments were fit in a model rather than estimated from observed measurements. As a consequence, they had to take into account two additional sources of error for the estimated gene effects: the experimental error due to limited sample sizes and bias caused by data censoring. They found that the estimated genetic effects fit a gamma distribution very well. Again, one should not confuse this meta-analysis with the Beavis experiment because Hayes and Goddard (2001) intended to infer the distribution of genetic effects throughout the genome from multiple experiments. Even if two QTL were identified roughly at the same location by two investigators, they were still treated as different QTL because the estimated positions were not precise enough to be assigned to exactly the same position within the genome. The original Beavis experiment demonstrated that a bias exists in estimating QTL effects. Herein, I present a theoretical basis for the Beavis effect. The theory predicts the amount of bias as a function of the type of progeny used in the experiment, the estimation procedure, the marker density, and the sample size.
Marker analysis: The theory is developed using a backcross (BC) mating design, which provides the simplest genetic model in QTL mapping. The result is then extended to other progeny derived from common mating designs. Let yj be the phenotypic value of a quantitative trait measured from individual j sampled from a BC population. The linear model describing yj is (1) where μ is the population mean, a is the additive genetic effect at the locus of interest, and ϵj is the residual error, which is assumed to follow a normal distribution, N(0, σ2). In the BC progeny, there are only two possible genotypes at the QTL, heterozygote and homozygote for the backcross parent allele. The independent variable xj is an indicator variable defined as xj = 0 for the homozygote and xj = 1 for the heterozygote.
Let â be the estimated genetic effect and be the variance of the estimate for a QTL. For simplicity, let us assume that the QTL is closely linked to a marker so that (2) where n is the sample size (the number of BC progeny) and (3) is the sampling variance of the genotype. Note that n may be replaced by n – 1 for a more precise result.
The variance of x depends on the type of progeny. In a BC population without segregation distortion, half of the individuals will be homozygous and half heterozygous, leading to in the limit of large n. We have implicitly made the assumption that there is no dominance effect. If there is dominance, it will be confounded with the additive effect; i.e., the genetic effect is actually a + d instead of a. Therefore, all subsequent analyses are based on the additive model.
If the positions of all QTL are known a priori (uncensored), the estimate of each QTL effect is approximately unbiased (depending on the method used) so that the distribution of â can be assumed to be approximately normal, i.e., N(a, ). The normal distribution is mathematically convenient because it allows the standard statistical machinery for censored/truncated data to be used (Cohen 1991). Of course, if the assumption of normal distribution is violated, the theory developed here is not valid. The assumption of a normal distribution for â holds better if the residual error is normally, independently, and identically distributed. Some traits may be distributed in a discrete manner but analyzed as a quantitative trait, and some traits may have a skewed distribution with unequal variances. These traits may not have a normal residual error, and thus the theory developed in this study should be used with extra caution for such traits.
In the practice of QTL mapping, estimated genetic effects are reported only for significant QTL. Thus, the reported QTL consists of a censored sample so that the distribution of the estimated QTL effects conditional on statistical significance is a truncated normal distribution with a mean and variance different from those of the original normal distribution.
I now proceed to calculate this truncated distribution. Let us assume that a z-test statistic is used for the significance test so that the critical value for the z-test statistic is defined as Z1–α/2, where α is a controlled type I error rate. A QTL is reported only if |z| > Z1–α/2, where z = â/σâ is the z-test statistic. In other words, all the QTL reported satisfy |â| > σâZ1–α/2, i.e., â <–σâZ1–α/2 or â > σâZ1–α/2. The two-tailed test leads to the possibility that even if a is positive, it may be detected as a significantly negative effect due to sampling. Denote the truncated â by âT so that , where NT represents a truncated normal distribution, aT = E(âT) is the expectation, and is the variance of the truncated normal distribution. The truncated normal distribution is not symmetric. Let us define the two truncation points in the standardized (z-test statistic) scale as (4) and (5) Further define (6) where φ(ξ) and Φ(ξ) are the standardized normal probability density function (pdf) and cumulative distribution function (cdf), respectively.
Often, when using likelihood methods, the LOD score criterion may be used for QTL detection. The critical value of the LOD score can be converted into the critical value of the z-test statistic, using the following approximate relationship, (7) where is the critical value of the chi-square distribution with 1 d.f. and is also the critical value for the likelihood-ratio test statistic. The relationship between the likelihood-ratio test statistic and LOD can be found in Lynch and Walsh (1998).
According to the standard statistical machinery of truncated normal distributions (8) (Cohen 1991), so that the bias is (9) The variance of the truncated sample is (10) This truncated variance is used later when we discuss the bias in the QTL variance estimate.
The phenotypic variance may be partitioned into the true QTL variance ( ) and the residual variance (σ2). The genetic variance of a QTL in a BC population is . The corresponding variance in an F2 population is . In general, the genetic variance can be expressed as , where γ is a constant depending on the filial relationship among segregating progeny; e.g., γ = ½ for BC and γ = ½ for F2. Therefore, a commonly used approach to estimating the genetic variance is . However, this estimation is also biased because (11) where (12) Therefore, the bias in the estimated variance is (13) Note that there are two sources of bias: the contribution of environmental variance to the estimate of a2 given by plus the Beavis effect given by . Thus, this estimator is biased even in the absence of the Beavis effect. The bias due to the first source of error has been investigated by Luo et al. (2003), who also provided a method to correct the bias.
The one-tailed test is a special case of the two-tailed test in that we simply set ξ1 = – ∞ and ψ1 = 0. For the two-tailed test, we can further simplify B into due to and . We did not present this simplified version of B as a general formula because it would not allow us to extend the results simply to the one-tailed test.
The proportion of phenotypic variance explained by the QTL is defined as (14) where is the total phenotypic variance among the test progeny. The notation is adopted from classical quantitative genetics and is known as the heritability. When used as a proportion of variance contributed by an individual QTL, it is no longer called the heritability. The typical estimator for is . It is hard to find the expectation of a ratio. However, if we assume that the estimation error of the denominator is negligible, we can take . This assumption is justified for many quantitative traits because the phenotypic variance is usually accurately estimated. On the basis of this assumption, we get (15) Letting , the bias of the genetic variance can be rewritten as (16) which leads to (17) Therefore, the proportion of the phenotypic variance associated with the QTL also is biased. Note that the term in Equation 17 involves the genetic effect, which can be expressed using (14) as (18) If we substitute Equation 18 into Equations 4 and 5, σ cancels out from . Therefore, the bias given in Equation 17 is only a function of and the parameters associated with the sampled progeny.
We now extend the results to other types of progeny. For simplicity, we assume that dominance is absent. In an F2 population, there are three possible genotypes whose genotypic values are defined as a for the homozygote with the “high allele,” 0 for the heterozygote, and –a for the homozygote with the “low allele.” The linear model given in Equation 1 applies here in the F2 population except that the x variable is now defined as xj = 1, 0, –1 for the three genotypes, respectively. Without segregation distortion, the variance of x in an F2 population is . Therefore, the results derived above apply here with in BC progeny replaced by in F2 progeny.
The method also can be applied to double-haploid (DH) and recombinant inbred line (RIL) populations. In both DH and RIL, the heterozygous genotype is absent. The x variable is defined as xj = 1 for one homozygote and xj =–1 for the other homozygote. The two types of homozygote have an equal frequency, and thus in both DH and RIL. The results derived above apply by substituting in the BC by in the DH and RIL.
Note that and γ are identical for all types of progeny if a marker provides a fully informative genotype for the QTL. Different notation is used because there can be differences when the QTL is not tightly linked to a marker.
Interval mapping: In interval mapping, a QTL may be identified at an intermediate position between markers by inferring the genotype of the QTL from flanking marker information. This will affect both the parameter estimates and the statistical tests of inference. We need to substitute the variance of x by the variance of the estimated x, denoted by . This variance depends on the relative position of the QTL within the interval. The explicit expression is tedious, but numerical values may be computed conveniently. For example, in a BC population, there are four possible marker classes, and the estimated x is denoted by x̂j = E(xj|m1j, m2j) = pj(1), where m1j and m2j indicate the flanking marker genotypes and pj(1) = Pr(xj = 1|m1j, m2j). Let Pr(m1j, m2j) be the joint probability of the flanking marker genotype. These values are given in Table 1, where r1 and r2 are the recombination fractions between the QTL and the two markers, respectively, and r is the recombination between the two markers. From this table, we can calculate using (19) Note that when is used in place of , the assumption of a normal distribution for â may be less valid.
The method used to derive implicitly assumes that the least-squares method (Haley and Knott 1992) is used because the expectation of the QTL genotype indicator is calculated conditional on markers only. In maximum-likelihood analysis, the expectation is actually calculated conditional on both markers and the phenotypic values. However, it is extremely difficult to derive the variance using expectation conditional on both markers and phenotypes. Therefore, the way we calculate is considered as an approximation.
For a DH population, we define x̂j in a slightly different way (see Table 2) but still use (19) to calculate . For RIL, we use the same table (Table 2), but replace r1, r2, and r by c1, c2, and c, where c = 2r/(1 + 2r) (Lynch and Walsh 1998).
In an F2 population, there are nine flanking marker genotypic classes. The definitions of x̂j's and their probabilities for the nine classes are given in Table 3. The same formula (Equation 19) is used to calculate except that the summation is taken over nine categories. For example, if the QTL position is in the middle of a 20-cM interval, we have r1 = r2 = 0.0906 and r = 0.1648, leading to . Figure 1 shows the relationship between and the position of the QTL in the 20-cM interval flanked by two fully informative markers. Note, as the QTL position approaches a marker, approaches ½.
Bias was evaluated numerically by considering the following factors: the sample size, the genetic effect measured by , and the LOD score criterion. In evaluating the bias of QTL effects, the residual variance was set at unity, i.e., σ2 = 1.0. If the actual residual variance is not unity, one can always standardize the genetic effect using a/σ in place of a.
The numerical evaluation was conducted only in BC populations because the general trends are similar for all types of progeny (data not shown). In addition, only a one-tailed test was evaluated. The one-tailed test is a special case of the two-tailed test with ξ1 = – ∞ and ψ1 = 0. The functional relationships between the size of the detected QTL and the true sizes are shown in Figure 2. The diagonal lines in the first column of Figure 2 represent the case where aT = a and those in the second column represent the case where , which holds only when the sample size is infinitely large. With finite sample sizes, the curves deviate from this straight line and the deviation increases as the sample size decreases. The deviation also increases as the LOD criterion increases. The deviation is negligible when (corresponding to a/σ= 0.2), even if the sample size is as small as n = 50. Assume that the commonly used LOD criterion is 3 and the sample size is 200; the bias is within 7% of the true value as long as . The bias becomes more severe, however, for small detectable QTL. This observation is consistent with that observed by Beavis (1998). It is interesting to note that when and n = 50, the bias in is as high as 0.18 for LOD = 2 and 0.33 for LOD = 5.
The functional relationships between and the sample size under various LOD criteria and are shown in the right column of Figure 3. The corresponding plots for the effects are given in the left column. When n ≥ 200 and ≥ 0.10, all the biases are negligible (within 10% of the true value), regardless of the LOD criterion. For small , a large sample size, even as large as n = 500, is not sufficient to eliminate the bias, again consistent with results of the Beavis experiment (Beavis 1998).
If the average estimated is 0.14 among all the experiments surveyed where the average sample size is ∼100 and the average LOD criterion is ∼3, the true may actually be zero (found from the second panel from the top of the right column of Figure 2). If is 0.15 (just a slight increase), however, the true is ∼0.08. If , the true is about the same as ; i.e., very small bias is expected. From these graphs or using Equation 17, one can find the true retrospectively from for all other settings.
We now use the parameter values of the Beavis experiment to compare the biases with those observed by Beavis (1998). Table 4 shows the original data reported by Beavis as well as the predicted biases for both the QTL effects and their variances. The average QTL position is in the middle of a 20-cM interval. However, for an interval this short, the position of a detected QTL rarely coincided with the true position. We observed that the detected position within the interval varied almost uniformly across the interval. Therefore, we choose an average = (0.5 + 0.4)/2 = 0.45 as the estimated variance of x, where 0.5 is the variance when the QTL is completely linked with a marker and 0.4 is the variance when the QTL is exactly in the middle of the interval. The agreements between the observed and the predicted biases, judged by the differences between the two, are quite good except when the sample size is small (n = 100). However, the percentage differences, (observed-predicted)/predicted, show an opposite trend with >8% discrepancies in the variance observed only when n = 1000. The percentage difference may be misleading, however, as the predicted effects become smaller and more sensitive to error with increasing sample size and because, ultimately, one is interested in the absolute error made in inferring the effects of QTL. The large absolute deviations of the predicted biases from the observed values for small sample size may be explained as follows. The critical value of LOD = 2.5 was used by Beavis for a test statistic involving both the additive and dominance effects. Our prediction, however, was based on a test statistic for additive effects only. For small samples, some QTL were detected due to large estimated dominance effects, even though the dominance was absent in the simulations (see Table 10.5 of Beavis 1998). To verify this, we redid the simulations for all the cases with n = 100 and used the pure additive model (dominance was included in neither the simulation nor the model). We used our own interval-mapping program known as QTL-BY-SAS to detect QTL (Xu and Xu 2003). Here, we replicated the experiment 500 times, instead of 200 times. The results are listed in Table 5, which shows very good agreement between the predicted and observed biases. Even though the LOD test statistic of the Beavis experiment involved 2 d.f. (additive and dominance), our model still predicted the biases quite accurately when the sample size was relatively large (n ≥ 500). This was because for large samples, the LOD score test statistic was determined primarily by the additive effect. The 2-d.f. LOD score and 1-d.f. LOD score were virtually identical.
The Beavis effect describes a phenomenon that occurred in the Beavis experiment where all QTL were simulated to have the same effect and distributed independently throughout the genome. The average effect of the detected QTL was biased upward due to censoring. It is more likely that QTL effects vary across the genome and the distribution of the QTL effects may be described by a negative exponential distribution (Xu 2003). In addition, some QTL may be linked within the same chromosomes and thus they do not segregate independently. On the one hand, the Beavis effect will cause the estimated number of QTL to be biased downward (Beavis 1994, 1998; Otto and Jones 2000), because the undetected QTL are not reported. On the other hand, the average effect of the detected QTL will be biased upward.
Using the Beavis effect to interpret results of a meta-analysis of QTL mapping is more straightforward. If a QTL mapping experiment can be repeated many times, the average effect of a chromosome location calculated only from the significant replicates will definitely be biased unless this QTL is detected in all replicates. If one considers incorporating a particular marker into a marker-assisted selection program for an economically important quantitative trait, the Beavis effect will affect the decision. The investigator may decide to search the literature to see how much genetic variance is accounted for by this marker from all published experiments.
The theory developed here helps predict the potential bias in the estimated effect of QTL. The theory may also be used to correct the bias but should be used with caution. Let be the average of the QTL effect estimated from N replicated experiments. To correct the bias, we may simply substitute the expectation given in Equation 17 by the observed average, (20) and solve for to obtain the unbiased estimate of . The equation is highly nonlinear, but the solution can be easily solved numerically. Unfortunately, the solution is very sensitive to the sample size (n). It works only when n is sufficiently large, say n ≥ 500. For a single experiment, we have only one estimate and the above equation becomes (21) The solution is even more sensitive to n. Therefore, the theoretical prediction of the Beavis effect may not be used retrospectively to correct for the bias when the sample size is small. The correction is necessary when the estimate is obtained by summarizing the results of many experiments. In that case, we should incorporate not only the mean of the estimates but also the variance of these estimates. The optimal method is the maximum-likelihood method that treats the estimates of the multiple experiments as censored data and infers (or recovers) the parameter of the uncensored data (Cohen 1991).
The theory developed herein applies to segregating populations with two alternative genotypes. For populations with more than two alternative genotypes, e.g., F2, the model is restricted to either the additive or the dominance model but not both. This is because the test statistic utilized is the z-test statistic, which is a 1-d.f. test. Further investigation is necessary to predict the biases in both the additive and dominance effects using a 2-d.f. test. Similar extensions can be made for a test with >2 d.f., e.g., four-way crosses (Xu 1996) and diallel crosses (Rebai and Goffinet 1993).
The variance of the genotype indicator (x) determines the estimation error of the QTL effect and thus plays an important role in the Beavis effect. When the QTL under investigation is tightly linked to a fully informative marker, is a constant depending on the filial relationships among progeny. If the marker is not fully informative, e.g., a certain percentage of individuals have missing genotypes at the marker, the variance will decrease and this information loss will have the same ultimate effect as increasing the interval between markers when the QTL is not at the marker. We use conditional on not only the type of progeny but also the marker information content and the QTL position. This conditional variance is denoted by and it is used whenever the QTL genotype is not directly observable.
The theory of the Beavis effect is derived on the basis of a single QTL model. However, it applies approximately to multiple QTL experiments, as shown in the Beavis experiment where multiple QTL were actually simulated (Tables 4 and 5). When we predict the bias, the residual variance includes the environmental variance plus the sum of the variances of the remaining QTL excluding the one in question. For example, when we predicted the bias for the QTL explaining 3% of the phenotypic variance, the residual variance was chosen as 100 – 3 = 97, where 100 is the total phenotypic variance. If the data were analyzed by a multiple-QTL model, e.g., composite interval mapping (Jansen 1993; Zeng 1994), the residual variance should include only the environmental variance given that the remaining QTL have been included in the model. When QTL are linked, the situation will be complicated and further investigation is necessary.
The author thanks Dr. Chenwu Xu for his help in the simulating experiment. The author also appreciates editor S. Otto and two anonymous reviewers for their constructive comments and suggestions on the manuscript. This work was supported by the National Institutes of Health grant R01-GM55321 and the U.S. Department of Agriculture National Research Initiative Competitive Grants Program 00-35300-9245.
Communicating editor: S. Otto
- Received November 11, 2002.
- Accepted August 26, 2003.
- Copyright © 2003 by the Genetics Society of America