We describe an approximate method for the analysis of quantitative trait loci (QTL) based on model selection from multiple regression models with trait values regressed on marker genotypes, using a modification of the easily calculated Bayesian information criterion to estimate the posterior probability of models with various subsets of markers as variables. The BIC-δ criterion, with the parameter δ increasing the penalty for additional variables in a model, is further modified to incorporate prior information, and missing values are handled by multiple imputation. Marginal probabilities for model sizes are calculated, and the posterior probability of nonzero model size is interpreted as the posterior probability of existence of a QTL linked to one or more markers. The method is demonstrated on analysis of associations between wood density and markers on two linkage groups in Pinus radiata. Selection bias, which is the bias that results from using the same data to both select the variables in a model and estimate the coefficients, is shown to be a problem for commonly used non-Bayesian methods for QTL mapping, which do not average over alternative possible models that are consistent with the data.
QUANTITATIVE trait loci (QTL) mapping is the process of finding and estimating associations between a continuous quantitative trait and a set of DNA markers that have been previously placed on a genetic map, with the ultimate goal of determining the genetic architecture of a trait, or finding markers that can be used to select for preferred values of the trait. The map is generally assumed to be known and correct and should cover a substantial proportion of the genome. QTL mapping works on the principle that if a locus (called a QTL) on the genome is causing variation in a trait, and data are obtained from a cross (or pedigree) in which the QTL is segregating, then values of the trait will be correlated with markers linked to that locus. The closer the marker is to the QTL, the stronger the correlation. For a marker at a given distance from the QTL, the larger the effect of the QTL, the larger the effect of the marker, which can be estimated from differences between subsets of the population with different marker classes. The statistical problem is to estimate the effects and locations of QTL or the effects of using associated markers to select progeny. The problem is challenging statistically because one or more QTL for a trait could be located anywhere on the genome.
Non-Bayesian QTL mapping: A common approach is to carry out a hypothesis test at each marker (a t-test or ANOVA) or at each of a series of points on the genome (interval mapping; Lander and Botstein 1989). An alternative to interval mapping, also involving multiple hypothesis tests, is based on regression on flanking markers (Haley and Knott 1992). See Paterson (1995) and Doerge et al. (1997) for reviews.
Markers or loci where the test statistic exceeds a threshold are chosen and considered to be “detected.” A problem with these methods is that QTL are often detected but, when an independent verification population is used, the marker associations are subsequently not verified and/or the estimated effects are much smaller in the verification population (see, e.g., Beavis 1994; Wilcox et al. 1997; Melchinger et al. 1998). The latter problem is an example of selection bias, which is well known to statisticians in the context of stepwise regression (Miller 1990). Selection bias occurs when the same data are used both to select the variables in a regression model and to estimate the coefficients.
Bayesian statistics: Bayesian statistics aim to compute probability distributions for the underlying parameters in a model. With this information it is, in principle, possible to compute probabilities of any events or quantities of interest such as the probability of a linked QTL in a region or the expected gain from marker-aided selection.
An important aspect of Bayesian analysis is the use of marginal distributions. In the method of this article, the probability distributions of estimates and predictions are averaged over the values of parameters in the model and over possible models, rather than with parameters set to their most likely values in the most likely model, as is usually the case in non-Bayesian methods, such as maximum likelihood. In a single model, estimates and confidence intervals from maximum likelihood will be similar to their Bayesian counterparts provided the sample size is large enough. More significant differences arise, however, when testing “precise hypotheses” (Berger and Berry 1988). Existence or nonexistence of a QTL linked to a particular marker is one example. This difference occurs because Bayesian inference considers the probability of the data under each of the two possible models, e.g., H0:θ = 0 and H1:θ ≠ 0. The non-Bayesian hypothesis test considers the tail probability for a test statistic under H0, which can be shown to be approximately equivalent to a tail probability of the posterior distribution for the parameter, θ, being tested under H1. This does not allow for the finite nonzero prior probability (it must be nonzero or we would not need to test it) that H0 is true. More generally, where multiple models are consistent with the data, it is necessary to consider all these models according to their probabilities for valid statistical inference (cf. Raftery 1995; Raftery et al. 1997).
Bayesian QTL mapping: A number of articles on Bayesian approaches to QTL mapping have appeared (reviewed by Hoeschele et al. 1997).
Several more recent articles have appeared that simultaneously consider multiple models with different numbers of QTL (Satagopan and Yandell 1996; Satagopan et al. 1996; Heath 1997; Sillanpää and Arjas 1998, 1999; Stephens and Fisch 1998). These articles use the “reversible jump” methodology of Green (1995) for constructing a sampler that jumps between models of different dimension. A major challenge remains to obtain a rapidly converging sampler for the full Bayesian model (D. A. Stephens, personal communication). The methods are complex to program and as yet there is no publicly available program with demonstrated rapid convergence.
More easily implemented and faster methods are useful, for people without access to the above programs, for checking the results of the more complex programs, or for preliminary or exploratory analysis as data are collected and may be useful for generating approximate starting values for algorithms for the full Bayesian models. Moreover, when data are limited and the actual genetic architecture is unknown, more generic statistical methods based on linear models with main effects and interactions may be more appropriate and more efficient (in terms of ease of application, rate of convergence, and validity) than the detailed modeling of genetic parameters from a particular genetic architecture. The information from the simpler methods is adequate to assess the evidence for the existence of QTL in a region, for marker-aided selection, or for determining QTL location to within a resolution of the distance between markers.
OVERVIEW: BAYESIAN QTL MAPPING USING MODEL SELECTION
Our approach is to relate trait values directly to marker genotypes, using multiple linear regression. Since there are many markers in a typical cross, most of these will not be near a QTL. Therefore it is necessary to choose a model or models with subsets of markers selected. Broman (1997) advocated a stepwise regression approach for choosing the “best” model. However, we shall see that, particularly with small sample sizes, there can be a multiplicity of models that are compatible with the data, and these alternative models need to be considered along with their probabilities (Raftery 1995; Raftery et al. 1997).
Our strategy is to build on methods for model selection in linear models from the statistical literature, starting with the Bayesian information criterion (BIC; Schwarz 1978) in this article, and a modification of the Bayesian method for model selection in hierarchical linear models (a whole family of linear models combined in an overarching single model) of George and McCulloch (1993) in future work. Our methodology is to the full Bayesian approach as the regression method of Haley and Knott (1992) is to interval mapping: a simpler, more easily calculated method that nevertheless captures (at least approximately) the important aspects of the full Bayesian analysis, i.e., the posterior probability that a QTL is located in a given region, or the marginal distribution for the size of effects.
There is a major difference between our approach and others that consider multiple models. Previous approaches consider the model as specifying only the number of QTL on each linkage group, while the locations of the QTL are parameters in the model. In our approach the QTL are at fixed locations. In other words a QTL in our model really is a quantitative trait locus. A QTL at a different position is considered a different QTL and is represented by a different model. This increases the number of models but simplifies the analysis of a given model.
Statistical inference is carried out by combining information from each model according to its posterior probability. The following quantities of interest are estimated below: (i) marginal probabilities of selection of one or more markers in a linkage group or region, (ii) model-averaged effect of a marker, and (iii) model-averaged effect of allelic substitution of a marker. These quantities have the following interpretations: (i) probability of existence of one or more QTL in the linkage group or region, (ii) posterior expected gain attributable to QTL in the immediate vicinity of the marker (defined as the region closer to the marker than to any other marker), and (iii) posterior expected gain from marker-aided selection using the marker, resulting from all QTL linked to the marker, respectively. The need for model averaging is demonstrated by estimation of the selection bias that results when the effects of markers and the effects of allelic substitution are estimated conditional on selection. Further discussion of selection bias is given below and a small simulation study demonstrating the effect of model averaging on selection bias is given in the appendix.
In this article approximate posterior probabilities for models are obtained using a modification of the easily calculated BIC (Schwarz 1978). The probabilities are approximate but good enough to give a rough indication. Moreover, it may be possible, by adjusting a single parameter, to fine-tune the method. We apply the method to QTL mapping data for wood density in Pinus radiata and demonstrate the need to consider multiple models in assessing the probability of existence of a QTL and obtaining estimates of the effects of allelic substitution at a marker free of selection bias.
DATA AND METHODS
The method is demonstrated on analysis of associations between wood density and markers from two linkage groups in P. radiata.
Marker and trait data were obtained from a single full-sib family with parents 850.55 and 850.96, for the purpose of QTL detection.
Trait data: Two 5-mm pith-to-bark cores were taken from each tree. Usable data were available from 93 trees. Wood density was assessed from each of the cores by X-ray densitometry (Cown and Clement 1983). The traits considered here were juvenile wood density at ages 1–5 years (estimated as the area-weighted average of rings 1–5) and the average outerwood density (estimated as the average density of the outer 5 cm from each core), adjusted for site and replicate differences, and standardized. These traits are similar to the traits WD1_5 and WD14 (outerwood density, not standardized) analyzed by Kumar et al. (2000).
Marker data: There were 126 markers of various types [randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), and simple sequence repeat (SSR)] in 23 linkage groups with from 2 to 16 markers per linkage group, which were segregating in pseudobackcross configuration, for the 850.55 parent. There were 1171 missing marker values (10% of the marker data). These values were due to indistinct bands or PCR failure and are assumed missing at random. All 93 trees had one or more missing marker values. For further information on the study see Kumar et al. (2000).
Note that, with markers in pseudobackcross configuration, the analysis is equivalent to the analysis of a backcross, and that the method of this article, in particular multiple imputation for missing markers, does not apply to selectively genotyped data, where the individuals genotyped are selected from the tails of the trait distribution.
Two linkage groups (linkage groups 1 and 3) were chosen for illustration of our method. Linkage group 3 was chosen because it contained statistically significant associations found from previous (non-Bayesian) analyses (Kumar et al. 2000). Linkage group 1 was chosen as simply another linkage group where no significant association had previously been found.
A non-Bayesian analysis is given for comparison. This is based on t-tests for the regression coefficient of a marker in a single-marker model for the trait and repeated for each available marker. Missing data were removed (rather than using multiple imputation) before testing each marker. Genome-wise thresholds for the t-statistics were estimated using a permutation test (cf. Churchill and Doerge 1994).
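For readers who wish to reproduce this kind of threshold, the permutation scheme can be sketched as follows. This is a minimal illustration, not the code used in the study; the function names, the 0/1 marker coding, and the equal-variance two-sample t-test are our assumptions.

```python
# Sketch of a genome-wise permutation threshold (cf. Churchill and Doerge 1994)
# for single-marker t-tests in a backcross with markers coded 0/1.
import numpy as np

def max_abs_t(y, markers):
    """Largest |t| over equal-variance two-sample t-tests, one per marker column."""
    tmax = 0.0
    for m in markers.T:
        g0, g1 = y[m == 0], y[m == 1]
        n0, n1 = len(g0), len(g1)
        if n0 < 2 or n1 < 2:
            continue  # cannot test a (nearly) monomorphic marker
        sp2 = ((n0 - 1) * g0.var(ddof=1) + (n1 - 1) * g1.var(ddof=1)) / (n0 + n1 - 2)
        t = (g1.mean() - g0.mean()) / np.sqrt(sp2 * (1 / n0 + 1 / n1))
        tmax = max(tmax, abs(t))
    return tmax

def permutation_threshold(y, markers, n_perm=1000, alpha=0.05, seed=0):
    """Permute trait values to build the null distribution of the max |t|."""
    rng = np.random.default_rng(seed)
    null = [max_abs_t(rng.permutation(y), markers) for _ in range(n_perm)]
    return float(np.quantile(null, 1.0 - alpha))
```

Permuting the trait vector breaks any marker-trait association while preserving the correlation structure among markers, so the (1 − α) quantile of the null max-|t| distribution serves as a genome-wise threshold.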
Bayesian information criterion: Approximate probabilities for models can be obtained from the BIC, given by
(1)  BIC = n ln(1 − R²) + k ln(n),
where k is the number of parameters fitted in the model, n is the number of observations, and R² is the proportion of variance explained by the model. Schwarz (1978) showed that with no prior (all models a priori equally likely), the posterior probability (p) of a model is approximately proportional to
(2)  p ∝ exp(−BIC/2).
This approach has the advantage of relative simplicity and consequent ease of computation. It has the disadvantage that the relationship (2) between BIC and probabilities of models is asymptotic; i.e., it relies on large sample sizes. To allow for this Broman (1997) advocates a modification, BIC-δ, given by
(3)  BIC_δ = n ln(1 − R²) + δ k ln(n),
where δ is a constant. Broman recommends δ = 2 or δ = 3; however, the best value to use depends on sample size and other factors and is a topic for future research.
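In code, (1)–(3) amount to only a few lines. The sketch below is ours (the names bic_delta and model_posteriors are hypothetical, not functions from the article's Splus code): it computes BIC-δ from n, k, and R², and converts a set of BIC values into approximate posterior model probabilities via (2).

```python
# BIC-delta for a regression model, and approximate posterior model
# probabilities proportional to exp(-BIC/2), as in (1)-(3).
import numpy as np

def bic_delta(n, k, r2, delta=1.0):
    # BIC = n*ln(1 - R^2) + k*ln(n); delta > 1 inflates the dimension penalty.
    return n * np.log(1.0 - r2) + delta * k * np.log(n)

def model_posteriors(bics):
    # p proportional to exp(-BIC/2); subtract the minimum BIC for stability.
    b = np.asarray(bics, dtype=float)
    w = np.exp(-(b - b.min()) / 2.0)
    return w / w.sum()
```

Subtracting the minimum BIC before exponentiating leaves the normalized probabilities unchanged but avoids underflow when BIC values are large.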
To handle missing values, multiple imputation (Rubin and Schenker 1986) was used to generate multiple instances of the data sets with missing markers randomly estimated according to the values and proximity of flanking markers. This is an alternative to the Haley and Knott (1992) method of assigning a weighted average of flanking marker values to the missing markers. The multiple imputation approach allows for uncertainty in the imputed values.
Ten imputations were used. Rubin and Schenker report good results with 2–3 imputations for estimating means and confidence intervals for continuous random variables. However, since we apply imputation to marker values that take on discrete values (0 or 1) we may need more imputations. As a check on the effectiveness of multiple imputations the model-averaged effects of allelic substitution for outerwood density at markers RAPD.192 and A47.c were reestimated for each of the 10 imputations separately.
Rather than repeating model fits for each imputation, the data from each imputation were combined, giving a data set with n × n_I points, where n_I denotes the number of imputations used. The multiple imputation estimate of BIC is given by applying (1), with n being the number of observations in the original (unimputed) data and R² the value of R² from the model fitted to the combined data set. This can be justified by considering the likelihood function for an analysis with each of the n_I imputations of an observation given weight 1/n_I; i.e., the likelihood contribution for all imputations of a data point is the same as the likelihood contribution for the data point in the unimputed data set, if there is no missing marker, and the average of the likelihood contributions from the various imputations if there is a missing marker.
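The imputation step can be sketched as follows for a backcross marker with two informative flanking markers. This is an illustration only: the article does not specify a map function, so Haldane's map function and the no-interference assumption here are ours, as are the function names.

```python
# Sample a missing 0/1 backcross marker from its conditional distribution
# given the flanking marker genotypes and the map distances to them.
import numpy as np

def recomb_fraction(d_cM):
    """Haldane map function: map distance in cM -> recombination fraction."""
    return 0.5 * (1.0 - np.exp(-2.0 * d_cM / 100.0))

def impute_marker(left, right, d_left, d_right, rng):
    """Draw the missing marker given flanking genotypes (0/1) and distances (cM)."""
    rL, rR = recomb_fraction(d_left), recomb_fraction(d_right)
    def trans(a, b, r):  # transition probability between adjacent loci
        return 1.0 - r if a == b else r
    w1 = trans(left, 1, rL) * trans(1, right, rR)   # weight for allele 1
    w0 = trans(left, 0, rL) * trans(0, right, rR)   # weight for allele 0
    return int(rng.random() < w1 / (w0 + w1))
```

Drawing from this conditional distribution, rather than plugging in its mean as in the Haley and Knott (1992) regression approach, is what allows the imputations to carry the uncertainty in the missing genotypes.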
Calculations for this work used Splus version 3.4 for Unix (Becker et al. 1988). The Splus function bicreg.qtl, a modification of the function bicreg (Raftery 1995), was used to search through possible models and calculate the BIC criterion and associated quantities. The search procedure used by bicreg is essentially an exhaustive search using the all-subsets regression function leaps, returning the value of R², for each model, from which BIC is calculated. Backward elimination is used to reduce the number of variables to the limit of 30 prior to calling leaps. Our function contains modifications to allow for adjustment for multiple imputations, prior distributions, and the parameter δ. The function bicreg gives estimates of average effects for a variable conditional on selection (i.e., averaged over models in which the variable is selected). We also give unconditional estimates, where the effect of a variable (marker effect) is defined to be zero in models where the marker is not selected, and model-averaged effects of allelic substitution (i.e., estimates of the difference between marker classes). Several options from bicreg can be adjusted to control the amount of computing done. These are Occam's razor constant OR and the lower limit on the number of models, nbest, considered for each model size. Initially at least nbest models of each size are considered; then models less likely than the most likely model by the factor OR or more are eliminated from consideration. The calculations for Tables 2, 3, and 4 all used OR = 10,000 and nbest = 100.
Incorporating prior probabilities: The BIC criterion does not incorporate prior information. For any given prior, the effect of the prior will be negligible for a sufficiently large sample size. For BIC-δ increasing δ may compensate for a lower prior probability of linkage. We prefer, however, to explicitly incorporate the prior and leave the parameter δ to compensate for the effects of finite sample size on the asymptotic approximation involved in (2).
We now modify the probabilities of (2) by considering the prior probability for a QTL to be present at a given locus. If there are k QTL present, and markers are spaced on average at s cM in a genome of length L cM, then the probability that a QTL is present in the immediate neighborhood of a marker is
(4)  π = ks/L.
Our markers are spaced at ~12–15 cM on a genome of estimated length 2000 cM (Wilcox 1997). A genetic architecture with many small QTL seems likely, since large-effect QTL should have been detected by previous studies (Wilcox et al. 1997). If there are 20 QTL we have k = 20, s = 12, L = 2000 cM, giving π = 0.12 ≈ 0.1. To show sensitivity to π, and to accommodate readers who believe fewer QTL are likely to be present, we also calculate posterior probabilities for model size for π = 0.03 or 0.06, corresponding to 5 or 10 QTL, respectively, in the analysis below.
Ideally one would like to have a marker at each QTL location. What we can guarantee is a marker at each QTL location to within the resolution of the marker linkage map. So for each QTL configuration, the “true” model (our best approximation) will be a model such that each marker is selected (in the model) if and only if it is the closest marker to some QTL or, equivalently, there is a QTL in the region around a marker extending halfway to each of the adjacent flanking markers. QTL are assumed to occur randomly in any s-cM interval, with occurrences in various intervals being mutually independent. So the prior probability that each marker is selected is π, and the events of selection or nonselection of the various markers are a priori mutually independent. The prior probability that a given model with k markers selected is the true model is therefore
(5)  π^k (1 − π)^(q−k).
The combined prior probability for all models with k markers linked to QTL is therefore the binomial probability
(6)  C(q, k) π^k (1 − π)^(q−k),
where C(q, k) is the binomial coefficient and q is the total number of markers being considered.
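A short sketch of (4)–(6) and of how the prior enters the posterior model probabilities (the function names are ours; the article folds this adjustment into its modified bicreg.qtl function):

```python
# Marker-neighborhood prior (4), model prior (5), model-size prior (6),
# and prior-weighted posterior model probabilities.
from math import comb
import numpy as np

def qtl_prior_pi(n_qtl, s_cM, L_cM):
    # (4): pi = k*s/L, probability of a QTL in the s-cM neighborhood of a marker.
    return n_qtl * s_cM / L_cM

def model_prior(k_sel, q, pi):
    # (5): prior for one specific model with k of the q markers selected.
    return pi ** k_sel * (1.0 - pi) ** (q - k_sel)

def size_prior(k_sel, q, pi):
    # (6): binomial prior for the class of all models of size k.
    return comb(q, k_sel) * model_prior(k_sel, q, pi)

def prior_weighted_posteriors(bics, sizes, q, pi):
    # Posterior proportional to prior(model) * exp(-BIC/2).
    b = np.asarray(bics, dtype=float)
    w = np.array([model_prior(k, q, pi) for k in sizes]) * np.exp(-(b - b.min()) / 2.0)
    return w / w.sum()
```

With π = 0.1 and q = 16 markers, the empty model carries a prior of 0.9^16 ≈ 0.19, so the prior alone favors small models; the data must overcome this to give high posterior probability of a QTL.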
Marginal probabilities of QTL location: The marginal probability that a QTL is in a region is estimated as the marginal probability that one or more of the markers in the region are selected, which is obtained from the BIC calculation by summing posterior probabilities for models containing one or more of the markers.
Models, effects of markers, and effects of allelic substitution: Let m denote the number of markers and n the number of progeny. We consider all possible models, corresponding to all 2^m possible subsets of markers. Each model is characterized by the set of markers that are selected. Let ℳ_k denote the kth model, let p_k denote the posterior probability of ℳ_k, let ℳ_{(i)} denote the model where only marker i is selected, and let S_i denote the set of models with marker i selected.
Let M_j be the jth marker, with alleles labeled 1 and 2, and y_i and M_j(i) the trait value and value of the jth marker for the ith individual, respectively. Let V(M_j) (the vicinity of marker j) be defined as the set of points closer to marker j than to any other marker.
The models fitted are of the form
(7)  y_i = μ + Σ_j b_j x_ij + e_i,
where
(8)  x_ij = ½ if M_j(i) = 1 and x_ij = −½ if M_j(i) = 2,
and the errors e_i are assumed to be normally distributed.
Effects of markers: The regression coefficients for ℳ_k are denoted by b_{j,k}, or simply b_j when there is no need to distinguish models. We refer to b_{j,k} as the effect of the jth marker in ℳ_k. The coefficients b_{j,k} for unselected markers are set to zero by convention, so that the sum in (7) is effectively over selected markers.
Effects of allelic substitution: The effect of allelic substitution, d_{i,k}, for M_i in ℳ_k is defined as the difference in population means between the two marker classes if ℳ_k is the true model.
Note that the effect of allelic substitution d_{i,k(i)} in the model ℳ_{(i)}, where only marker i is selected, is the same as the conventional effect of allelic substitution, and that the effects of allelic substitution d_{i,k} are not in general the same as the effects of markers b_{i,k}, except in the model ℳ_{(i)}, in which case d_{i,k(i)} = b_{i,k(i)} and the observed difference between the averages of the marker classes is an unbiased estimate of both quantities.
Estimation: Let b̂_{i,k} and d̂_{i,k} be the conventional maximum-likelihood estimates of b_{i,k} and d_{i,k}, respectively, in model ℳ_k.
Let b̂_i^c be the estimated effect for the ith marker conditional on selection (i.e., the effect, averaged over models in which the marker is selected), given by
(9)  b̂_i^c = Σ_{ℳ_k ∈ S_i} p_k b̂_{i,k} / Σ_{ℳ_k ∈ S_i} p_k,
and let b̂_i be the unconditional estimate (averaged over all models, with the effect set to zero in models where the marker is not selected) of the marker effect for marker i, given by
(10)  b̂_i = Σ_k p_k b̂_{i,k}.
Similarly, let d̂_i be the model-averaged estimate of the effect of allelic substitution for the ith marker, given by
(11)  d̂_i = Σ_k p_k d̂_{i,k}.
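Given a list of models with posterior probabilities and fitted coefficients, (9)–(11) reduce to simple weighted sums. A minimal sketch (the data structure and function name are ours):

```python
# Conditional (9) and unconditional (10)/(11) model-averaged effects.
# `models` is a list of (posterior_probability, {marker_index: estimate}) pairs;
# a marker absent from a model's dict was not selected in that model.
def averaged_effects(models, i):
    p_sel = sum(p for p, eff in models if i in eff)          # P(marker i selected)
    uncond = sum(p * eff.get(i, 0.0) for p, eff in models)   # effect 0 if unselected
    cond = uncond / p_sel if p_sel > 0 else 0.0              # renormalize over S_i
    return cond, uncond
```

The same function applies to effects of markers and to effects of allelic substitution; only the estimates fed in differ.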
Selection bias: Selection bias is a well-known phenomenon that occurs when using a model selection method, such as stepwise regression, to select the variables in a model, and the same data used for model selection are used to estimate the effects. Conditional on selection, the estimates of effects are greater in absolute value than the true values, because in the sampling distribution of effects, the values larger than some threshold are selected. An implicit model selection step is being carried out when selecting markers using the conventional t-test or interval mapping methods.
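The phenomenon is easy to reproduce in a toy simulation under the null hypothesis of no QTL (our construction, loosely patterned on the study's sample size; it is not the simulation in the appendix): with no real effects anywhere, picking the marker with the largest class difference and reporting that same difference still yields estimates well away from zero.

```python
# Toy illustration of selection bias: under the null (no marker effects),
# the |difference| at the selected marker is biased away from its true value, 0.
import numpy as np

rng = np.random.default_rng(1)
n, m, reps = 93, 10, 500
selected = []
for _ in range(reps):
    y = rng.standard_normal(n)                  # trait with no marker effects
    markers = rng.integers(0, 2, size=(n, m))   # independent 0/1 markers
    diffs = [y[markers[:, j] == 1].mean() - y[markers[:, j] == 0].mean()
             for j in range(m)]
    j_best = int(np.argmax(np.abs(diffs)))      # implicit model selection step
    selected.append(abs(diffs[j_best]))

mean_selected = float(np.mean(selected))        # well above the true value, 0
```

Averaging over models, with zero effects in models where the marker is unselected, pulls such estimates back toward zero.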
Our estimate of selection bias is obtained by comparing the model-averaged estimates, which we argue have no problem with selection bias (cf. appendix), to corresponding estimates conditional on selection, i.e., estimates averaged over models in which the marker is selected.
Selection bias in the effect of allelic substitution is estimated as
(12)  d̂_i^c − d̂_i,
where d̂_i^c = Σ_{ℳ_k ∈ S_i} p_k d̂_{i,k} / Σ_{ℳ_k ∈ S_i} p_k is the corresponding estimate conditional on selection, and is given to indicate the bias likely to result from commonly used methods that both select a marker or markers on the basis of some test (effectively selecting a model) and estimate the effect of the marker using the same data. The actual selection bias when using the non-Bayesian methods depends on the threshold used for the test statistic (it is higher with a higher threshold or lower P value).
The concepts of this section and their interpretations are summarized in Table 1. Further discussion of selection bias and a small simulation study are given in the appendix, showing the selection bias in the t-test method, showing that Bayesian estimates have negative bias (shrinkage toward zero) in the case of model uncertainty, and showing that, even conditional on selection using the t-test at quite low thresholds, the model-averaged estimates did not show selection bias.
RESULTS
Results are given for δ = 1, 1.5, 2, and 3. For simplicity of discussion, unless otherwise stated, all comments below refer to the case δ = 1. Higher values are more conservative, leading to lower probabilities for a QTL on a linkage group, larger amounts of selection bias, and smaller estimates of effects of allelic substitution.
Table 2 shows the marginal probabilities for models of various sizes for two linkage groups, linkage groups 1 and 3. These probabilities are obtained by amalgamating probabilities of all models of each given size.
The marginal probability that one or more QTL are present on a linkage group is the probability that the model size is 1 or more, i.e., 1 minus the probability that the model size is zero.
For juvenile wood, from Table 2 with δ = 1, the probability that a QTL is present on linkage groups 1, 3 is 0.17, 0.997, respectively. On linkage group 3 the probability of model size 2 was 0.26, indicating the possibility of two QTL separated by one or more markers. Higher values of δ are more conservative, giving lower probabilities for a QTL; e.g., with δ = 2, 3 the probability that a QTL is present on linkage group 3 is 0.97, 0.76, respectively. Evidence for a QTL, albeit not strong, persists to δ = 3.
For outerwood, from Table 2 with δ = 1, the probability that a QTL is present on linkage groups 1, 3 is 0.38, 0.88, respectively. On linkage group 3 the probability of model size 2 was 0.28, indicating the possibility of two QTL separated by one or more markers. With δ ≥ 2 the probability that a QTL is present is <0.5 on each linkage group.
Marginal probabilities for model size for various values of the prior probability π = 0.03, 0.06, 0.1, corresponding to a prior expectation of 5, 10, 20 QTL, respectively, are shown in Table 3.
Table 4 shows the probability of selection, estimated effects, and standard errors for markers, obtained by combining effects across models according to their probabilities. Effects ( ) are shown conditional on selection (averaged over models, in which the marker is selected, corresponding to estimated QTL effects assuming a QTL is present) or unconditionally ( , averaged over all models with the effect set to zero in models where the marker is not selected, corresponding to the posterior mean of estimated QTL effects for QTL in the vicinity of marker i).
Note that no single marker has a high posterior probability. This reflects uncertainty in the location of a QTL. For example, for juvenile wood density the markers RAPD.38 to A297.b2 had probabilities of 12–42%. Outside this region posterior probabilities dropped off to low values. This suggests that a QTL, if present, is most likely to be in the region between these two markers or possibly in the closer one-half of the adjoining intervals beyond this region. The marginal probability that a QTL is in this region is estimated at 0.995; this region contains practically all of the posterior probability of models of nonzero size.
Table 5 gives the conventional t-test/LOD score analysis, one marker at a time, for linkage group 3, plus model-averaged estimates of the effect of allelic substitution and estimates of the selection bias in the non-model-averaged estimates of allelic substitution. The conventional estimate of allelic substitution for the ith marker is subject to selection bias: it was obtained from the same data that were used to select the marker. The model-averaged estimate in Table 5 is not subject to selection bias.
For juvenile wood density, the markers RAPD.38 and A113.a1 had comparison-wise P values of ~0.00001 (genome-wise P < 0.01) and markers A329.c3 and A297.b2 had comparison-wise P values of ~0.0001 (genome-wise P < 0.05).
For outerwood density, RAPD.192 and A47.c have comparison-wise P values of ~0.002.
Note the selection bias of 27, 25, and 75% for the three largest-effect markers for juvenile wood density and 85 and 77% for the two largest-effect markers for outerwood density. If higher values of δ are used these estimates will increase.
The calculations for outerwood density for linkage group 3 with 16 markers took ~9 min in Splus on a Silicon Graphics Indigo Impact 10,000 running Irix 6.2, finding probabilities for a total of 332 models.
Calculations of just the model probabilities in Table 2 with δ = 1 and nbest = 10 took 6 sec.
The effects of allelic substitution for outerwood density at markers RAPD.192 and A47.c were reestimated for each of the 10 imputations separately. The standard deviation between imputations was 0.075, giving a standard error of the mean for 10 imputations of ~0.02, which is acceptable.
DISCUSSION
The BIC method with multiple imputations for missing values gives estimates of posterior probabilities that can be easily and rapidly calculated for a linkage group. With 10 imputations the standard error of the mean of the effects of allelic substitution estimated separately for each imputation was only about one-eighth of the standard error of the non-model-averaged estimate of the effect of allelic substitution. Therefore, more imputations would not significantly decrease the error of the model-averaged estimate.
For juvenile wood density the posterior probability that there is a QTL on linkage group 1 is ~0.17. The posterior probability that there is a QTL is high for linkage group 3, with a posterior probability >0.95 for δ ≤ 2.
The putative QTL for juvenile wood density on linkage group 3 was (or were) located in the region between or adjoining the markers RAPD.38 and A297.b2 with probability 0.995. By comparison, Kumar et al. (2000), applying the bootstrap method of Visscher et al. (1996), obtained a 95% confidence interval of 56–96 cM, the region from approximately midway between RAPD.59 and RAPD.38 to just before marker A297.b2. This is comparable to our result, although a 95% confidence interval is not the same as a 95% posterior interval, and the posterior probabilities decrease rapidly as one moves beyond this interval so that a 95% probability interval is not much different in size to a 99% or higher probability interval.
For outerwood density, the posterior probability that there is a QTL on linkage group 1 is ~0.38 and on linkage group 3 the probability is ~0.88. This is evidence against a QTL on linkage group 1 and evidence for a QTL on linkage group 3. The evidence is not strong, however, so we should not be surprised, if, as was the case, the QTL association was not subsequently verified on linkage group 3. Nor does the evidence rule out the existence of a small undetected QTL on linkage group 1. A larger sample size is recommended.
Although no single marker for outerwood density attained the experiment-wise level of P < 0.05, Kumar et al. (2000) obtained a P value of 0.002 (experiment-wise P < 0.05) for the F-test when outerwood density was regressed jointly on seven equally spaced markers from linkage group 3. Their experiment-wise P value of just <0.05 can be compared to our marginal probability for model size greater than zero of 0.88 for linkage group 3 with δ = 1. The experiment-wise P value is about one-half the posterior probability of model size zero in this case. If δ = 2, or δ = 1 and π = 0.03 (corresponding to a prior expectation of only five QTL), then the experiment-wise P value is about one-third or one-eighth, respectively, of the posterior probability of model size zero.
These results demonstrate the difference between the results of the Bayesian approach to QTL mapping and the non-Bayesian or frequentist approaches, where, to the naive user, the evidence in the form of P values appears stronger. If using P values, controlling for multiple comparisons is certainly necessary in this case—the comparison-wise P values were orders of magnitude less than the probability of model size zero. The experiment-wise P values were somewhat less than but of the same order of magnitude as the probabilities of model size zero in this example.
As has been demonstrated in other applications (see, e.g., Berger and Berry 1988), the evidence for a QTL can be weaker than appears to be the case with P values: A genome-wise significance level of α = 0.05 (comparison-wise 4.4 × 10⁻⁴) can correspond to only weak evidence for a real effect. This is to be expected because the strength of evidence implied by a given P value decreases with sample size. This occurs because the P value is measuring evidence that H0 is not the true model under which the data are generated. In practice, any model is only an approximation for the process generating the observed data, and hence as the sample size gets large the probability of observing the data under H0 tends to zero. The problem is that the probability of observing the data under the alternative hypothesis H1 of a real effect also tends to zero (for the same reason) and may be equally small; i.e., the data do not favor H1 over H0 just because the P value is small. Therefore, with larger sample sizes the differences between the P value and the posterior probability of H0 are likely to increase.
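The sample-size dependence can be illustrated with the standard Schwarz/BIC approximation to the Bayes factor for a single 1-degree-of-freedom test, 2 ln BF10 ≈ z² − ln n (a generic approximation, not a formula from this article). A fixed z-score, and hence a fixed P value, corresponds to progressively weaker evidence against H0 as n grows. A minimal Python sketch, assuming equal prior odds:

```python
import math

def posterior_p_h0(z, n):
    # BIC approximation to the Bayes factor for a 1-df test:
    # 2*ln(BF10) ~ z^2 - ln(n); equal prior odds on H0 and H1 assumed
    ln_bf10 = 0.5 * (z * z - math.log(n))
    bf10 = math.exp(ln_bf10)
    return 1.0 / (1.0 + bf10)

z = 1.96  # two-sided P ~ 0.05, regardless of n
for n in (50, 500, 5000, 50000):
    print(n, round(posterior_p_h0(z, n), 3))
```

With z = 1.96 (P ≈ 0.05) the approximate posterior probability of H0 is about one-half at n = 50 and climbs steadily toward 1 as n increases, so the same P value carries less evidential weight in larger samples.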
The problem remains with approaches using P values, whether comparison-wise, chromosome-wise, or experiment-wise: what threshold to use and how to interpret the results. Is the evidence strong, fair, or weak if we get an experiment-wise P value of 0.05 or 0.01? There is no relation between the experiment-wise P value and posterior probabilities that is independent of sample size and problem setup.
The two linkage groups analyzed here were preselected from 23 linkage groups. For the Bayesian approaches this poses no problem: the results of a Bayesian analysis depend only on the data analyzed and not on other information such as what other data had been, might have been, or will be analyzed. This avoids complexities such as whether to use comparison-wise, genome-wise, or experiment-wise thresholds for QTL detection, and the difficulties of interpreting and using any of these quantities for decisions.
Using a high threshold reduces the number of false positives but also reduces the probability of detecting real QTL. To be "wrong" only 5% of the time when there is no effect may be comforting for an experimenter, but this is no comfort to decision makers who may be presented with only the 5% of "significant" results, which could well be all wrong. Decision makers need to know the posterior probability of the presence of a QTL in a region vs. the cost of using more markers or carrying out further, or larger, QTL mapping experiments. Decision makers also need unbiased estimates of the effects of allelic substitution at a marker putatively associated with a QTL. Our best estimate, given the data and prior knowledge, is one-half the model-averaged effect of allelic substitution. This is unbiased in the sense that the model-averaged effect is the posterior mean for the effect. To obtain unbiased estimates with the non-Bayesian QTL mapping methods requires separate data for selection (QTL "detection") and estimation (or "verification"). For these reasons we recommend readers adopt the Bayesian approach.
There are two major differences between our approach and non-Bayesian methods: considering the prior probability for a QTL to be present and the use of multiple models. These are important for avoiding exaggeration of the evidence for a QTL, for estimating the gain from marker-aided selection, and for avoiding the problems of selection bias (Miller 1990). Selection bias is shown to be a problem that cannot be ignored for the data of this article. The method of this article explains, and asymptotically (to within the accuracy of the estimates of probabilities based on BIC) overcomes, the problems of selection bias and of QTL being frequently detected but not verified (see, e.g., Beavis 1994; Wilcox et al. 1997; Melchinger et al. 1998).
We compared the proposed Bayesian approach with standard non-Bayesian QTL mapping methods, as commonly used. In the context of non-Bayesian QTL mapping, other techniques have been suggested such as cross-validation (Utz et al. 2000) and bootstrapping (Beavis 1994; Visscher et al. 1996), which could potentially be used to ameliorate problems of selection bias.
Cross-validation is a technique in which the analysis is repeated with various disjoint subsets of the data left out and the results combined. It is generally used, in the context of a single model, to obtain an estimate of prediction error. Utz et al. (2000) point out that estimates of QTL effects are often inflated (we suggest mainly because of selection bias). They use cross-validation to obtain unbiased estimates of the magnitude of QTL effects. However, they had already eliminated selection bias prior to cross-validation, by using "test data sets" for the cross-validation separate from the "estimation data" used to select the markers. A common problem with cross-validation is the inaccuracy of cross-validation estimates of error. To use cross-validation with a method such as interval mapping, which selects models, the cross-validation subsets would have to be chosen large enough that a range of different models is selected; therefore the overall sample size would have to be large to give a reasonable number of cross-validations. Bootstrap bagging (Breiman 1996) or boosting (Freund 1995; Freund and Schapire 1996) may be better; see, e.g., Dudoit et al. (2000) for an application to microarray data analysis. These methods should be more robust than single-model methods and may give acceptable results for some purposes, if one is interested solely in a "black box" type of model for prediction only and not in inferences about individual loci. However, they will be effective in reducing selection bias only to the extent that they effectively average over a set of possible models approximately in proportion to their probabilities. Note that bootstrapping (Efron 1982) was invented for use as a general method; however, considerable sophistication is needed to rigorously justify applications of the bootstrap beyond the simplest setups. For a discussion see Young (1994).
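As an illustration of the bagging idea (hypothetical code, not from this article, assuming a backcross-style design coded as 0/1 genotypes): resample the data with replacement, apply a crude single-marker selection rule to each bootstrap sample, and report how often each marker is selected. Markers selected only sporadically across bootstrap samples would be treated with suspicion:

```python
import random

def best_marker(X, y):
    # crude single-marker rule: pick the marker with the largest absolute
    # difference between the phenotype means of its two genotype classes
    def score(j):
        g0 = [yi for xi, yi in zip(X, y) if xi[j] == 0]
        g1 = [yi for xi, yi in zip(X, y) if xi[j] == 1]
        if not g0 or not g1:
            return 0.0
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))
    return max(range(len(X[0])), key=score)

def bagged_selection_freq(X, y, n_boot=200, seed=1):
    # bootstrap bagging: refit the selection rule on resampled data sets
    # and record the relative frequency with which each marker wins
    rng = random.Random(seed)
    counts = [0] * len(X[0])
    n = len(y)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = best_marker([X[i] for i in idx], [y[i] for i in idx])
        counts[chosen] += 1
    return [c / n_boot for c in counts]
```

A marker's selection frequency is only a rough stability diagnostic; as noted above, it approximates model probabilities only loosely.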
In the Bayesian context an alternative approach to analyzing and averaging over multiple models according to their probabilities is to fit one large model with all possible predictors, with the appropriate prior correlation structure. Determining the "appropriate" correlation structure is the difficulty with this approach. A generic method that effectively does this is ridge regression, where predictors are shrunk toward zero, predictors with less support from the data being shrunk more. Ridge regression has a Bayesian interpretation (Frank and Friedman 1993) corresponding to a uniform prior distribution on directions in the vector space spanned by predictors. The ridge parameter controlling the overall shrinkage can be chosen by cross-validation (see, e.g., Ball et al. 1998, for an example relating chemical analysis to sensory perception).
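A minimal sketch of the ridge idea (illustrative only; the simulated data and λ values are made up): the closed-form estimate (X′X + λI)⁻¹X′y shrinks all coefficients toward zero, with more shrinkage for larger λ:

```python
import numpy as np

def ridge(X, y, lam):
    # ridge estimate: solve (X'X + lam*I) b = X'y; shrinks coefficients
    # toward zero, more strongly in directions weakly supported by the data
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
beta = np.array([1.0, 0.5, 0.0, 0.0, 0.0])  # two real effects, three nulls
y = X @ beta + 0.3 * rng.standard_normal(50)

for lam in (0.0, 1.0, 10.0, 100.0):
    b = ridge(X, y, lam)
    print(lam, round(float(np.abs(b).sum()), 3))  # total |b| falls as lam grows
```

In practice λ would be chosen by cross-validation rather than fixed, as in the reference above.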
We prefer the approach of this article over the above alternatives on philosophical grounds: our prior relates more naturally to prior expectations about the number of QTL than does the prior implicit in the ridge regression alternative, and our method follows logically from this natural prior using probability theory; i.e., it does not involve the ad hoc-ery of the frequentist alternatives. Assuming our prior distributions are a reasonable representation of our prior knowledge, Bayesian theory guarantees that the full Bayesian approach is optimal.
The probabilities calculated in this article depend on the value of the parameter δ and the value of the prior probability π = 0.1. Higher values of δ correspond to lower probabilities for the presence of a QTL and higher selection bias. The adjustment factor δ was proposed by Broman (1997) to correct for a finite sample size, which may not be large enough to rely on the asymptotic approximation in (2). We expect the appropriate value to depend on sample size, with δ = 1 being the appropriate choice for large sample sizes and δ = 2 being fairly conservative. Broman recommends δ = 2 or δ = 3. However, since we, unlike Broman, adjust for the prior, we may not need values of δ as high as Broman recommends.
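The core computation can be sketched as follows (hypothetical code, not the bicreg.qtl implementation; the BIC-δ form n log(RSS/n) + δk log n for a model with k markers is the standard variant and is assumed here): enumerate marker subsets, compute BIC-δ for each, and weight each model by exp(−BIC-δ/2) times the prior π^k(1 − π)^(m−k):

```python
import itertools, math
import numpy as np

def bic_delta_posteriors(X, y, delta=1.0, pi=0.1):
    # enumerate all subsets of m markers; weight each model by
    # exp(-BIC_delta/2) times the prior pi^k * (1-pi)^(m-k), then normalize
    n, m = X.shape
    subsets, weights = [], []
    for k in range(m + 1):
        for S in itertools.combinations(range(m), k):
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in S])
            coef = np.linalg.lstsq(Xs, y, rcond=None)[0]
            resid = y - Xs @ coef
            rss = float(resid @ resid)
            bic = n * math.log(rss / n) + delta * k * math.log(n)
            prior = pi**k * (1 - pi) ** (m - k)
            subsets.append(S)
            weights.append(math.exp(-bic / 2) * prior)
    w = np.array(weights)
    return subsets, w / w.sum()
```

The posterior probability of a QTL linked to some marker is then 1 minus the weight of the empty subset, and per-marker inclusion probabilities are sums of weights over subsets containing that marker.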
The appropriate value(s) of δ could be determined by comparison with simulation consistent estimates for the hierarchical model obtained from a Markov chain Monte Carlo (MCMC) method. Another possible approach is to compare results with analytical calculations of Bayes factors (cf. Smith and Kohn 1996) for one or more single-marker models compared with the null model, with a suitable prior on the size of marker effects.
The reader can and should try other values of π. The marginal probabilities for model sizes in Table 2 can be adjusted for different values of π using (6). More generally a continuous prior distribution for π, rather than a single value as we have used here, can be approximated by combining results from several different values of π.
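Equation (6) is not reproduced here, but under the independent per-marker prior with probability π, the adjustment amounts to reweighting each model-size probability by the prior-odds ratio raised to the power of the model size and renormalizing. A sketch, assuming p[k] holds posterior model-size probabilities computed under π0:

```python
def adjust_for_prior(p, pi0, pi1):
    # p[k] = posterior probability of model size k computed under prior pi0;
    # reweight size k by the prior-odds ratio to the power k, then renormalize
    # (follows from the independent-marker binomial prior on model size)
    ratio = (pi1 / (1 - pi1)) / (pi0 / (1 - pi0))
    w = [pk * ratio**k for k, pk in enumerate(p)]
    s = sum(w)
    return [x / s for x in w]
```

Evaluating this over a grid of π values, weighted by a continuous prior for π, gives the approximation mentioned above.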
There are several options available to cut down on computing time:
1. Reduce the number of markers analyzed. For example, if seven markers spaced at ~20 cM are analyzed, the time is reduced to 1 min.
2. Cut down on the number of models to be found for each model size (option nbest of bicreg.qtl) and/or limit the largest size of models considered.
3. Reduce the Occam's razor threshold (option OR of bicreg). Models less likely than the most likely model by the factor OR or more are eliminated from consideration.
4. Program the method in a lower-level language such as C.
The method of this article can be applied one chromosome or linkage group at a time, with a substantial reduction in computation. Since linkage groups are independent, this makes little difference to inference and reduces the number of models that need to be considered to a reasonable number. If enough QTL have been found (by an initial iteration of the method) to significantly reduce the error variance, then it will be advantageous to add covariates to control for the QTL from other linkage groups to the models for the linkage group being considered. The covariates would always be selected: i.e., models being compared would consist of covariates plus selected subsets of markers in the linkage group being considered.
The marker density required depends on the sample size and the accuracy with which it is desired to estimate QTL location. The latter is, however, limited by sample size. A dense marker map is not required. We would not recommend having much more than about five markers within a region in which a QTL could be located with probability 95%.
A future direction for development of QTL mapping methods based on model selection is to allow finer-structure mapping, using grids of points between markers. Marker values at these grid points would be regarded as missing data, which, if known, would allow the application of the method. Missing marker values could be estimated using multiple imputation or (preferably) as additional parameters in an MCMC algorithm. Our current marker density is adequate, given the sample size and consequent uncertainty in QTL location.
For outbred crosses, considering one parent at a time, markers are either segregating in pseudobackcross configuration (as analyzed in this article) for the parent or not segregating, in which case they contribute no information. The method as described here will find additive effects of loci in each parent separately. The analysis may be extended to the general case by combining loci from the two parents and allowing for interactions (corresponding to dominance and epistasis) between loci.
The author thanks Phillip Wilcox and Rowland Burdon for valuable feedback on various versions of the manuscript from the perspective of molecular and quantitative geneticists; Mark Kimberley and the referees for useful comments on the manuscript; John Lee for the phenotypic data; and Geoff Corbett and Paul Fisher for the genotypic data. Comments from Phillip Wilcox, Gary Churchill, and a referee have resulted in a substantially expanded discussion and justification of the results on selection bias. This work was funded by the New Zealand Foundation for Research Science and Technology.
APPENDIX: SELECTION BIAS AND MODEL AVERAGING
We explain why selection bias is not a problem for model-averaged estimates of allelic substitution, derive an approximate relationship between selection bias and uncertainty in QTL location, and then give the results of a small simulation study comparing selection bias from the t-test method, at several thresholds, with results from model averaging, the latter given both unconditionally and conditional on selection at each threshold.
Discussion of selection bias and model averaging: The model-averaged estimate of the effect of allelic substitution d_i for marker i is the same as the expectation of d_i under the posterior distribution of the overarching hierarchical model, as can be seen by a straightforward calculation:

d_i,av = Σ_k p_k ∫ d_i(θ_k) f_k(θ_k) dθ_k (A1)
= E_f[d_i], (A2)

where M_k are the various models, p_k their posterior probabilities, θ_k and f_k(·) denote the parameters and posterior density function of model M_k, d_i(θ_k) is the value of d_i as a function of θ_k, and f denotes the combined posterior distribution, whose density is the mixture Σ_k p_k f_k(·) appearing in (A1).
Hence, the model-averaged estimate d_i,av is the mean of the marginal posterior distribution for d_i. The marginal posterior distribution represents our knowledge of d_i, and according to standard Bayesian theory, optimal decisions are obtained by minimizing the expected loss (or maximizing the expected gain) over this distribution. Thus the model-averaged estimate is our best estimate, to which we compare estimates conditional on selection.
Note: This argument applies to any quantity that can be calculated as a function of the parameters in any model and has a well-defined interpretation.
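As a toy illustration of the calculation in (A1) (hypothetical numbers): if models containing marker i have total posterior probability 0.7, with conditional estimate 1.0 for d_i, and the remaining models set d_i to zero, the model-averaged estimate is shrunk toward zero by the inclusion probability:

```python
def model_averaged_effect(models):
    # models: list of (posterior_probability, estimate_of_d_i) pairs,
    # with estimate 0 for models that do not include marker i
    return sum(p * d for p, d in models)

# marker i included with probability 0.7, estimate 1.0 when included
d_av = model_averaged_effect([(0.3, 0.0), (0.7, 1.0)])
print(d_av)  # 0.7: the conditional estimate shrunk by P(inclusion)
```

This is the shrinkage toward zero referred to in the summary below: the less certain the marker's inclusion, the greater the shrinkage.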
Relationship between selection bias and uncertainty in QTL location: To better understand selection bias in the context of hypothesis test approaches to QTL mapping, we consider what happens to the estimates of allelic substitution at marker i under various single-marker models and the null model and derive an approximate relationship between selection bias in the effect of allelic substitution at a marker and uncertainty in the location of a QTL.
Let M_i denote the single-marker model with marker i selected, M_0 the null model with no marker selected, r_ij the recombination distance between markers i and j, and δr the average intermarker spacing.

If the true model is known to be M_i, then the estimate of the effect of allelic substitution d_i under M_i is the standard least-squares estimate, which is unbiased.

Suppose the true model is M_j, but M_i has been selected [in preference to M_j and other markers] by a single-marker hypothesis-testing procedure. Then the estimate d̂_i of d_i under M_i is greater than the estimate d̂_j of d_j under M_j. The latter estimate is unbiased, so d̂_i is likely to be comparable to or greater than d_j.

Since M_j is the true model, d_j is the QTL effect [up to a factor of (1 − δr/2)].

The true effect of allelic substitution at marker i is then d_i = (1 − 2r_ij)d_j, so the selection bias in d̂_i is ~1/(1 − 2r_ij). Furthermore, assuming model M_i, the implied estimate of the effect at marker j is d̂_j = (1 − 2r_ij)d̂_i.

Thus assuming M_i when M_j is the true model induces a relative bias of the order of (1 − 2r_ij)² in the ratio of the effects at markers j and i: the assumed ratio is d_j/d_i = 1 − 2r_ij, whereas the true ratio is 1/(1 − 2r_ij).
If the model selected is the null model with no QTL, the estimated effect is zero.
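A quick numeric check of the attenuation relation (illustrative values, not from the article): with r_ij = 0.1, the true effect at marker i is 0.8 of the QTL effect, so an estimate that instead tracks d_j overstates d_i by the factor 1/(1 − 2r_ij) = 1.25:

```python
def marker_effect(d_qtl, r):
    # expected effect of allelic substitution at a marker located a
    # recombination fraction r from a QTL of effect d_qtl (backcross)
    return (1 - 2 * r) * d_qtl

d_j, r_ij = 1.0, 0.1              # hypothetical QTL effect and distance
d_i = marker_effect(d_j, r_ij)    # true marker effect, ~0.8
bias = d_j / d_i                  # selection bias factor 1/(1 - 2r), ~1.25
print(round(d_i, 3), round(bias, 3))
```

The closer the selected marker is to the QTL (smaller r_ij), the smaller this bias factor, which is the relationship between selection bias and uncertainty in QTL location stated above.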
Selection bias and model averaging—a simulation study: For each of the settings considered, simulated data sets were generated, each with 100 backcross progeny and with 0 (probability 0.53) or 1 QTL, the QTL when present explaining a given proportion of total variance and being randomly placed on a chromosome of length 120 cM with six markers evenly spaced at positions 10, 30, 50, 70, 90, and 110 cM. The QTL effect, if present, always had a positive sign, corresponding to an effect of a = 0.45 (a = 0.89 in units of 1 phenotypic standard deviation). The analysis used model averaging with BIC-δ, with δ = 1 and prior probability π = 0.1 for a marker to be selected.
Table A1 gives bias in units of 1 phenotypic standard deviation for the single-marker t-test and model averaging using BIC.
The fourth row of Table A1 is the average “bias” using model averaging with BIC-1 (δ = 1). A negative bias (shrinkage toward zero) is expected with the Bayesian method if there is any model uncertainty.
The fifth row of Table A1 is the average bias using model averaging conditional on markers being selected by the t-test method (not something we would necessarily advocate). Interestingly, while the bias for the t-test method as used increases with selection intensity, as expected, the bias for BIC actually decreases (increases in magnitude) with selection intensity; i.e., model averaging is more than compensating for selection of significant markers only. This may be an artifact of the fact that as the threshold increases, the simulated QTL, if selected, is more likely to be at or near the marker under consideration, so the effect is larger; hence the potential shrinkage toward zero is larger.
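The conditional bias of the single-marker t-test style procedure can be reproduced with a toy Monte Carlo sketch (not the simulation of Table A1; independent 0/1 genotypes and arbitrary parameter values are assumed): select the marker with the largest absolute estimated effect and compare its estimate, conditional on the true QTL marker winning, with the true effect:

```python
import random

def simulate_selection_bias(n=100, m=6, d=0.3, reps=500, seed=7):
    # independent 0/1 marker genotypes with a QTL effect d on marker 0;
    # returns the mean estimated effect at marker 0, conditional on
    # marker 0 having the largest absolute estimated effect
    rng = random.Random(seed)
    selected = []
    for _ in range(reps):
        X = [[rng.randrange(2) for _ in range(m)] for _ in range(n)]
        y = [d * row[0] + rng.gauss(0, 1) for row in X]

        def eff(j):
            # estimated effect: difference of phenotype class means
            g1 = [yi for row, yi in zip(X, y) if row[j] == 1]
            g0 = [yi for row, yi in zip(X, y) if row[j] == 0]
            return sum(g1) / len(g1) - sum(g0) / len(g0)

        effects = [eff(j) for j in range(m)]
        best = max(range(m), key=lambda j: abs(effects[j]))
        if best == 0:
            selected.append(effects[0])
    return sum(selected) / len(selected)
```

Conditioning on winning the comparison inflates the estimate above the true value d, which is the selection bias that model averaging compensates for.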
Selection bias and model averaging—summary:
1. Selection bias occurs when the same data are used to select a regression model and to estimate the coefficients (here marker effects) in the model.
2. Selection bias occurs because expected values of estimates of the effect of allelic substitution at a marker are always greater under the model with that marker selected than under any other single-marker model or the null model.
3. Estimates of the effect of allelic substitution at a marker are unbiased if the true model is known, or if the model is selected with independent data.
4. Selection bias is not a problem if Bayesian model averaging is used. The model-averaged Bayesian estimates are not unbiased under any assumed QTL configuration but are the average of unbiased estimates under various models, averaged according to the posterior probability that each model is the true model. From observation 2 it follows that model-averaged estimates of QTL effects are shrunk toward zero. The amount of shrinkage reduces with the precision with which the QTL location can be determined.
5. The Bayesian method can be applied even if a previous hypothesis test or tests have selected a chromosome or region (as was the case for linkage group 3 in this study). This is because the posterior probability distribution for that chromosome or region depends on the data only through the likelihood function and the prior probability for that chromosome or region.
6. Model averaging can overcome selection bias even if markers are selected using non-Bayesian tests.
Communicating editor: G. A. Churchill
- Received April 13, 2000.
- Accepted August 30, 2001.
- Copyright © 2001 by the Genetics Society of America