Genetics, Vol. 160, 1673-1686, April 2002, Copyright © 2002

Improved Confidence Intervals in Quantitative Trait Loci Mapping by Permutation Bootstrapping

Jörn Bennewitza, Norbert Reinscha, and Ernst Kalma
a Institut für Tierzucht und Tierhaltung, Christian-Albrechts-Universität, D-24098 Kiel, Germany

Corresponding author: Ernst Kalm, Hermann-Rodewald-Str. 6, D-24118 Kiel, Germany., ekalm{at}tierzucht.uni-kiel.de (E-mail)

Communicating editor: C. HALEY


*  ABSTRACT
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

The nonparametric bootstrap approach is known to be suitable for calculating central confidence intervals for the locations of quantitative trait loci (QTL). However, the distribution of the bootstrap QTL position estimates along the chromosome is peaked at the positions of the markers and is not tailed equally. This results in conservativeness and large width of the confidence intervals. In this study three modified methods are proposed to calculate nonparametric bootstrap confidence intervals for QTL locations, which compute noncentral confidence intervals (uncorrected method I), correct for the impact of the markers (weighted method I), or both (weighted method II). Noncentral confidence intervals were computed with an analog of the highest posterior density method. The correction for the markers is based on the distribution of QTL estimates along the chromosome when the QTL is not linked with any marker, and it can be obtained with a permutation approach. In a simulation study the three methods were compared with the original bootstrap method. The results showed that it is useful, first, to compute noncentral confidence intervals and, second, to correct the bootstrap distribution of the QTL estimates for the impact of the markers. The weighted method II, combining these two properties, produced the shortest and less biased confidence intervals in a large number of simulated configurations.


SEVERAL statistical methods were proposed for detecting quantitative trait loci (QTL) using marker information. Most of them are based either on maximum-likelihood procedures (LANDER and BOTSTEIN 1989 Down) or on regression methods (HALEY and KNOTT 1992 Down; MARTINEZ and CURNOW 1992 Down). However, none of these methods led to a straightforward calculation of a confidence interval. These are intervals on the chromosomes that contain the QTL with a given coverage probability P (usually 90 or 95%). Calculating small confidence intervals that hold the stated coverage probability is of fundamental importance for practical breeding purposes as well as for molecular biologists. For example, breeders who wish to use the marker information in marker-assisted selection (MAS) breeding schemes need to know the optimum length of the chromosome segment of interest, and biologists need this knowledge for any further decisions regarding experimental fine-mapping strategy.

Different methods were described to calculate confidence intervals in QTL mapping. CONNEALLY et al. 1985 Down and LANDER and BOTSTEIN 1989 Down proposed the LOD drop-off method. This is based on the conversion of the distribution of the likelihood-ratio test statistic of linkage vs. no linkage between a marker and a putative QTL into a chi-square distribution. Using this method, the endpoints of the confidence interval are calculated by finding the locations at either side of the estimated QTL position having the maximum LOD score that have a LOD score of 1 unit less than the estimated QTL position. The confidence interval is then defined by all those points on the chromosome where the LOD score has fallen within 1 unit from the maximum (i.e., the positions between the two interval endpoints). However, this method often produces intervals that are too short; i.e., the intervals contain the QTL with a lower coverage probability than stated. VAN OOIJEN 1992 Down and MANGIN et al. 1994 Down showed this to be the case for small and medium-sized populations and for QTL of small effects because the regular conditions for the conversion of the likelihood-ratio test toward a chi-square distribution do not hold in these genetic configurations. To attain a higher probability of containing the QTL, VAN OOIJEN 1992 Down suggested that the LOD drop-off confidence intervals should be based on two LOD differences. In a simulation study it was found that the sizes of these confidence intervals were large and variable (VAN OOIJEN 1992 Down) and, in population sizes used in real QTL mapping procedures, unsatisfactory for QTL with smaller effects.

MANGIN et al. 1994 Down and MANGIN and GOFFINET 1997 Down developed complex formulas for calculating confidence intervals. These are based on a maximum-likelihood-ratio test using statistics in which asymptotic distribution does not depend on the effect of the QTL. The major difficulty of this method is the computation of the correct threshold for the likelihood-ratio test for which MANGIN and GOFFINET 1997 Down proposed an approximate analytical solution.

VISSCHER et al. 1996 Down introduced a nonparametric bootstrap resampling approach (EFRON 1979 Down; EFRON and TIBSHIRANI 1993 Down) to construct a confidence interval for a QTL position. The evaluation of generated bootstrap samples provided a distribution of QTL position estimates along the chromosome that forms the base for the confidence interval calculation. Within a simulation study they compared the nonparametric bootstrap method with the LOD drop-off method. The nonparametric bootstrap procedure seemed a good alternative to the LOD drop-off for defining confidence intervals for QTL positions, but tended to be unsuitably large when the QTL effect was small.

To reduce the width of the estimated bootstrap confidence interval a selective nonparametric bootstrap method was proposed by LEBRETON and VISSCHER 1998 Down. Their selection was based on criteria related to the assumed genetic model. By comparing both methods in a simulation study they found that the selective bootstrap method produced smaller confidence intervals with a reduced conservativeness. Moreover, they showed that there is scope for improvement of the classical bootstrap approach for calculating confidence intervals as proposed by VISSCHER et al. 1996 Down.

WALLING et al. 1998 Down provided a parametric bootstrap resimulation method (EFRON 1982 Down) as an alternative to the nonparametric bootstrap approach and tested this method in a simulation study. As the authors themselves pointed out, the parametric bootstrap method produced poor results: The derived confidence intervals varied from conservative intervals when the QTL position was at the location of the marker to anticonservative intervals when the QTL position was between two markers.

A main result from the study of WALLING et al. 1998 Down was the discovery of a bias in calculating confidence intervals from the distribution of QTL position estimates from nonparametric bootstrap samples. This bias is defined as a significant difference between the observed and the stated coverage probability and is deemed to have been caused by the locations of the markers. Eliminating this bias would probably result in smaller confidence intervals. As mentioned by LEBRETON and VISSCHER 1998 Down, an additional reduction of the interval width could be possible by computing noncentral confidence intervals. Taking this into consideration, we present three modified nonparametric bootstrap schemes to calculate confidence intervals: (i) noncentral, (ii) marker corrected, and (iii) noncentral and marker corrected. Within a simulation study these schemes were compared with the nonparametric bootstrap method presented by VISSCHER et al. 1996 Down.


*  MATERIALS AND METHODS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

Simulation:
A first generation backcross population consisting of one half-sib family and derived from two inbred lines was generated and contained 300 or 500 (N = 300, 500) progeny. For each individual a single 100-cM chromosome evenly spaced with markers every 20 cM was simulated. For calculating the crossover probabilities the Kosambi mapping function was used. Each chromosome contained a single segregating diallelic QTL with a position midway between two flanking markers , closer to one of the flanking markers , or directly at a marker position . The effect of the QTL was expressed in units of the polygenic standard deviation, and to cover a wide range of effects it was varied between 0.1, 0.5, and 1.0 . Additional simulations were carried out to investigate the impact of a longer chromosome ( with and d in the middle of the chromosome) and a higher marker density [{Delta} = 10 cM and 10/5 cM (additional markers at 35, 45, 55, and 65 cM), with , respectively]. To avoid any confounding effects due to different detection power (DARVASI et al. 1993 Down) the QTL position was located exactly midway between two flanking markers when marker spacing was reduced .

The total genetic variance was the sum of the polygenic variance and the variance due to the QTL. The polygenic genetic component was normally distributed. The heritability of the trait, defined as the polygenic variance divided by the sum of polygenic variance and the nongenetic variance, was assumed to be 0.25. Random nongenetic factors were normally distributed. For each simulated genetic configuration 1000 replicates were generated. The simulation program was written in FORTRAN 77, supplemented with routines from the NAG library (NUMERICAL ALGORITHMS GROUP 1990). QTL analyses were done by using the programs BIGMAP and ADRQLT (briefly described in REINSCH 1999 Down) and were based on the multimarker regression described by KNOTT et al. 1994 Down. Confidence intervals were calculated with FORTRAN 77 programs and SAS procedures (SAS INSTITUTE 1992).

Generating the bootstrap samples:
A total of 250 nonparametric bootstrap samples for each genetic configuration were generated as described by VISSCHER et al. 1996 Down. Briefly, for each single bootstrap sample N observations out of the pool of N original observations were drawn with replacement, so that each observation consisted of an individual's marker genotype and the corresponding phenotype. Some observations may appear more than one time in a bootstrap sample while others may be not included at all. Each single bootstrap sample was analyzed and the best estimate of the position of the QTL was recorded. After 250 bootstrap samples the distribution of the 250 estimates of the QTL position was derived by ordering them along the chromosome. Because the QTL is linked to two flanking markers, this distribution is termed the linkage distribution.

Modeling marker impact on bootstrap distributions:
As reported by WALLING et al. 1998 Down the reasons for the marker impact on the distribution of the bootstrap estimates for the QTL position are an accumulation of the type I error rate at the marker loci and the general trend of the mapping procedure to put the estimate of a QTL position toward a marker locus. In the following section a closer look at this marker impact is presented by a simple model, and the theoretical background for a marker correction is given. At first we focus on the expected distribution of the best estimates for a QTL position out of a bootstrap sample when a QTL may be present but not linked with any marker. This distribution is termed the null distribution, because every estimate indicates a type I error. In this null distribution the probability of each position achieving the best estimate would be

(1)

where P(i)nd is the probability of obtaining the best estimate for position i on the chromosome in the null distribution (nd), Pe is the probability for each position achieving the best estimate assuming equal probabilities for each position, and S(i) is a "selection coefficient" of position i in the null distribution. Because Pe is a constant with the same value for every position (i.e., one divided through the number of possible QTL positions), it can be stated

(2)

The selection coefficient describes the probability of each position to obtain the best estimate as a deviation from the equal distribution and is determined by the position and the informativeness of the marker loci. It would be highest at the marker loci and lower at the loci between the markers. In this model, it is assumed that the selection coefficient S(i) is the same for both the linkage distribution and the null distribution. The probability of each position in the linkage distribution [P(i)ld] to obtain the best estimate would be

(3)

where P(i)QTL is the probability of each position i achieving the best estimate when a QTL is linked with a marker and is, of course, influenced by the effect of the QTL but not by the markers per se. Following Equation 3, an approximate correction for the impact of the markers can be achieved by weighting the probability of each position from the linkage distribution with the reciprocal value of the corresponding selection coefficient:

(4)

The distribution resulting from the corrected probabilities [P(i)QTL] to attain the best estimate for the QTL location is termed the corrected distribution.

Weighted method I:
In the weighted method the marker correction approach was followed. First, the probability in the linkage distribution to attain the best estimate P(i)ld and the selection coefficient S(i) had to be determined for each position on the chromosome. The values for P(i)ld were the corresponding observed frequencies of best QTL estimates in the linkage distribution [f(i)ld]:

(5)

The linkage distribution was obtained as described above. The selection coefficients S(i) were calculated by computing the null distribution for each simulated genetic configuration. This was achieved by setting the recombination rate between the simulated QTL and every marker locus to 0.5. The null distribution was then built by the best estimates out of all bootstrap samples and all replicates (250 estimates x 1000 replicates) for each simulated genetic configuration. Following Equation 2 the selection coefficient for each position on the chromosome was the corresponding frequency of the best estimates in the null distribution [f(i)nd]:

(6)

For each replicate the distribution of the probabilities P(i)QTL (the corrected distribution) was calculated using (4).

The confidence intervals with a coverage probability P of 95 and 90% were calculated by using the top and the bottom 2.5th and 5th percentiles as the upper and lower interval endpoints, respectively. The obtained confidence intervals were central.

Weighted method II:
For the weighted method II the corrected distribution was computed in the same manner as for the weighted method I. Whereas for the weighted method I one-half of the given error rate of 1 - P was subtracted from each side of the corrected distribution during the computation of the confidence intervals, in the weighted method II an analog highest posterior density (HPD) method was used to obtain noncentral intervals when the corrected distribution is not tailed equally. This method was adapted from the standard HPD approach as described in BOX and TIAO 1992 Down and is presented in detail in the Appendix

Uncorrected method I:
This method was the original one as proposed by VISSCHER et al. 1996 Down and did not correct for the marker impact. The central confidence intervals were calculated directly from the linkage distribution by taking the top and bottom 2.5th and 5th percentiles of the linkage distribution as the upper and lower interval endpoints, respectively.

Uncorrected method II:
In this method confidence intervals were computed from the linkage distribution with the analog HPD method described in the Appendix The calculated intervals were noncentral and not marker corrected.

The confidence intervals with the stated coverage probability P of 95 and 90% were calculated for each replicate by the four methods described above, and it was assessed whether the QTL was located within these intervals or not. The rate of the confidence intervals for each simulated genetic configuration that did not include the simulated position of the QTL was termed the noninclusion rate (NI), and this should be not greater than 1 - P. For each configuration the average width of the confidence interval out of 1000 replicates was also calculated, and this is ideally small.


*  RESULTS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

The initial results for all four bootstrap schemes are presented in Table 1 Table 2 Table 3 Table 4. Summarized over all genetic configurations analyzed, the uncorrected method I produced the largest and most conservative confidence intervals. A reduction of the interval width and conservativeness was possible using the uncorrected method II. Further reduction of the width was achieved using the weighted method I and, even more, by the weighted method II without lifting the noninclusion rate seriously above 1 - P.


 
View this table:
In this window
In a new window

 
Table 1. Effect of QTL position within marker interval, population size, and QTL effect on the confidence intervals for the different bootstrap methods—QTL position proximal


 
View this table:
In this window
In a new window

 
Table 2. Effect of QTL position within marker interval, population size, and QTL effect on the confidence intervals for the different bootstrap methods—QTL position central


 
View this table:
In this window
In a new window

 
Table 3. Effect of an increased marker density on the confidence intervals for the different bootstrap methods


 
View this table:
In this window
In a new window

 
Table 4. Effect of the length of the chromosome on the confidence intervals for the different boostrap methods

QTL effect:
As expected, an increase of the QTL effect resulted in a smaller confidence interval for all bootstrap schemes (Table 1 and Table 2). It is very unlikely that a QTL with an effect of a = 0.1 (this QTL would explain only 0.13% of the total variance in our simulated populations) would be detected in real mapping experiments and, hence, the uncorrected method I produced very conservative confidence intervals, which cover almost the whole chromosome. For this low QTL effect each confidence interval produced by this method was >93 cM. However, the uncorrected method II (weighted method I, weighted method II) produced 90% intervals for this small QTL effect that were significantly <90 cM (85 cM, 80 cM) for the 90% confidence intervals and were less conservative. All confidence intervals for situations of a = 0.1 were only marginally influenced by the remaining genetic parameters.

Elevating the QTL effect from a = 0.1 to 0.5 (this QTL would explain 3.2% of the total variance) reduced the average 90% interval width by 15–25 cM. The respective change by stepping 0.5 to 1.0 (11.8% of the total variance) was 40–50 cM, depending on the population size and QTL position. This reduction was remarkably constant for the four methods.

Marker spacing:
Changing marker distances from 20 to 10 cM (top of Table 2 and Table 3) and adding four additional markers around the QTL position (Table 3) resulted in a slightly reduced interval width for all bootstrap schemes. The reduction for the 90% confidence intervals varied between 2 and 4 cM. The corresponding changes were between 1 and 5 cM when marker distance was reduced to 10 cM and when additional markers were added around the QTL position, depending on the population size and the bootstrap scheme. No general dependency of the noninclusion rate on the marker density was observed for any method.

Population size:
Table 1 Table 2 Table 3 Table 4 show that the elevation of the population size from N = 300 to N = 500 led to significantly smaller confidence intervals for all four bootstrap schemes. This reduction was only marginal for a = 0.1 but for a = 0.5 it was between 12 and 14 cM and for a = 1.0 it was between 10 and 15 cM. The uncorrected method I was most sensitive for increased population size as the reduction of the interval width was greatest. This might be due to the fact that intervals calculated with this method were largest and consequently there was the most space for reduction. It was not possible to assign a change of noninclusion rates to the effect of an increased population size for the uncorrected method II and for the weighted methods I and II; it appears that the intervals computed with the uncorrected method I were in general less conservative when the population size was increased.

In general, a gain of information, provided by a greater population size, resulted in smaller intervals. This is consistent with other studies concerning QTL bootstrap confidence intervals (VISSCHER et al. 1996 Down; LEBRETON and VISSCHER 1998 Down; WALLING et al. 1998 Down). The effect of the larger population size was stronger than the effect of the higher marker density.

Length of chromosome:
The length of the chromosome had a large impact on the size of the confidence intervals and a small impact on the noninclusion rates for all bootstrap schemes; i.e., the increase of the chromosome length resulted in larger intervals and, in most cases, in slightly lower noninclusion rates (Table 4). The absolute difference between the length of the intervals produced by the four methods increased slightly with a longer chromosome without changing the ranking order.

QTL position:
The effect of the QTL position can be attributed to two factors: the QTL position within a marker interval (midway between two markers, closer to one marker, or at the same position as the marker) and the chromosomal QTL position, whether proximal or central. To avoid any confounding effects these two factors were analyzed separately.

It seems that when the QTL is located centrally (position 60 instead of 20, 55 instead of 25, and 50 instead of 30; Table 1 and Table 2) the computed intervals are smaller, regardless of the method. This is in close agreement with LEBRETON and VISSCHER 1998 Down and WALLING et al. 1998 Down and may be caused by the fact that the mapping procedure tends to estimate the QTL position closer to the middle of the chromosome, as reported by HYNE et al. 1995 Down. Additionally, the noninclusion rates were in general lower for a = 0.1 and a = 0.5 but higher for a = 1.0 for all methods.

Changing the QTL position within the marker interval from midway between two flanking markers toward a marker position (top, middle, and bottom of Table 1 and Table 2) resulted in significantly smaller confidence intervals for all four methods. For example, the reduction for the 90% interval for a = 0.5 and N = 500 was between 1 and 2 cM when d was changed from 50 to 55 cM and ~4 cM when d was changed from 55 to 60 cM, regardless of the method. The uncorrected methods I and II produced intervals that were more conservative when the QTL and the marker positions are identical compared to situations with QTL positions midway between two flanking markers. This is consistent with the findings of WALLING et al. 1998 Down. The noninclusions were in general increased for the weighted method I when d was moved toward a marker position whereas no general dependency could be observed between the QTL position and the noninclusion rates for the weighted method II.

Another remarkable topic might be the differences between the interval widths from the uncorrected methods I and II on the one hand and the differences between the weighted methods I and II on the other hand when moving the QTL closer to the start of the chromosome. To investigate the impact of the QTL position on these differences in detail, additional simulations with , and a QTL position at d = 10 or 15 cM were carried out. The average 90% interval widths computed with the uncorrected method II (weighted method II) were subtracted from the corresponding intervals computed with the uncorrected method I (weighted method I) for the genetic configurations described above and the same configurations but with d = 20 and 30 cM (Table 1) and d = 50 cM (Table 2). Additionally, the ratio of the subtracted upper tail and the subtracted lower tail of the linkage distribution and the corrected distribution during the computation of the 90% confidence interval with the analog HPD method were calculated. The results show that both the differences between the central and noncentral confidence intervals and the ratio of the subtracted upper and lower tail from the distributions increased when the QTL was located closer to the start of the chromosome (Table 5). The increase of the ratio was greater for the intervals computed from the corrected distribution than for those computed from the linkage distribution. When the QTL was midway on the chromosome (d = 50 cM) the ratio became, as expected, exactly one or near to one. It is important to note that the computed intervals for d = 10 cM were slightly anticonservative (10.4, 12.5, 12.9, and 13.9% noninclusions for the 90% intervals computed with uncorrected methods I and II and the weighted methods I and II, respectively; results not shown).


 
View this table:
In this window
In a new window

 
Table 5. Effect of the QTL position on the differences between the widths of central and noncentral confidence intervals

Selection step:
In practical QTL mapping experiments confidence intervals would be computed only when the QTL is declared significant. To investigate whether the rejection of nonsignificant replicates would change the results a selection step for two genetic configurations similar to VISSCHER et al. 1996 Down was introduced. First, according to CHURCHILL and DOERGE 1994 Down, threshold levels that gave a nominal type I error rate of 10 and 5% were calculated by taking the 90th and 95th quantiles of the test statistic of 10,000 analyzed simulated populations with no segregating QTL. Confidence intervals were computed only for replicates that showed QTL test statistics above the threshold. The results show that all computed confidence intervals were smaller and less conservative when introducing a selection step (top of Table 2 and Table 6) and when reducing the nominal type I error rate (Table 6). This finding is in agreement with VISSCHER et al. 1996 Down. Generally, the change of the nominal error rate from 10 to 5% reduced the interval width between 1 and 3 cM and increased the noninclusion rate by ~1–2%. No differences in sensitivity for the selection step between the four bootstrap methods could be observed.


 
View this table:
In this window
In a new window

 
Table 6. Effect of rejection of nonsignificant replicates on the confidence intervals for the different bootstrap methods

Number of bootstrap samples:
In computing confidence intervals from the linkage distribution VISSCHER et al. 1996 Down showed that there is only a small impact on the number of bootstrap samples. To examine whether this holds also for the computation of intervals from the corrected distribution, the number of bootstrap samples was varied for two genetic configurations (, respectively). It seems that there is a lower bound for the number of bootstrap samples to generate the corrected distribution for interval computation that is between 100 and 200 and dropping it below 100 results in smaller intervals with a higher noninclusion rate. Increasing the number of bootstrap samples above 200 did not change the results (Table 7). Following this, the number of bootstrap samples (250) used in this study is appropriate.


 
View this table:
In this window
In a new window

 
Table 7. Effect of number of bootstrap samples on the confidence intervals for the weighted methods I and II


*  DISCUSSION
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

The classical bootstrap method (in this study the uncorrected method I) for calculating confidence intervals for a QTL position as proposed by VISSCHER et al. 1996 Down seems a suitable and practical approach, and in some recent real QTL mapping projects this method was used to obtain confidence intervals for the QTL position (e.g., DE KONING et al. 1998 Down; ZHANG et al. 1998 Down; WALLING et al. 2000 Down). However, this method produces intervals that tend to be rather large and conservative, and therefore we proposed three methods (uncorrected method II and weighted methods I and II) to improve this classical approach to calculate smaller and less conservative confidence intervals and compared these methods within a simulation study with the classical approach.

Bias of confidence intervals:
The bias of confidence intervals for QTL position estimates (biasci) can be defined as

where NI is the noninclusion rate. The biasci for the intervals computed with the uncorrected method I was in general greatest and was upward except when the QTL was at the start of the chromosome and significantly increased for this method when the QTL position was located at a marker position. This is in agreement with the findings of VISSCHER et al. 1996 Down and WALLING et al. 1998 Down. When comparing the uncorrected method I with the weighted method I and the uncorrected method II with the weighted method II, respectively, it follows that the marker correction of the linkage distribution leads in general to a reduced bias of the confidence intervals as the noninclusion rate becomes closer to 1 - P. Note that this bias was for the weighted method I, and even more, for the weighted method II, slightly downward, but not substantial, when the QTL effect was small and the QTL was located at the start of the chromosome but upward for the remaining configurations. Together with the fact that the weighted methods I and II produced significantly smaller confidence intervals than the uncorrected methods I and II it follows that the described model of characterizing the marker impact (Equation 1Equation 2Equation 3Equation 4) and the marker correction in the weighted methods I and II is appropriate to obtain smaller and less biased bootstrap confidence intervals for the location of QTL that appear to be significant. Note that the probability of making a type I error is a function of the chosen significance level, which is not affected by any of the presented methods to compute confidence intervals.

Marker correction:
Although it may be possible to calculate the influence of the markers on the linkage distribution, in terms of the selection coefficient for each possible QTL position (each centimorgan in our study), in an analytical way, in this study it was done empirically by computing the null distribution for each genetic configuration, i.e., the distribution of the best QTL position estimates when there is no cosegregation between the QTL and any marker, and, hence, every estimate indicates a type I error per definition. This was possible because in the model it is assumed that the selection coefficient is the same for the linkage distribution as for the null distribution. The selection coefficient for each position was the corresponding frequency of the best estimate in the null distribution (Equation 6). In Fig 1A a null distribution for population size of N = 500, marker space of 20 cM, chromosome length of L = 100 cM, and QTL effect of a = 0.5 is presented. When the QTL is not linked with any marker, the probability for each position on the chromosome to achieve the best estimate is significantly higher at the marker loci. Over 45% of the estimates were placed there (under an equal distribution it would be only 5.94%). With an increased marker density (10 cM) 64% of the estimates would be placed at the marker loci. Except for the marker density and chromosome length it seems that the null distribution is of little variability, since varying the QTL effect and the population size resulted in almost the same null distribution (not shown).



View larger version (24K):
In this window
In a new window
Download PPT slide
 
Figure A1. Calculating confidence intervals from bootstrap distributions with the analog HPD method. (A) The hypothetical distribution is unimodal. The lower and upper interval endpoints are built by positions on the simulated chromosome of t'o and t'u, respectively. (B) The hypothetical distribution is not unimodal. The upper interval endpoint is the position of t'u and the position of the lower interval endpoint is calculated iteratively using Equation A1.

The extent of the influence of the markers on the linkage distribution depends on the level of the QTL effect as it is shown in (3). Fig 1B shows an average linkage distribution from all bootstrap samples and all replicates for , and for QTL position of from the start of the chromosome. When the QTL effect was , 53% (32%, 9%) of the best estimates were located at the marker loci. The same genetic configurations but with a marker density of 10 cM gave roughly 10% higher values for the hits per marker locus, regardless of the QTL effect (results not shown), and this confirmed the presence of the marker influence on the linkage distribution as first reported by WALLING et al. 1998 Down. The two higher peaks at 20 and 40 cM are a result of the high value of the product of PQTL and selection coefficients at these positions (Equation 3). Both are relatively high because the distance to the position with the simulated QTL is only 10 cM and a marker is located at these positions, respectively. It follows from this and from the empirical results of the simulation (Table 1 and Table 2) that the smaller the QTL effect (which is unknown a priori in real mapping experiments) and the lower the population size the more advisable it is to correct for the marker impact expressed in selection coefficients.

The correction for the marker was done by dividing the frequency of the best estimate in the linkage distribution by the value of the corresponding selection coefficient (see Equation 4) whereby the number of bootstrap samples should be >200 to compute a corrected distribution suitable for calculating confidence intervals (Table 7). Fig 1C shows an average corrected distribution again from all bootstrap samples and all replicates for , and . The peaks at the marker loci observed in Fig 1A and Fig B, disappeared and the highest frequency of the best estimate was in the interval of the markers flanking the QTL. Right from position 60 cM the frequency became nearly equal for all positions (including marker positions).

An average linkage distribution for a situation QTL located at a marker position is presented in Fig 1D . The high peak at the QTL position can be attributed to the relatively high values of PQTL and to the selection coefficient at this point. In contrast to Fig 1B, the accumulation of the effect of a high PQTL value and a high selection coefficient is focused only on one position (20 cM) and this might be the reason for the shorter intervals with an increased conservativeness when computed with the uncorrected methods I and II. Correcting this average distribution results in Fig 1E. Right from position 60 cM the frequency of best estimates in this distribution became again nearly equal for all positions. Most of the probability mass is put between the markers at the start of the chromosome and position 40, but at the marker coinciding with the QTL position (20 cM) the frequency is significantly reduced (~1%). Similar patterns were observed at the positions of the markers flanking the QTL in the average corrected distribution in Fig 1C (QTL midway between flanking markers). Note that for this genetic configuration the null distribution follows the distribution presented in Fig 1A rather well and is therefore not shown.

Computing the null distribution:
In the simulation the null distributions were derived in a simplified manner by setting the recombination rate between the QTL and any marker to 0.5. This had the advantage that the required CPU time was minimal, but is not possible in real QTL mapping experiments. Alternatively, the permutation approach as proposed by CHURCHILL and DOERGE 1994 Down can be used to compute the null distribution. The authors introduced this method to estimate empirical threshold values in QTL mapping, which are tailored to the experiment. By repeatedly randomly shuffling the trait values while keeping the marker data constant, they uncoupled the phenotype-genotype association, and after applying the QTL-mapping procedure, every estimate would indicate a type I error per definition. There is no reason why this distribution of the QTL estimates should not be influenced by the markers expressed in the selection coefficient for each position on the chromosome as described in our model, and so this distribution should be a very similar null distribution as obtained in this simulation. To show that the permutation method is suitable to calculate the null distribution we performed a further simulation with 1000 permutations of the phenotypic data for every replicate. The null distribution was the distribution from the QTL position estimates out of the evaluation of the permuted data and was calculated for each replicate individually. To avoid a frequency of zero in the null distribution it was assumed that each position on the simulated chromosome would receive at least one QTL position estimate from the evaluation of the permuted data. The population size was N = 500, marker spacing was 20 cM, the QTL was put at , and the QTL effects were a = 0.1, 0.5, and 1.0, respectively. To save CPU time the total number of replicates for each genetic configuration was limited to 200. The confidence intervals were calculated with the weighted methods I and II as described above and the obtained results were compared with the corresponding genetic configuration in the initial simulation, where the null distribution was derived in a simplified manner. The results are presented in Table 8 and show that there were no significant differences between the two simulated series although the null distribution was built out of 1000 estimates when computed with the permutation method, compared to 250,000 estimates in the initial simulation, and although different numbers of replicates were chosen. This confirmed the finding of the low variability of the null distribution as mentioned above.


 
View this table:
In this window
In a new window

 
Table 8. Effect of the method for calculating the null distribution on the confidence intervals for the different corrected bootstrap methods

In real QTL mapping experiments the permutation method has become a standard to calculate the error probabilities. Therefore, the null distribution can be derived as a by-product without extra computational work. It is only necessary to retain the position of the highest test statistic for each permutation and chromosome and to ensure that this distribution covers the whole chromosome. This is necessary to avoid a division by zero (Equation 4). This can be done by choosing either a sufficiently high number of permutations or, as in the present study, by adding one estimate to those positions not present in the initial distribution of the estimates from the evaluated permuted data. In the simulation, 1000 permutations were shown to be enough to compute marker-corrected confidence intervals for a 100-cM chromosome with six informative markers (Table 8) when adding additional hits. The number of permutations carried out to compute threshold values is in practice usually 10-fold higher.

Analog HPD method:
The use of the analog HPD method to compute noncentral confidence intervals led to a reduced width and a reduced conservativeness of the intervals not only when the QTL was located closer to one end but also when located in the middle of the chromosome (Table 5). Intuitively one would not expect significant differences between central and noncentral intervals when the QTL is in the middle of the chromosome and the markers are located symmetrically on the chromosome. The likely explanation is that the bootstrap distributions, whether corrected for the markers or not, are tailed equally only in exceptional cases, regardless of the position of the QTL (not shown). A prerequisite for the analog HPD method to compute shorter confidence intervals is that the bootstrap distribution is not tailed equally (see Appendix and BOX and TIAO 1992 Down).

Conclusions:
In summary, it was shown that in the computation of bootstrap confidence intervals for estimated QTL locations it is useful, first, to correct the bootstrap linkage distribution for the marker impact and, second, to calculate noncentral intervals. Therefore, the use of the weighted method II is recommended, which combines the properties of the marker correction by the weighted method and the properties of the computation of noncentral intervals by the analog HPD method. The numerical results of the simulation showed that this method produced the shortest confidence intervals while maintaining approximately the noninclusion rate at 1 - P for a wide range of simulated genetic configurations. Provided permutation testing is used in an experiment, this method does not require extra computational work in comparison to the original bootstrap method. The combination of the permutation approach and the bootstrap approach can be termed permutation bootstrapping.


*  ACKNOWLEDGMENTS

This study has benefited enormously from the critical comments of two anonymous referees. It was supported by the German Cattle Breeders Federation (ADR) and the German Ministry of Education, Science, Research and Technology (BMBF).

Manuscript received July 30, 2001; Accepted for publication January 9, 2002.


*  APPENDIX
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

THE ANALOG HPD METHOD
BOX and TIAO 1992 Down define a HPD region as a region in which the probability density of every point inside is at least as large as that of any point outside. The analog HPD method was adapted from the standard HPD to compute noncentral confidence intervals from the distribution of the best QTL estimate (from the linkage distribution in the uncorrected method II and from the corrected distribution in the weighted method II, respectively). When the distribution was not tailed equally these intervals were noncentral because unequal parts were subtracted from each tail. In contrast to the standard HPD method it does not make any assumptions regarding the form of the distribution, whether it is unimodal or even multimodal. If the distribution has the latter form the initial definition of a HPD region cited above does not hold.

The 95 and 90% confidence intervals were calculated iteratively as follows:

  1. Draw a line parallel to the x-axis (the simulated chromosome) with an initial low value of {epsilon} (the distance between the line and the x-axis), label the position of the x-axis from the lower outstanding intercept point between the distribution and the line as to and the corresponding position from the upper outstanding intercept point as tu.

  2. Calculate the sum of the frequencies of the distribution left from to and right from tu,

    where to and tu are the positions of the x-axis from the lower and upper intercept points between the distribution and the parallel line, respectively, and L is the length (in centimorgans) of the chromosome.

  3. Decide three possible cases:

  4. Case 1. The sum is smaller than the given error rate of 1 - P. In this case repeat steps 1 and 2 with a marginally increased value of {epsilon}, and label the new positions of to and tu subsequently as t'o and t'u, respectively.

  5. Case 2. The sum is equal to the given error rate of 1 - P. The lower interval endpoint is the position of to and the upper interval endpoint is the position of tu. The frequencies of both interval endpoints are equal. See Fig 1A.

  6. Case 3. The sum is greater than the given error rate of 1 - P and too much was subtracted from the distribution. If it is the first round of iteration start step 1 again with a lower initial value of {epsilon}. If it is not the first round of iteration this is not caused by a too low value for {epsilon} (a marginally lower value for {epsilon} would have resulted in case 2 and not in case 1 in the previous round of iteration); it is caused by the form of the distribution, which is in this case not unimodal. To determine the interval endpoints, two differences ({Delta}1 and {Delta}2) must be calculated:

    If {Delta}1 < {Delta}2 then the lower interval endpoint is the position of t'o and the upper endpoint (ue) is that point where

    This formula has to be solved iteratively. However, if {Delta}2 < {Delta}1, then the upper interval endpoint is the position of t'u and the lower endpoint (indicated by le) is that point where

    (A1)

    Again, this formula must be solved iteratively. See also Fig 1B.


*  LITERATURE CITED
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*APPENDIX
*LITERATURE CITED

BOX, G. E. P., and G. C. TIAO, 1992 Bayesian Inference in Statistical Analysis. Wiley Classics Library, New York.

CHURCHILL, G. A. and R. W. DOERGE, 1994  Empirical threshold values for quantitative trait mapping. Genetics 138:963-971[Abstract].

CONNEALLY, P. M., J. H. EDWARDS, K. K. KIDD, J. M. LALOUEL, and N. E. MORTON et al., 1985  Reports of the committee methods of linkage analysis and reporting. Cytogenet. Cell Genet. 40:356-359[Medline].

DARVASI, A., A. WEINREB, V. MINKE, J. I. WELLER, and M. SOLLER, 1993  Detecting marker-QTL linkage and estimating QTL gene effect and map location using a saturated genetic map. Genetics 134:943-951[Abstract].

DE KONING, D. J., P. M. VISSCHER, S. A. KNOTT, and C. S. HALEY, 1998  A strategy for QTL detection in half-sib populations. Anim. Sci. 67:257-268.

EFRON, B., 1979  Bootstrap methods: another look at the jackknife. Ann. Stat. 7:1-26.

EFRON, B., 1982 The Jackknife, the Bootstrap and Other Resampling Plans. Society for Industrial and Applied Mathematics, Philadelphia.

EFRON, B., and R. B. TIBSHIRANI, 1993 An Introduction to the Bootstrap. Chapman & Hall, New York.

HALEY, C. S. and S. A. KNOTT, 1992  A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324[Medline].

HYNE, V., M. J. KEARSEY, D. J. PIKE, and J. W. SNAPE, 1995  QTL analysis: unreliability and bias in estimation procedures. Mol. Breed. 1:273-282.

KNOTT, S. A., J. M. ELSEN and C. S. HALEY, 1994 Multiple marker mapping of quantitative trait loci in half-sib populations. Proceedings of the 5th World Congress on Genetics Applied to Livestock Production. Vol. 21. Guelph, Canada, pp. 33–36.

LANDER, E. S. and D. BOTSTEIN, 1989  Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199[Abstract/Free Full Text].

LEBRETON, C. M. and P. M. VISSCHER, 1998  Empirical nonparametric bootstrap strategies in quantitative trait loci mapping: conditioning on the genetic model. Genetics 148:525-536[Abstract/Free Full Text].

MANGIN, B. and B. GOFFINET, 1997  Comparison of several confidence intervals for QTL location. Heredity 78:345-353.

MANGIN, B., B. GOFFINET, and A. REBAI, 1994  Constructing confidence intervals for QTL location. Genetics 138:1301-1308[Abstract].

MARTINEZ, O. and R. N. CURNOW, 1992  Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor. Appl. Genet. 85:480-488.

NUMERICAL ALGORITHMS GROUP, 1990 The NAG Fortran Library Manual. Numerical Algorithms Group Ltd., Oxford.

REINSCH, N., 1999  A multiple-species, multiple-project database for genotypes at codominant loci. J. Anim. Breed. Genet. 116:425-435.

SAS INSTITUTE, 1992 SAS Language, Version 6, Ed. 1. SAS Institute, Cary, NC.

VAN OOIJEN, J. W., 1992  Accuracy of mapping quantitative trait loci in autogamous species. Theor. Appl. Genet. 84:803-811.

VISSCHER, P. M., R. THOMPSON, and C. S. HALEY, 1996  Confidence intervals in QTL mapping by bootstrapping. Genetics 143:1013-1020[Abstract].

WALLING, G. A., P. M. VISSCHER, and C. S. HALEY, 1998  A comparison of bootstrap methods to construct confidence intervals in QTL mapping. Genet. Res. 71:171-180.

WALLING, G. A., P. M. VISSCHER, L. ANDERSSON, M. F. ROTHSCHILD, and L. WANG et al., 2000  Combined analyses of data from quantitative trait loci mapping studies: chromosome 4 effects on porcine growth and fatness. Genetics 155:1369-1378[Abstract/Free Full Text].

ZHANG, Q., D. BOICHARD, I. HOESCHELE, C. ERNST, and A. EGGEN et al., 1998  Mapping quantitative trait loci for milk production and health of dairy cattle in a large outbred pedigree. Genetics 149:1959-1973[Abstract/Free Full Text].




This article has been cited by other articles:


Home page
GeneticsHome page
R. D. Ball
Quantifying Evidence for Candidate Gene Polymorphisms: Bayesian Analysis Combining Sequence-Specific and Quantitative Trait Loci Colocation Information
Genetics, December 1, 2007; 177(4): 2399 - 2416.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
J. Bennewitz, N. Reinsch, V. Guiard, S. Fritz, H. Thomsen, C. Looft, C. Kuhn, M. Schwerin, C. Weimann, G. Erhardt, et al.
Multiple Quantitative Trait Loci Mapping With Cofactors and Application of Alternative Variants of the False Discovery Rate in an Enlarged Granddaughter Design
Genetics, October 1, 2004; 168(2): 1019 - 1027.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
S. N. Forbes, R. K. Valenzuela, P. Keim, and P. M. Service
Quantitative Trait Loci Affecting Life Span in Replicated Populations of Drosophila melanogaster. I. Composite Interval Mapping
Genetics, September 1, 2004; 168(1): 301 - 311.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
C. C. Schon, H. F. Utz, S. Groh, B. Truberg, S. Openshaw, and A. E. Melchinger
Quantitative Trait Locus Mapping Based on Resampling in a Vast Maize Testcross Experiment and Its Relevance to Quantitative Genetics for Complex Traits
Genetics, May 1, 2004; 167(1): 485 - 498.
[Abstract] [Full Text] [PDF]


Home page
Sci Aging Knowl EnvironHome page
P. M. Service
How Good Are Quantitative Complementation Tests?
Sci. Aging Knowl. Environ., March 24, 2004; 2004(12): pe13 - pe13.
[Abstract] [Full Text]


Home page
J DAIRY SCIHome page
J. Bennewitz, N. Reinsch, S. Paul, C. Looft, B. Kaupe, C. Weimann, G. Erhardt, G. Thaller, Ch. Kuhn, M. Schwerin, et al.
The DGAT1 K232A Mutation Is Not Solely Responsible for the Milk Production Quantitative Trait Locus on the Bovine Chromosome 14
J Dairy Sci, February 1, 2004; 87(2): 431 - 442.
[Abstract] [Full Text] [PDF]


Home page
J DAIRY SCIHome page
M. Ron, E. Feldmesser, M. Golik, I. Tager-Cohen, D. Kliger, V. Reiss, R. Domochovsky, O. Alus, E. Seroussi, E. Ezra, et al.
A Complete Genome Scan of the Israeli Holstein Population for Quantitative Trait Loci by a Daughter Design
J Dairy Sci, February 1, 2004; 87(2): 476 - 490.
[Abstract] [Full Text] [PDF]


Home page
Crop Sci.Home page
R. Mihaljevic, H. F. Utz, and A. E. Melchinger
Congruency of Quantitative Trait Loci Detected for Agronomic Traits in Testcrosses of Five Populations of European Maize
Crop Sci., January 1, 2004; 44(1): 114 - 124.
[Abstract] [Full Text] [PDF]