Corresponding author: Albrecht E. Melchinger, Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany. E-mail: melchinger@uni-hohenheim.de
Communicating editor: Z-B. ZENG
| ABSTRACT |
|---|
Cross validation (CV) was used to analyze the effects of different environments and different genotypic samples on estimates of the proportion of genotypic variance explained by QTL (p). Testcrosses of 344 F3 maize lines grown in four environments were evaluated for a number of agronomic traits. In each of 200 replicated CV runs, this data set was subdivided into an estimation set (ES) and various test sets (TS). ES were used to map QTL and estimate p for each run ($\hat{p}_{ES}$) and its median ($\tilde{p}_{ES}$) across all runs. The bias of these estimates was assessed by comparison with the median ($\tilde{p}_{TS.ES}$) obtained from TS. We also used two independent validation samples derived from the same cross for further comparison. The median $\tilde{p}_{ES}$ showed a large upward bias compared to $\tilde{p}_{TS.ES}$. Environmental sampling generally had a smaller effect on the bias of $\tilde{p}_{ES}$ than genotypic sampling or both factors simultaneously. In independent validation, $\tilde{p}_{TS.ES}$ was on average only 50% of $\tilde{p}_{ES}$. A wide range among $\hat{p}_{ES}$ reflected a large sampling error of these estimates. QTL frequency distributions and comparison of estimated QTL effects indicated a low precision of QTL localization and an upward bias in the absolute values of estimated QTL effects from ES. CV with data from three QTL studies reported in the literature yielded results similar to those obtained with maize testcrosses. We therefore recommend CV for obtaining asymptotically unbiased estimates of p and consequently a realistic assessment of the prospects of MAS.
MOLECULAR markers are used by a great number of researchers to study quantitative traits of agronomic importance. The primary objective of these studies has been the identification of markers associated with quantitative trait loci (QTL) and their use in subsequent marker-assisted selection (MAS) programs.
In the statistical analysis of quantitatively inherited traits, the introduction of QTL interval mapping and maximum-likelihood estimation of effects by ![]()
Identification of significant QTL-marker associations forms the baseline for MAS. To be superior to classical phenotypic selection, several prerequisites must be satisfied: (1) QTL positions are estimated with high precision to choose markers showing a minimum of recombination with the QTL and to resolve linked QTL; (2) estimated QTL effects reflect their true genetic effects and, therefore, are estimated without bias due to genotypic or environmental sampling; (3) a sufficient proportion of the genotypic variance of the trait under study is explained by the detected QTL.
With respect to these prerequisites, the available statistical methods still have considerable shortcomings. Using computer simulations it was shown that (a) estimates of individual QTL effects and the proportion of genotypic variance explained by QTL can be severely inflated, leading to an overly optimistic assessment of the prospects of MAS (![]()
In most experimental studies these limitations have been ignored even though with experimental data, the bias in QTL effects is expected to be even greater than in computer simulations that rely on simplifying assumptions. To overcome these pitfalls, ![]()
In an earlier study, we demonstrated for experimental data in maize that the magnitude of QTL effects and the proportion of the phenotypic and genotypic variance explained by QTL decreased substantially when estimated in independent validation samples (![]()
Several authors (e.g., ![]()
| MATERIALS AND METHODS |
|---|
Plant materials:
The plant materials used for this study were partly identical to those described by ![]()
Field experiments:
The TC progenies were evaluated in three different experiments. Experiment 1 (Exp. 1) comprised 380 TC of F3 lines, TC of P1 and P2 included as quintuple entries, and 10 common check hybrids. Trials were conducted in 1990 and 1991 at two sites in Germany. Data on plant height were additionally available from forage trials conducted at five environments in Germany described in detail by ![]()
Experiment 2 (Exp. 2) comprised TC of an independent set of 107 F3 lines, TC of P1 and P2 included as six and seven entries, respectively, and the same set of 10 check hybrids as in Exp. 1. Trials were grown in 1992 and 1993 at two sites in Germany.
Experiment 3 (Exp. 3) comprised TC of 71 F5 lines derived from 71 F3 lines, whose TC were grown in Exp. 2, TC of P1 and P2 included as multiple entries, and the same check hybrids as in Exp. 1 and Exp. 2. Exp. 3 was grown in 1992 at four sites in Germany, two of which were in common with Exp. 2, and one additional site in France.
In all three experiments the experimental design was a generalized lattice with two replications (![]()
Data were analyzed for the following traits: grain yield (GY) in Mg ha⁻¹, adjusted to 155 g kg⁻¹ grain moisture; grain moisture (GM) in g kg⁻¹ at harvest; kernel weight (KW) in mg kernel⁻¹, determined from four samples of 50 kernels from each plot; and plant height (PH), measured in centimeters on a plot basis as the distance from the soil level to the lowest tassel branch.
RFLP marker genotyping and linkage map construction:
The procedures for RFLP assays were described by ![]()
Agronomic data analyses:
For each experiment adjusted entry means and effective error mean squares derived from analyses of variance of each site-year combination were used to compute the combined analyses of variance across environments. For estimation of quantitative genetic parameters such as variance components and heritabilities, see ![]()
Phenotypic ($\hat{r}_p$) and genotypic ($\hat{r}_g$) correlations between means of related TC progenies from Exp. 2 and Exp. 3 were calculated following standard procedures (![]()
QTL analyses:
QTL mapping and estimation of their effects were performed with PLABQTL (![]()
$$Y_j = \mu_{P1} + \alpha_{\ell}\, x^{*}_{j\ell} + \sum_{m} b_m x_{jm} + \varepsilon_j \tag{1}$$

Here, $Y_j$ denotes the mean phenotypic trait value of the TC progeny of line $j$ averaged across environments; $\mu_{P1}$ is the mean phenotypic trait value of TC progeny carrying the allele from P1 at the $\ell$th QTL; $\alpha_{\ell}$ is the average effect of substituting allele q in P1 by allele Q in P2 at the putative QTL in the marker interval $\ell$ with flanking markers $\ell'$ and $\ell''$ (subsequently denoted additive effect); $x^{*}_{j\ell}$ is the conditional expectation of the dummy variable $\theta_{j\ell}$ given the observed genotypes at flanking marker loci $\ell'$ and $\ell''$, where $\theta_{j\ell}$ assumes values 0, 0.5, or 1 if the genotype of the F2 individual $j$ at the putative QTL is qq, Qq, or QQ, respectively; $b_m$ is the partial regression coefficient of phenotype $Y_j$ on the $m$th (selected) marker; $x_{jm}$ is a dummy variable (cofactor) taking values 0, 0.5, or 1 depending on whether the marker genotype of the parental F2 individual $j$ at marker locus $m$ is homozygous P1, heterozygous, or homozygous P2, respectively; and $\varepsilon_j$ is a residual variable for the TC progeny of the $j$th F3 line.
The selection of cofactors was described by ![]()
$$R^2_{adj} = R^2 - \frac{z}{N - z - 1}\,(1 - R^2) \tag{2}$$

where $R^2$ is the coefficient of determination of the regression fitting a model including $z$ predictors (number of QTL positions and effects) and $N$ is the number of phenotypic observations used in multiple regression. When $R^2$ is zero or small, $R^2_{adj}$ can become negative. In our calculations negative values of $R^2_{adj}$ were allowed, because with a lower bound imposed $R^2_{adj}$ would no longer be unbiased (![]()
The proportion of the genotypic variance explained by all detected QTL was estimated from the ratio

$$\hat{p} = \frac{R^2_{adj}}{\hat{h}^2} \tag{3}$$

where $\hat{h}^2$ is the heritability of the respective trait calculated on an entry-mean basis (![]()

$$\hat{h}^2 = \frac{\hat{\sigma}^2_{g}}{\hat{\sigma}^2_{g} + \hat{\sigma}^2_{ge}/e + \hat{\sigma}^2/(re)} \tag{4}$$

with $\hat{\sigma}^2$ denoting the effective error variance, $\hat{\sigma}^2_{ge}$ the G x E interaction variance, $\hat{\sigma}^2_{g}$ the genotypic variance, $r$ the number of replications, and $e$ the number of test environments. All variance components were estimated from Exp. 1 unless stated otherwise.
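The chain from a regression fit to an estimate of p can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual adjusted coefficient of determination and the standard entry-mean heritability built from the variance components named above; the function and argument names are hypothetical and are not taken from PLABQTL.

```python
def adjusted_r2(r2: float, n_obs: int, n_pred: int) -> float:
    """Adjusted R^2 for a model with n_pred predictors fitted to n_obs observations.

    No lower bound is imposed, so the result can be negative when r2 is small.
    """
    return r2 - n_pred / (n_obs - n_pred - 1) * (1.0 - r2)


def entry_mean_heritability(var_g: float, var_ge: float, var_err: float,
                            n_env: int, n_rep: int) -> float:
    """Heritability on an entry-mean basis (assumed standard form)."""
    return var_g / (var_g + var_ge / n_env + var_err / (n_env * n_rep))


def proportion_explained(r2: float, n_obs: int, n_pred: int, h2: float) -> float:
    """Proportion of genotypic variance explained: adjusted R^2 divided by heritability."""
    return adjusted_r2(r2, n_obs, n_pred) / h2
```

Because the adjusted value is divided by a heritability below one, an estimate of p can exceed the raw adjusted R^2 considerably for traits with low heritability.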
Cross validation:
One approach applied for evaluation of QTL mapping results was cross validation. Here, the entire data set (DS) is split into subsets. One or several subsets combined form the estimation set (ES) for QTL detection, localization, and estimation of genetic effects. The remaining subset(s) form the test set (TS) in which predictions derived from ES are tested for their validity by correlating predicted and observed data. For example, in fivefold CV, the DS comprising marker data from 344 F2 plants and phenotypic data of their TC progenies from Exp. 1 was randomly subdivided into k = 5 genotypic subsamples, 4 with 69, 1 with 68 genotypes (Fig 1). Each of the 5 genotypic subsamples was evaluated in four environments, and consequently DS was divided into 20 disconnected subsets. For testing the effect of (i) environmental sampling (CV/E), (ii) genotypic sampling (CV/G), and (iii) both factors simultaneously (CV/GE), the same ES but different TS were used. The ES consisted of four genotypic subsamples with phenotypic data from three of the four environments. In CV/E, the TS consisted of the same four genotypic subsamples as in ES, but phenotypic data came from the fourth, disconnected environment. In CV/G, the fifth disconnected genotypic subsample with phenotypic data from the same three environments as in ES was used as TS. In CV/GE, estimates of QTL effects were obtained by using the one subset not connected with ES, either by environment or by genotypes, as TS. Consequently, by permutating the respective k subsets used for ES and TS, 4 x k = 20 different CV runs are possible for fivefold cross validation. To increase the precision of estimates of p, additional CV runs were generated by using 10 different randomizations for assigning genotypes to the respective subsamples, yielding a total of 10 x 4 x k = 200 replicated CV runs.
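The partitioning described above can be sketched as follows. This is an illustrative Python outline only: the function name, the dictionary keys, and the use of a seeded shuffle are assumptions, and genotypes and environments are represented by arbitrary labels.

```python
import random
from itertools import product


def cv_runs(genotypes, environments, k=5, n_randomizations=10, seed=0):
    """Enumerate ES and the three TS variants for k-fold CV over genotypes and environments."""
    rng = random.Random(seed)
    environments = list(environments)
    runs = []
    for _ in range(n_randomizations):
        shuffled = list(genotypes)
        rng.shuffle(shuffled)
        # k genotypic subsamples of nearly equal size
        folds = [shuffled[i::k] for i in range(k)]
        for test_fold, test_env in product(range(k), environments):
            held_out = set(folds[test_fold])
            es_geno = [g for g in shuffled if g not in held_out]
            es_env = [e for e in environments if e != test_env]
            runs.append({
                "ES": (es_geno, es_env),                      # k-1 subsamples, remaining environments
                "TS_CV/E": (es_geno, [test_env]),             # same genotypes, new environment
                "TS_CV/G": (folds[test_fold], es_env),        # new genotypes, same environments
                "TS_CV/GE": (folds[test_fold], [test_env]),   # new genotypes, new environment
            })
    return runs


runs = cv_runs(range(344), ["E1", "E2", "E3", "E4"])
print(len(runs))  # 10 randomizations x 5 subsamples x 4 environments
```

With 344 genotypes and k = 5 the slicing gives four subsamples of 69 and one of 68, matching the subdivision described in the text.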
The effect of sample size in ES and TS was tested by varying the number of genotypic and the number of environmental subsamples used for estimation and testing. Genotypic subsampling was tested in five different CV schemes, dividing DS into k (k = 2, 3, 5, 9, and 16) genotypic subsamples containing N/k TC progenies each (Table 1). An additional cross validation scheme (CVk=1) was created by randomly subdividing DS into subsamples of size NES = 100 and NTS = 244. For all CV schemes, estimates of p were obtained as the median from a minimum of 200 CV runs originating from an appropriate number of different randomizations (Table 1).
For plant height, which was evaluated in nine environments, additional CV schemes were analyzed by varying the number of environments included in ES (u = 1, 2, 3, 4, and 8) for k = 5 (Table 1). The corresponding TS were based on TC progeny means averaged across the remaining e = 9 − u disconnected environments. The number of possible CV runs obtained by a single randomization was $5\binom{9}{u}$ for the respective CV schemes (Table 1).
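As a quick check on these counts, the short Python sketch below assumes that the number of runs per randomization is k times the number of ways to choose u of the nine environments for ES; the function name is hypothetical.

```python
from math import comb


def runs_per_randomization(u: int, k: int = 5, n_env: int = 9) -> int:
    """Number of distinct CV runs from one randomization of genotypes."""
    return k * comb(n_env, u)


for u in (1, 2, 3, 4, 8):
    print(u, runs_per_randomization(u))
```

Under this reading a single randomization already gives 45 runs for u = 1 or u = 8 and several hundred for intermediate values of u.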
The magnitude of bias in estimates of genotypic variance explained by QTL due to genotypic and/or environmental sampling was obtained by comparing estimates of p obtained from the ES and TS. Based on QTL mapping results obtained with composite interval mapping (CIM) from the ES, the genotypic value $Q_{TS.ES,j}$ of F3 line $j$ in TS can be predicted according to

$$Q_{TS.ES,j} = X^{*}_{TS,j}\,\hat{\alpha}^{*}_{ES} \tag{5}$$

with $X^{*}_{TS,j}$ being the vector of conditional expectations of the dummy variable $\theta_{j\ell}$ given the observed genotypes at the flanking marker loci $\ell'$ and $\ell''$ and a significant QTL (LOD > 2.5) in the $\ell$th marker interval in ES. For $\theta_{j\ell}$, values of 0, 0.5, or 1 are assumed if the genotype of the corresponding F2 plant $j$ at the QTL significant in ES is qq, Qq, or QQ, respectively; $\hat{\alpha}^{*}_{ES}$ is the vector of genetic effects of all significant QTL detected in ES, estimated as partial regression coefficients from a simultaneous fit in ES.
The proportion of the genotypic variance explained by QTL in TS ($\hat{p}_{TS.ES}$) is calculated from the adjusted squared correlation coefficient ($R^2_{adj}$, see Equation 2) between the phenotypic means observed in TS ($Y_{TS}$) and the predicted genotypic values $Q_{TS.ES}$ on the basis of results derived from ES, divided by the heritability of the trait (see Equation 4) under study for the respective value of $e$:

$$\hat{p}_{TS.ES} = \frac{R^2_{adj}(Y_{TS}, Q_{TS.ES})}{\hat{h}^2} \tag{6}$$

The estimate $\hat{p}_{TS.ES}$ is asymptotically unbiased in the case of CV/GE because the data in TS$_{CV/GE}$ are independent from the data in ES from which the a priori model of prediction is determined (![]()
Estimates $\hat{p}_{TS.ES}$ calculated for CV/E and CV/G are still biased by genotypic and environmental sampling, respectively, because TS are not independent from ES.
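The test-set calculation can be sketched in plain Python as below. This is an illustration under stated assumptions: predicted genotypic values are taken as the product of the conditional expectations in TS and the QTL effects estimated in ES, the adjusted squared correlation uses the usual adjustment, and the number of predictors used in that adjustment (`n_pred`) is left as an explicit argument because the text does not state it for TS. All names are hypothetical.

```python
def predict_genotypic_values(x_ts, effects_es):
    """Predicted value per TS line: sum over QTL of conditional expectation times ES effect."""
    return [sum(x * b for x, b in zip(row, effects_es)) for row in x_ts]


def squared_correlation(a, b):
    """Squared Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    if s_aa == 0 or s_bb == 0:
        return 0.0
    return s_ab * s_ab / (s_aa * s_bb)


def p_ts_es(y_ts, x_ts, effects_es, h2, n_pred):
    """Proportion of genotypic variance explained in TS by QTL mapped in ES."""
    q = predict_genotypic_values(x_ts, effects_es)
    r2 = squared_correlation(y_ts, q)
    n = len(y_ts)
    r2_adj = r2 - n_pred / (n - n_pred - 1) * (1.0 - r2)  # may be negative
    return r2_adj / h2


# Toy example: four TS lines, one QTL with ES effect 2.0
print(p_ts_es([1, 2, 3, 4], [[0], [0.5], [1], [0.5]], [2.0], h2=0.5, n_pred=1))
```

Note that squaring the correlation discards its sign, so a practical implementation would also want to check that predicted and observed values are positively related before interpreting the result.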
Using a LOD threshold of 2.5, each CV run yielded different estimates for the number of QTL, their location, and genetic effects in ES. Estimates of p in ES and TS were calculated as the median $\tilde{p}$ over all replicated CV runs. For CV runs with no QTL detected in ES, $\hat{p}$ was assumed to be zero. The average number of QTL was determined as the mean across replicated CV runs.
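A minimal Python sketch of this summary step follows. The record layout (`n_qtl`, `p_es`, `p_ts`) is a hypothetical convention chosen for the example, not a PLABQTL output format.

```python
from statistics import mean, median


def summarize_runs(runs):
    """Median of p in ES and TS and mean QTL number across replicated CV runs.

    Runs without any detected QTL contribute p = 0 to both medians.
    """
    p_es = [r["p_es"] if r["n_qtl"] > 0 else 0.0 for r in runs]
    p_ts = [r["p_ts"] if r["n_qtl"] > 0 else 0.0 for r in runs]
    return {
        "median_p_es": median(p_es),
        "median_p_ts": median(p_ts),
        "mean_n_qtl": mean([r["n_qtl"] for r in runs]),
    }


example = [
    {"n_qtl": 0, "p_es": None, "p_ts": None},
    {"n_qtl": 2, "p_es": 0.6, "p_ts": 0.2},
    {"n_qtl": 4, "p_es": 0.8, "p_ts": 0.4},
]
print(summarize_runs(example))
```

Using the median rather than the mean keeps single runs with extreme estimates from dominating the summary.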
A more detailed analysis was performed for putative QTL for GY and PH on chromosome 7. Precision of QTL localization was assessed by determining the relative frequency of detected QTL for 1376 replicated runs in 1-cM intervals along chromosome 7 from ES with k = 5, u = 3, and e = 1. In ES, allele substitution effects ($\hat{\alpha}_{ES}$) were estimated from a simultaneous fit of all significant QTL in each of the 1376 CV runs. The median allele substitution effect $\tilde{\alpha}_{ES}$ was calculated for each position along the chromosome. For each $\hat{\alpha}_{ES}$, the corresponding allele substitution effect from TS$_{CV/GE}$ ($\hat{\alpha}_{TS}$) was determined by multiple regression based on (a) the map positions of all QTL detected in ES and (b) the marker genotypes at the flanking markers of the F2 plants in TS according to described procedures (![]()
The median $\tilde{\alpha}_{TS}$ was calculated across all CV runs.
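The frequency distribution and the per-position medians can be sketched as follows. This Python outline is illustrative: the input layout (one list of position and effect pairs per CV run) and the function name are assumptions, and the 2% reporting threshold is the one applied in the Results.

```python
from collections import defaultdict
from statistics import median


def qtl_profile(runs, min_freq=0.02):
    """Relative QTL frequency and median effect per 1-cM interval.

    runs: one list per CV run of (position_cM, effect) pairs for a chromosome.
    Returns {interval_start_cM: (frequency, median_effect or None)}.
    """
    n_runs = len(runs)
    effects = defaultdict(list)
    for detected in runs:
        for position, effect in detected:
            effects[int(position)].append(effect)  # interval [cM, cM + 1)
    profile = {}
    for cm, values in sorted(effects.items()):
        freq = len(values) / n_runs
        # Report a median effect only where enough runs support the position
        profile[cm] = (freq, median(values) if freq > min_freq else None)
    return profile


toy_runs = [[(75.2, 0.25)], [(75.9, 0.75)], [], [(10.0, 0.1)]]
print(qtl_profile(toy_runs))
```

The sketch counts every detected QTL, so two QTL falling into the same interval within one run would both add to that interval's frequency.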
Validation with independent samples:
Statistical theory and procedures for the alternative approach of testing results obtained in ES with independent validation samples are equivalent to cross validation. The same estimation sets as in CV were used to predict genotypic values $Q_{VS.ES}$ for validation sets (VS). The adjusted squared correlation coefficient ($R^2_{adj}$) between $Q_{VS.ES}$ and the entry means ($Y_{VS}$) from VS1 (N = 107) and VS2 (N = 71), divided by the heritabilities $\hat{h}^2_{VS1}$ and $\hat{h}^2_{VS2}$, served as an unbiased estimate of the genotypic variance explained by putative QTL in validation test sets. Variance components for the calculation of $\hat{h}^2_{VS1}$ and $\hat{h}^2_{VS2}$ were estimated from Exp. 2 and Exp. 3, respectively. Estimates of p were calculated as the median of replicated validation runs, the number of replications corresponding to the number of ES for the respective factor k (Table 1). For validation with VS2, calculations of conditional expectations of the genotype of F5 lines at the putative QTL given flanking marker genotypes were adjusted to parental F4 instead of F2 plants.
| RESULTS |
|---|
Trait means, variances, and heritabilities:
Quantitative genetic parameters for Exp. 1 and Exp. 2 were presented in detail by ![]()
Estimates of $\hat{\sigma}^2_{ge}$ were significantly greater than zero (P < 0.01) for all traits in Exp. 1 and Exp. 3. In Exp. 2 significant G x E interactions were found only for GM and KW but not for GY and PH. Heritabilities exceeded 0.70 for all traits except GY in Exp. 1 ($\hat{h}^2$ = 0.48) and were highest in Exp. 3 (0.70–0.92) owing to larger genotypic variances and an additional test environment. Phenotypic ($\hat{r}_p$) and genotypic ($\hat{r}_g$) correlations between related TC progenies from early (F3 lines) and advanced (F5 lines) selfing generations ranged between 0.32 and 0.44 except for $\hat{r}_g$ = 0.62 for GY.
QTL analyses:
For detailed results from QTL analyses based on the entire DS (Exp. 1) see ![]()
For all traits but GY, the average number of QTL detected increased with increasing sample size N in ES and was almost twice as large for k = 16 (N = 323) as compared to k = 1 (N = 100; Fig 2). The median $\tilde{p}_{ES}$ increased only slightly (≤20%) with increasing N and even decreased for GY. In all three cross validation schemes (CV/E, CV/G, and CV/GE), $\tilde{p}_{TS.ES}$ was substantially reduced as compared to $\tilde{p}_{ES}$ for all values of k. The largest reduction in $\tilde{p}_{TS.ES}$ was found for GY in CV/GE for k = 9 and k = 16. In most cases, CV/G resulted in lower values for $\tilde{p}_{TS.ES}$ than CV/E. Except for GY, the difference between the two CV schemes was most pronounced for small k. For GM, $\tilde{p}_{TS.ES}$ was slightly larger for CV/G than for CV/E in half of the CV schemes (k = 5, 9, and 16). In cross validation, CV/GE generally resulted in the smallest values for $\tilde{p}_{TS.ES}$. For PH, however, values for $\tilde{p}_{TS.ES}$ from CV/GE were slightly greater than those for CV/G in some CV schemes.
Validation with the two independent samples VS1 and VS2 resulted in the lowest $\tilde{p}_{TS.ES}$ values, except for KW (Fig 2). On average only 50% of $\tilde{p}_{ES}$ could be confirmed. For all traits except PH, estimates of $\tilde{p}_{TS.ES}$ from VS1 and VS2 were comparable to those from CV/GE. For PH, $\tilde{p}_{TS.ES}$ for VS1 and VS2 was surprisingly small as compared to CV schemes, probably due to genotypic and environmental sampling in the validation experiments. Contrary to expectation, V/VS1 yielded smaller values for $\tilde{p}_{TS.ES}$ than V/VS2 for GY and KW.
For GM, KW, and PH, 95% confidence intervals for $\tilde{p}_{ES}$ spanned 1–3% for all k except for k = 1 with ~6% (data not shown). For GY, 95% confidence intervals for $\tilde{p}_{ES}$ ranged from 5 to 8%. In CV, 95% confidence intervals for $\tilde{p}_{TS.ES}$ were of similar size as for corresponding estimates of $\tilde{p}_{ES}$, but were quite large for CV/GE and k = 16 with a maximum of 13% for CV/GE with k = 16 of PH. Fig 3 shows the range in the number of QTL detected in ES and the variation in $\hat{p}_{ES}$ and $\hat{p}_{TS.ES}$ among the different CV runs for k = 5. For GY, the number of QTL detected in the 200 estimation runs varied from 0 to 8. In 11 of the 200 ES no QTL for GY was detected. For GM, KW, and PH, between 4 and 16 QTL were significant in ES. A wide variation of $\hat{p}_{ES}$ and $\hat{p}_{TS.ES}$ was found for all three CV schemes and all traits, and the range for $\hat{p}_{TS.ES}$ was substantially larger than for $\hat{p}_{ES}$, the latter being generally of similar magnitude as for $\hat{p}_{TS.ES}$ with independent validation (VS1, VS2).
When varying the number of environments in ES (u) and TS (e) with k = 5 for PH, on average 7–14 QTL were detected in ES and $\tilde{p}_{ES}$ varied between 56.4 and 67.1% (Fig 4). Both the number of QTL detected and $\tilde{p}_{ES}$ increased with an increase in u. The median $\tilde{p}_{TS.ES}$ was considerably reduced for CV/G and CV/GE as compared to $\tilde{p}_{ES}$, the difference being smaller for greater values of u. In comparison with $\tilde{p}_{ES}$, the median $\tilde{p}_{TS.ES}$ for CV/E was smaller only for u = 1 and 2 but greater for u = 3, 4, and 8. In validation with independent samples (VS1 and VS2), $\tilde{p}_{TS.ES}$ was substantially smaller than in CV/GE, the largest reduction being found for VS2 with u = 1. For u = 8 and k = 5 the variation among results from replicated runs is presented in Fig 4. In ES, the analysis of PH yielded 10–17 QTL. The range in $\hat{p}_{TS.ES}$ from CV/E and CV/GE was considerably larger than for $\hat{p}_{ES}$. The variation in $\hat{p}_{TS.ES}$ values for independent validation (VS1 and VS2) was comparable to $\hat{p}_{ES}$.
Results of a more detailed analysis (1376 replicated runs) of one QTL for GY and two QTL for PH on chromosome 7 are presented in Fig 5. In the 1376 ES for k = 5, significant QTL for GY were detected at almost every position along the chromosome (Fig 5, bottom left). At position 75 cM the maximum of the distribution of relative QTL frequencies was reached (7.5%), but the distribution did not show a well-defined peak. The median allele substitution effects estimated from ES ($\tilde{\alpha}_{ES}$) and TS ($\tilde{\alpha}_{TS}$) are presented if the QTL frequency exceeded 2% (Fig 5, top). Otherwise sampling errors of estimates of effects were considered too large. At position 75 cM, $\tilde{\alpha}_{ES}$ was 0.46 Mg ha⁻¹ as compared to 0.30 Mg ha⁻¹ for $\tilde{\alpha}_{TS}$. For PH the distribution of QTL frequencies along chromosome 7 was bimodal, showing distinct peaks at position 0 cM (13.9%) and 61 cM (11.4%; Fig 5, bottom right). Genetic effects at the two QTL were of opposite sign (Fig 5, top right). In the region 0–10 cM the absolute value of $\tilde{\alpha}_{ES}$ was larger than that of $\tilde{\alpha}_{TS}$ from CV/GE, amounting to $\tilde{\alpha}_{ES}$ = −2.7 cm and $\tilde{\alpha}_{TS}$ = −2.1 cm at position 1 cM. Accordingly, at 61 cM the median $\tilde{\alpha}_{ES}$ (4.0 cm) was greater than $\tilde{\alpha}_{TS}$ (3.4 cm).
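To put the reported effects on a common scale, the short Python sketch below expresses each test-set effect as a relative reduction from the corresponding estimation-set effect. The measure and the function name are illustrative choices, not quantities defined in the study; the input values are the medians quoted above.

```python
def relative_shrinkage(effect_es: float, effect_ts: float) -> float:
    """Fraction by which the absolute TS effect falls short of the absolute ES effect."""
    return 1.0 - abs(effect_ts) / abs(effect_es)


reported = {
    "GY, 75 cM (Mg/ha)": (0.46, 0.30),
    "PH, 1 cM (cm)": (-2.7, -2.1),
    "PH, 61 cM (cm)": (4.0, 3.4),
}
for label, (es, ts) in reported.items():
    print(f"{label}: {100 * relative_shrinkage(es, ts):.0f}% smaller in TS")
```

On these figures the reduction is roughly one-third for the GY QTL and between about 15 and 22% for the two PH QTL.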
| DISCUSSION |
|---|
Resampling methods:
All statistical methods used for QTL analysis share the problem of model selection because the true number and position of QTL and, hence, the correct statistical model estimating their genetic effects, are unknown. With CIM, the general procedure is to identify among a large number of regressor variables xi (coded marker genotypes or functions of them) those that account for the largest proportion in the variance of the response variable Y (phenotypic values), and use them for estimation of QTL effects and p. With a limited sample size, model selection leads to an overestimation of QTL effects and p due to sampling effects and consequently to a biased assessment of the prospects of MAS. In this experimental study, we tried to quantify the prediction error of our QTL models and to obtain unbiased estimates of the proportion of genetic variance explained by the detected QTL using resampling methods. CV was preferred over bootstrapping for two reasons: (1) CV/GE provides asymptotically unbiased estimates of p because the data in TS used for testing the prediction are stochastically independent from the data in ES from which the prediction rule is inferred (![]()
Cross validation:
In the five CV and validation schemes, $\tilde{p}_{TS.ES}$ was considerably reduced as compared to $\tilde{p}_{ES}$, indicating a large upward bias in predictors of p inferred from estimation sets. The relative bias of estimation ($1 - \tilde{p}_{TS.ES}/\tilde{p}_{ES}$) was greatest for GY. The complex genetic architecture of the trait resulted in only a few (two to three) QTL detected in ES with highly overestimated genetic effects owing to sampling and the relatively low heritability of the trait. Therefore, only a small proportion of $\tilde{p}_{ES}$ could be validated in the various TS (Fig 2). The median $\tilde{p}_{TS.ES}$ was 0.0 in two CV schemes (k = 9 and k = 16), indicating that in half of the CV runs no selection gain would have been achieved by choosing the respective markers for selection.
![]()
$\tilde{p}_{TS.ES}/\tilde{p}_{ES}$) was almost identical for both threshold levels for GM, KW, and PH, demonstrating a certain robustness of CV results. Unless only few QTL were detected for a certain quantitative trait, we did not observe that the magnitude of the bias due to model selection in estimation of QTL effects was strongly influenced by the LOD threshold applied.
Choice of k in CV:
When using CV, the value k for subdivision of the original DS is crucial for determining the bias of $\tilde{p}_{ES}$. ![]()
$\tilde{p}_{TS.ES}/\tilde{p}_{ES}$) in CV. For GY, however, the relative bias did not decrease with increasing values of k. One reason could be that for a trait with low heritability and complex genetic architecture like GY, even a sample size of N = 323 in ES does not provide sufficient power for detection of "true" QTL. To allow in CV for both estimation with a minimum of bias and testing with a minimum of sampling error, the factor k for subdivision of DS must be chosen prudently depending on the size of the original DS. This is particularly important with a large number of predictor variables in the model, e.g., for complex traits with a large number of detected QTL or when estimating and testing the effects of epistasis.
Choice of u in CV:
For the highly heritable trait PH, u = 3 seemed sufficient to obtain an almost perfect agreement between $\tilde{p}_{ES}$ and $\tilde{p}_{TS.ES}$ for CV/E. However, for CV/G and CV/GE the closest agreement was obtained with u = 8. Therefore, we recommend for CV to include a maximum of environments in ES, leaving only one disconnected environment in TS.
Independent validation:
As expected, $\tilde{p}_{TS.ES}$ in independent validation samples VS1 and VS2 increased when the power for QTL detection was improved and the estimation bias was reduced due to increased sample sizes used in ES. Best prediction generally was obtained from estimation in DS, except for PH. The median $\tilde{p}_{TS.ES}$ was smaller in validation samples than in CV/GE for all CV schemes and all traits except KW. Several confounded factors have probably contributed to this finding. For determining $\tilde{p}_{TS.ES}$ for VS1 and VS2, the same genotypic sample was used as TS in all replicated runs, while in CV/GE genotypic sampling was varied for TS. This is shown by the smaller range of $\hat{p}_{TS.ES}$ for VS1 and VS2 than for CV/GE (k = 5; Fig 3). Hence, for an assessment of the average gain from MAS, results from CV/GE are to be preferred over independent validation because the latter can be influenced considerably by the specific genotypic sample used for TS. A further reason for the differences in $\tilde{p}_{TS.ES}$ between VS and CV/GE could be the fact that results from environments of Exp. 1 were only partially valid for the environments of Exp. 2 (VS1) and Exp. 3 (VS2). It was surprising, however, that $\tilde{p}_{TS.ES}$ in VS2 was higher than in VS1 for GY and KW. The opposite was expected, because linkage disequilibrium between markers and QTL is reduced in advanced selfing generations. The slightly different genotypic sample and the higher heritability of Exp. 3 in comparison to Exp. 2 might have contributed to this discrepancy.
CV with data from the literature:
To examine whether our conclusions concerning the magnitude of bias in $\tilde{p}_{ES}$ revealed by CV could be extended beyond the scope of this study, data from three published QTL experiments on agronomic traits in barley (![]()
$\tilde{p}_{ES}$ to $\tilde{p}_{TS.ES}$ than CV/E, and CV/GE showed the largest reduction in $\tilde{p}_{TS.ES}$ except for plant height in barley. For GY of barley, the decrease from $\tilde{p}_{ES}$ to $\tilde{p}_{TS.ES}$ was considerable for CV/G and CV/GE despite the large number of environments used in estimation (u = 15). Environmental sampling had a fairly large effect on estimates of p for tunnel length in the study of ![]()
$\tilde{p}_{ES}$ is likely to be increased due to the greater number of parameters to be estimated and their larger sampling error in comparison to additive effects.
Number of QTL and size of effects:
The current knowledge about the efficiency of MAS has mainly been inferred from computer simulations (for review see ![]()
10) genes of large effects that lead to a Gaussian normal distribution. These assumptions were supported by the results of numerous experimental studies, where QTL with large genotypic effects on quantitative traits were detected with small population sizes and few test environments (for review see ![]()
$\tilde{p}_{TS.ES}$ in CV/GE being <60% for all traits. According to ![]()
Presumably due to the negligible estimation bias with the large population size used and in accordance with the large number of QTL detected, ![]()
Precision of QTL localization:
An additional assumption of simulation studies on MAS is that linkage between markers used for selection and the QTL is tight (![]()
Recommendations:
From our experience with these experimental data, we recommend using all three CV schemes (CV/E, CV/G, and CV/GE) to evaluate the influence of environmental and genotypic sampling on the magnitude of the bias of estimates of p. With CIM based on multiple regression, CV should be computationally feasible on standard personal computers. In addition, for CV only little extra experimental expenditures are required in contrast to independent validation. If only limited computing resources are available and only small G x E interactions are observed, CV/G seems sufficient to assess the prospects of MAS. Accounting for both factors simultaneously, CV/GE is indispensable for obtaining asymptotically unbiased estimates of p for traits with complex genetic architecture and relatively low heritability such as grain yield. Nevertheless, there are still a number of open questions concerning the use of resampling methods for estimation of QTL positions and unbiased estimation of their genetic effects. Further research is needed on the optimum choice of the factors k and u and how to determine the true position of a QTL from the replicated CV runs. As already discussed above, the factors k and u need to be chosen as a function of the population size and the number of environments in the DS. The frequency distributions of detected QTL in estimation appeared to be a good tool for data interpretation and definition of QTL positions, at least for the more highly heritable traits. In addition, CV should be compared with other resampling methods such as bootstrapping with respect to desired properties in QTL analysis.
Conclusions:
For MAS to be superior to classical phenotypic selection, QTL positions must be estimated with high precision, estimated QTL effects must reflect their true genetic effects, and a sufficient proportion of the genotypic variance of the trait under study must be explained by the detected QTL. Both cross validation and validation with independent samples revealed a large bias in p when estimated from the same data set that was also used for determining QTL positions by composite interval mapping. These results were also corroborated with data from a study on yield and plant height in barley and two studies on insect resistance in maize. In all three studies genotypic and environmental sampling had a significant effect on the bias of estimates of p. Evidence for a fairly poor precision of estimation of QTL effects was given by the large range of $\hat{p}_{ES}$ and $\hat{p}_{TS.ES}$ in all studies. The asymptotically unbiased estimate $\tilde{p}_{TS.ES}$ from CV/GE was <50% for all traits except PH in all studies, indicating that less than half of the genotypic variance could be explained by QTL and suggesting that quantitative traits are probably controlled by a large number of genes with fairly small effects.
By the construction of QTL frequency distributions we tested the precision of QTL localization. While for plant height the position of a QTL on chromosome 7 could be fairly well determined, the absence of a well-defined peak in the QTL frequency distribution of GY reflected the poor QTL fidelity in estimation. If localization of the QTL is fairly vague in ES, there is little hope for unbiased estimation of its true genetic effects in TS.
On the basis of these results, we recommend improving interpretation of QTL analyses by (1) using QTL frequency distributions for determining the position of a QTL and (2) using cross validation, accounting for environmental and genotypic sampling (CV/GE), to obtain unbiased estimates of the proportion of the genotypic variance explained by QTL and to draw realistic conclusions on the prospects of MAS.
Manuscript received April 27, 1999; Accepted for publication December 13, 1999.
| LITERATURE CITED |
|---|
BEAVIS, W. D., 1994 The power and deceit of QTL experiments: lessons from comparative QTL studies, pp. 250–266 in 49th Annual Corn and Sorghum Industry Research Conference. ASTA, Washington, DC.
BEAVIS, W. D., 1998 QTL analyses: power, precision and accuracy, pp. 145–162 in Molecular Dissection of Complex Traits, edited by A. H. PATERSON. CRC Press, Boca Raton, FL.
BOHN, M., M. M. KHAIRALLAH, D. GONZALEZ-DE-LEON, D. A. HOISINGTON, and H. F. UTZ et al., 1996 QTL mapping in tropical maize: I. Genomic regions affecting leaf feeding resistance to sugarcane borer and other traits. Crop Sci. 36:1352-1361.
BREIMAN, L. and P. SPECTOR, 1992 Submodel selection and evaluation in regression. The X-random case. Int. Stat. Rev. 60:291-319.
CHARCOSSET, A. and A. GALLAIS, 1996 Estimation of the contribution of quantitative trait loci (QTL) to the variance of a quantitative trait by means of genetic markers. Theor. Appl. Genet. 93:1193-1201.
COWEN, N. M., 1988 The use of replicated progenies in marker-based mapping of QTL's. Theor. Appl. Genet. 75:857-862.
DAVISON, A. C., and D. V. HINKLEY, 1997 Bootstrap methods and their application. Cambridge University Press, Cambridge, United Kingdom.
DRAPER, N. R., and H. SMITH, 1981 Applied Regression Analysis, Ed. 2. Wiley & Sons, New York.
GEORGES, M., D. NIELSEN, M. MACKINNON, A. MISHRA, and R. OKIMOTO et al., 1995 Mapping quantitative trait loci controlling milk production in dairy cattle by exploiting progeny testing. Genetics 139:907-920.
HALEY, C. S. and S. A. KNOTT, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324.
HALLAUER, A. R., and J. B. MIRANDA, 1981 Quantitative Genetics in Maize Breeding. Iowa State University Press, Ames, IA.
HAYES, P. M., B. H. LIU, S. J. KNAPP, F. CHEN, and B. JONES et al., 1993 Quantitative trait locus effects and environmental interaction in a sample of North American barley germ plasm. Theor. Appl. Genet. 87:392-401.
HJORTH, J. S. U., 1994 Computer Intensive Statistical Methods. Validation Model Selection and Bootstrap. Chapman & Hall, London.
HOLLOWAY, J. L., and S. J. KNAPP, 1993 G-MENDEL 3.0 software for the analysis of genetic markers and maps. Oregon State University, Corvallis.
JANSEN, R. C., 1993 Interval mapping of multiple quantitative trait loci. Genetics 135:205-211.
JANSEN, R. C. and P. STAM, 1994 High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136:1447-1455.
KENDALL, M. G., and A. STUART, 1961 The Advanced Theory of Statistics. Vol. 2. Inference and Relationship. Charles Griffin & Company, London.
KNAPP, S. J., 1998 Marker-assisted selection as a strategy for increasing the probability of selecting superior genotypes. Crop Sci. 38:1164-1174.
LANDE, R. and R. THOMPSON, 1990 Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 124:743-756.
LANDER, E. S. and D. BOTSTEIN, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199.
LANDER, E. S., P. GREEN, J. ABRAHAMSON, A. BARLOW, and M. J. DALY et al., 1987 MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1:174-181.
LIU, B.-H., 1998 Computational tools for study of complex traits, pp. 43-79 in Molecular Dissection of Complex Traits, edited by A. H. PATERSON. CRC Press, Boca Raton, FL.
LÜBBERSTEDT, T., A. E. MELCHINGER, C. C. SCHÖN, H. F. UTZ, and D. KLEIN, 1997 QTL mapping in testcrosses of European flint lines of maize: I. Comparison of different testers for forage yield traits. Crop Sci. 37:921-931.
MARTINEZ, O. and R. N. CURNOW, 1992 Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor. Appl. Genet. 85:480-488.
MELCHINGER, A. E., H. F. UTZ, and C. C. SCHÖN, 1998 QTL mapping using different testers and independent population samples in maize reveals low power of QTL detection and large bias in estimates of QTL effects. Genetics 149:383-403.
MODE, C. J. and H. F. ROBINSON, 1959 Pleiotropism and the genetic variance and covariance. Biometrics 15:518-537.
MOREAU, L., A. CHARCOSSET, F. HOSPITAL, and A. GALLAIS, 1998 Marker-assisted selection efficiency in populations of finite size. Genetics 148:1353-1365.
OPENSHAW, S., and E. FRASCAROLI, 1997 QTL detection and marker-assisted selection for complex traits in maize, pp. 44-53 in 52nd Annual Corn and Sorghum Industry Research Conference, edited by ASTA, Washington, DC.
PATTERSON, H. D. and E. R. WILLIAMS, 1976 A new class of resolvable incomplete block designs. Biometrika 63:83-92.
SCHÖN, C. C., M. LEE, A. E. MELCHINGER, W. D. GUTHRIE, and W. L. WOODMAN, 1993 Mapping and characterization of quantitative trait loci affecting resistance against second-generation European corn borer in maize with the aid of RFLPs. Heredity 70:648-659.
SCHÖN, C. C., A. E. MELCHINGER, J. BOPPENMAIER, E. BRUNKLAUS-JUNG, and R. G. HERRMANN et al., 1994 RFLP mapping in maize: quantitative trait loci affecting testcross performance of elite European flint lines. Crop Sci. 34:378-389.
SILLANPÄÄ, M. J. and E. ARJAS, 1998 Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148:1373-1388.
TUKEY, J. W., 1977 Exploratory Data Analysis. Addison-Wesley Publishing Company, Reading, MA.
UTZ, H. F., and A. E. MELCHINGER, 1994 Comparison of different approaches to interval mapping of quantitative trait loci, pp. 195-204 in Biometrics in Plant Breeding: Applications of Molecular Markers, edited by J. W. VAN OOIJEN and J. JANSEN. Proceedings of the Ninth Meeting of the EUCARPIA Section Biometrics in Plant Breeding, Wageningen, The Netherlands. 6-8 July 1994.
UTZ, H. F. and A. E. MELCHINGER, 1996 PLABQTL: a program for composite interval mapping of QTL. J. Quant. Trait Loci 2(1).
VAN OOIJEN, J. W., 1992 Accuracy of mapping quantitative trait loci in autogamous species. Theor. Appl. Genet. 84:803-811.
VISSCHER, P. M., R. THOMPSON, and C. S. HALEY, 1996 Confidence intervals in QTL mapping by bootstrapping. Genetics 143:1013-1020.
ZENG, Z.-B., 1994 Precision mapping of quantitative trait loci. Genetics 136:1457-1468.