Bias and Sampling Error of the Estimated Proportion of Genotypic Variance Explained by Quantitative Trait Loci Determined From Experimental Data in Maize Using Cross Validation and Validation With Independent Samples
 ^{*}Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany
 ^{†}State Plant Breeding Institute, University of Hohenheim, 70593 Stuttgart, Germany
 Corresponding author: Albrecht E. Melchinger, Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany. Email: melchinger{at}unihohenheim.de
Abstract
Cross validation (CV) was used to analyze the effects of different environments and different genotypic samples on estimates of the proportion of genotypic variance explained by QTL (p). Testcrosses of 344 F_{3} maize lines grown in four environments were evaluated for a number of agronomic traits. In each of 200 replicated CV runs, this data set was subdivided into an estimation set (ES) and various test sets (TS). ES were used to map QTL and estimate p for each run (
MOLECULAR markers are used by a great number of researchers to study quantitative traits of agronomic importance. The primary objective of these studies has been the identification of markers associated with quantitative trait loci (QTL) and their use in subsequent markerassisted selection (MAS) programs.
In the statistical analysis of quantitatively inherited traits, the introduction of QTL interval mapping and maximumlikelihood estimation of effects by Lander and Botstein (1989) was a landmark. Simplicity and speed of QTL analyses were further increased by using multiple regression for determining significance of putative QTL and estimation of their genetic effects (Haley and Knott 1992; Martinez and Curnow 1992). An increase in power of QTL detection as well as accuracy and precision of estimated QTL positions and effects can be accomplished by including additional markers as cofactors in the statistical model (Jansen 1993; Zeng 1994). For review of the various statistical methods for QTL analyses, see Liu (1998).
Identification of significant QTLmarker associations forms the baseline for MAS. To be superior to classical phenotypic selection, several prerequisites must be satisfied: (1) QTL positions are estimated with high precision to choose markers showing a minimum of recombination with the QTL and to resolve linked QTL; (2) estimated QTL effects reflect their true genetic effects and, therefore, are estimated without bias due to genotypic or environmental sampling; (3) a sufficient proportion of the genotypic variance of the trait under study is explained by the detected QTL.
With respect to these prerequisites, the available statistical methods still have considerable shortcomings. Using computer simulations it was shown that (a) estimates of individual QTL effects and the proportion of genotypic variance explained by QTL can be severely inflated, leading to an overly optimistic assessment of the prospects of MAS (Utz and Melchinger 1994; Georgeset al. 1995; Beavis 1998); and (b) confidence intervals for QTL positions are large for population sizes commonly used in QTL mapping experiments (van Ooijen 1992; Visscheret al. 1996).
In most experimental studies these limitations have been ignored even though with experimental data, the bias in QTL effects is expected to be even greater than in computer simulations that rely on simplifying assumptions. To overcome these pitfalls, Lande and Thompson (1990) suggested identifying QTLmarker associations in one data set and subsequently estimating genetic effects based on the a priori model in another, independent validation sample. However, owing to the high costs of QTL studies their suggestion has not become common practice.
In an earlier study, we demonstrated for experimental data in maize that the magnitude of QTL effects and the proportion of the phenotypic and genotypic variance explained by QTL decreased substantially when estimated in independent validation samples (Melchingeret al. 1998). Because the samples used for mapping of QTL positions were evaluated in different environments than those for estimation of genetic effects, the observed decrease in variance explained by QTL had to be attributed to two confounded factors: environmental and genotypic sampling. In basic studies as well as in practical breeding, however, the contribution of each factor to the bias must be known for an optimal allocation of limited resources.
Several authors (e.g., Beavis 1994; Visscheret al. 1996) recommended the use of resampling methods to determine the magnitude of bias caused by these factors and to get more realistic estimates of the amount of genotypic variance explained by QTL. In this study, we used cross validation (CV; Hjorth 1994) to elucidate the effects of environmental and genotypic sampling. Objectives of our research were to (1) obtain unbiased estimates of the proportion of the genotypic variance explained by all detected QTL (p); (2) analyze the influence of environmental and genotypic sampling on the magnitude of the bias and sampling error of estimates of p; and (3) compare the magnitude of the bias and sampling error of p determined by CV with results obtained with independent validation samples.
MATERIALS AND METHODS
Plant materials: The plant materials used for this study were partly identical to those described by Melchinger et al. (1998). Briefly, two early maturing elite European flint inbreds, KW1265 and D146 (subsequently referred to as P1 and P2), were used as parents. Randomly chosen F_{2} plants from the cross P1 × P2 were selfed to produce 507 independently derived F_{3} (F_{2:3}) lines. A subsample of these F_{3} lines was advanced by singleseed descent to the F_{4} generation to produce 71 independent F_{5} (F_{4:5}) lines. Testcross (TC) seed was produced in isolation plots by mating the unrelated inbred tester KW5361 (T2 in Melchingeret al. 1998) to a random sample of 40 plants from each of the 507 F_{3} lines, the 71 F_{5} lines, as well as parents P1 and P2.
Field experiments: The TC progenies were evaluated in three different experiments. Experiment 1 (Exp. 1) comprised 380 TC of F_{3} lines, TC of P1 and P2 included as quintuple entries, and 10 common check hybrids. Trials were conducted in 1990 and 1991 at two sites in Germany. Data on plant height were additionally available from forage trials conducted at five environments in Germany described in detail by Lübberstedt et al. (1997).
Experiment 2 (Exp. 2) comprised TC of an independent set of 127 F_{3} lines, TC of P1 and P2 included as six and seven entries, respectively, and the same set of 10 check hybrids as in Exp.1. Trials were grown in 1992 and 1993 at two sites in Germany.
Experiment 3 (Exp. 3) comprised TC of 71 F_{5} lines derived from 71 F_{3} lines, whose TC were grown in Exp. 2, TC of P1 and P2 included as multiple entries, and the same check hybrids as in Exp. 1 and Exp. 2. Exp. 3 was grown in 1992 at four sites in Germany, two of which were in common with Exp. 2, and one additional site in France.
In all three experiments the experimental design was a generalized lattice with two replications (Patterson and Williams 1976). Tworow plots were overplanted and later thinned to reach a final stand of 8–11 plants m^{−2} depending on the location. All experiments were machine planted and harvested as grain trials with a combine.
Data were analyzed for the following traits: grain yield (GY) in Mg ha^{−1}, adjusted to 155 g kg^{−1} grain moisture, grain moisture (GM) in g kg^{−1} at harvest, kernel weight (KW) in mg kernel^{−1} determined from four samples of 50 kernels from each plot, and plant height (PH) measured in centimeters on a plot basis as the distance from the soil level to the lowest tassel branch.
RFLP marker genotyping and linkage map construction: The procedures for RFLP assays were described by Schön et al. (1994). Two linkage maps were constructed based on 89 RFLP marker loci using a subset of 344 parental F_{2} plants of the 380 F_{3} lines employed in Exp. 1 and a subset of 107 parental F_{2} plants of the 127 F_{3} lines employed in Exp. 2, respectively. A third linkage map was obtained from the 71 parental F_{4} plants of the F_{5} lines tested in Exp. 3, genotyped with 84 of the 89 RFLP marker loci used for the other two linkage maps. Software packages MAPMAKER 3.0 (Landeret al. 1987) and GMENDEL 3.0 (Holloway and Knapp 1993) were used for map construction.
Agronomic data analyses: For each experiment adjusted entry means and effective error mean squares derived from analyses of variance of each siteyear combination were used to compute the combined analyses of variance across environments. For estimation of quantitative genetic parameters such as variance components and heritabilities, see Melchinger et al. (1998). Phenotypic (
QTL analyses: QTL mapping and estimation of their effects were performed with PLABQTL (Utz and Melchinger 1996) employing composite interval mapping (CIM) by the regression approach (Haley and Knott 1992) in combination with the use of cofactors (Jansen and Stam 1994; Zeng 1994). Following Cowen (1988), an additive genetic model is appropriate for the analysis of TC progenies, because TC progenies from F_{2} plants heterozygous at a given marker or QTL correspond to a 1:1 mixture of TC plants carrying alleles from P1 and P2. Accordingly, the underlying model for TC progenies can be written as
The selection of cofactors was described by Melchinger et al. (1998). Testing for presence of a putative QTL in an interval by a likelihoodratio test was performed using a 2.5 LOD threshold. Estimates of QTL positions were obtained at the position where the LOD score assumed its maximum in the region under consideration. Following Draper and Smith (1981), the proportion of the phenotypic variance explained by QTL was determined by the unbiased estimator
The proportion of the genotypic variance explained by all detected QTL was estimated from the ratio
Cross validation: One approach applied for evaluation of QTL mapping results was cross validation. Here, the entire data set (DS) is split into subsets. One or several subsets combined form the estimation set (ES) for QTL detection, localization, and estimation of genetic effects. The remaining subset(s) form the test set (TS) in which predictions derived from ES are tested for their validity by correlating predicted and observed data. For example, in fivefold CV, the DS comprising marker data from 344 F_{2} plants and phenotypic data of their TC progenies from Exp. 1 was randomly subdivided into k = 5 genotypic subsamples, 4 with 69, 1 with 68 genotypes (Figure 1). Each of the 5 genotypic subsamples was evaluated in four environments, and consequently DS was divided into 20 disconnected subsets. For testing the effect of (i) environmental sampling (CV/E), (ii) genotypic sampling (CV/G), and (iii) both factors simultaneously (CV/GE), the same ES but different TS were used. The ES consisted of four genotypic subsamples with phenotypic data from three of the four environments. In CV/E, the TS consisted of the same four genotypic subsamples as in ES, but phenotypic data came from the fourth, disconnected environment. In CV/G, the fifth disconnected genotypic subsample with phenotypic data from the same three environments as in ES was used as TS. In CV/GE, estimates of QTL effects were obtained by using the one subset not connected with ES, either by environment or by genotypes, as TS. Consequently, by permutating the respective k subsets used for ES and TS, 4 × k = 20 different CV runs are possible for fivefold cross validation. To increase the precision of estimates of p, additional CV runs were generated by using 10 different randomizations for assigning genotypes to the respective subsamples, yielding a total of 10 × 4 × k = 200 replicated CV runs.
The effect of sample size in ES and TS was tested by varying the number of genotypic and the number of environmental subsamples used for estimation and testing. Genotypic subsampling was tested in five different CV schemes, dividing DS into k (k = 2, 3, 5, 9, and 16) genotypic subsamples containing N/k TC progenies each (Table 1). An additional cross validation scheme (CV_{k=1}) was created by randomly subdividing DS into subsamples of size N_{ES} = 100 and N_{TS} = 244. For all CV schemes, estimates of p were obtained as the median from a minimum of 200 CV runs originating from an appropriate number of different randomizations (Table 1).
For plant height, which was evaluated in nine environments, additional CV schemes were analyzed by varying the number of environments included in ES (u = 1, 2, 3, 4, and 8) for k = 5 (Table 1). The corresponding TS were based on TC progeny means averaged across the remaining e = 9 − u disconnected environments. The number of possible CV runs obtained by a single randomization was 5
The magnitude of bias in estimates of genotypic variance explained by QTL due to genotypic and/or environmental sampling was obtained by comparing estimates of p obtained from the ES and TS. Based on QTL mapping results obtained with composite interval mapping (CIM) from the ES, the genotypic value of F_{3} line j in TS Q_{TS.ESj} can be predicted according to
The proportion of the genotypic variance explained by QTL in TS (
Using a LOD threshold of 2.5 each CV run yielded different estimates for the number of QTL, their location, and genetic effects in ES. Estimates of p in ES and TS were calculated as the median
A more detailed analysis was performed for putative QTL for GY and PH on chromosome 7. Precision of QTL localization was assessed by determining the relative frequency of detected QTL for 1376 replicated runs in 1cM intervals along chromosome 7 from ES with k = 5, u = 3, and e = 1. In ES, allele substitution effects (
Validation with independent samples: Statistical theory and procedures for the alternative approach of testing results obtained in ES with independent validation samples are equivalent to cross validation. The same estimation sets as in CV were used to predict genotypic values Q_{VS.ES} for validation sets (VS). The adjusted squared correlation coefficient (
RESULTS
Trait means, variances, and heritabilities: Quantitative genetic parameters for Exp. 1 and Exp. 2 were presented in detail by Melchinger et al. (1998). Genotypic variances among TC of F_{3} and F_{5} lines were significant for all traits in all three experiments. As anticipated from theory, genotypic variances among F_{5} lines in Exp. 3 were greater than those among F_{3} lines in Exp. 1 and Exp. 2 for all traits. Estimates of
QTL analyses: For detailed results from QTL analyses based on the entire DS (Exp. 1) see Melchinger et al. (1998). Briefly, the number of detected QTL in DS was 2 for GY, 13 for GM, 11 for KW, and 12 for PH, explaining between 28.7 and 65.5% of the genotypic variance (Figure 2).
For all traits but GY, the average number of QTL detected increased with increasing sample size N in ES and was almost twice as large for k = 16 (N = 323) as compared to k = 1 (N = 100; Figure 2). The median
Validation with the two independent samples VS1 and VS2 resulted in the lowest
For GM, KW, and PH, 95% confidence intervals for
When varying the number of environments in ES (u) and TS (e) with k = 5 for PH, on average 7–14 QTL were detected in ES and
Results of a more detailed analysis (1376 replicated runs) of one QTL for GY and two QTL for PH on chromosome 7 are presented in Figure 5. In the 1376
ES for k = 5, significant QTL for GY were detected at almost every position along the chromosome (Figure 5, bottom left). At position 75 cM the maximum of the distribution of relative QTL frequencies was reached (7.5%) but the distribution did not show a welldefined peak. The median allele substitution effects estimated from ES (
DISCUSSION
Resampling methods: All statistical methods used for QTL analysis share the problem of model selection because the true number and position of QTL and, hence, the correct statistical model estimating their genetic effects, are unknown. With CIM, the general procedure is to identify among a large number of regressor variables x_{i} (coded marker genotypes or functions of them) those that account for the largest proportion in the variance of the response variable Y (phenotypic values), and use them for estimation of QTL effects and p. With a limited sample size, model selection leads to an overestimation of QTL effects and p due to sampling effects and consequently to a biased assessment of the prospects of MAS. In this experimental study, we tried to quantify the prediction error of our QTL models and to obtain unbiased estimates of the proportion of genetic variance explained by the detected QTL using resampling methods. CV was preferred over bootstrapping for two reasons: (1) CV/GE provides asymptotically unbiased estimates of p because the data in TS used for testing the prediction are stochastically independent from the data in ES from which the prediction rule is inferred (Davison and Hinkley 1997); (2) CV allows us to evaluate the effects of both genotypic and environmental sampling on estimates of p individually and simultaneously.
Cross validation: In the five CV and validation schemes,
Charcosset and Gallais (1996) postulated that the adjusted
Choice of k in CV: When using CV, the value k for subdivision of the original DS is crucial for determining the bias of
Choice of u in CV: For the highly heritable trait PH, u = 3 seemed sufficient to obtain an almost perfect agreement between
Independent validation: As expected,
CV with data from the literature: To examine whether our conclusions concerning the magnitude of bias in
Number of QTL and size of effects: The current knowledge about the efficiency of MAS has mainly been inferred from computer simulations (for review see Moreauet al. 1998). These investigations generally assumed that the quantitative trait under study is controlled by relatively few (≤10) genes of large effects that lead to a Gaussian normal distribution. These assumptions were supported by the results of numerous experimental studies, where QTL with large genotypic effects on quantitative traits were detected with small population sizes and few test environments (for review see Beavis 1998). However, recently published results from QTL analyses using large populations raise doubts on the validity of the assumptions of few QTL with large effects segregating for complex traits with agronomic importance. In a study with 976 maize testcross progenies evaluated in 19 environments, Openshaw and Frascaroli (1997) found 28 and 36 QTL for GY and PH. Despite this large number of QTL, they explained only 54 and 60% of the genotypic variance, respectively. Cross validation in our study corroborated these findings, with
Presumably due to the negligible estimation bias with the large population size used and in accordance with the large number of QTL detected, Openshaw and Frascaroli (1997) did not find QTL with large effects. Hence, it seems legitimate to question the validity of results from QTL mapping studies with small population sizes. They run a high risk of overestimating genetic effects of QTL and p and, therefore, draw overly optimistic conclusions about the prospects of MAS. Consequently, if the expression of a quantitative trait is under the control of a large (>30) number of QTL with small effects it will be quite a challenge for breeders to combine them in one genotype by MAS and for molecular biologists to localize them precisely in the genome and clone them.
Precision of QTL localization: An additional assumption of simulation studies on MAS is that linkage between markers used for selection and the QTL is tight (Knapp 1998; Moreauet al. 1998). On the contrary, several researchers showed that precision of QTL localization is mostly poor (Visscheret al. 1996). Sillanpää and Arjas (1998) suggested the use of QTL intensity distributions for identification and detailed analysis of genomic regions with putative QTL. To obtain an idea of the position of a QTL in different ES, we adopted the concept of QTL frequency distributions for cross validation (k = 5). As is obvious from Figure 5, the two QTL frequency distributions for GY and PH on chromosome 7 were in good agreement with LOD curves obtained with CIM in DS. For PH, but not for GY, the QTL frequency distribution yielded clear peaks and, hence, it was possible to identify the most likely position for the QTL on chromosome 7. The lack of a welldefined peak in the QTL frequency distribution of GY reflects the poor QTL fidelity in (cross) validation. If localization of the QTL is fairly vague in ES, there is little hope for unbiased estimation of the true genetic effects in TS.
Recommendations: From our experience with these experimental data, we recommend using all three CV schemes (CV/E, CV/G, and CV/GE) to evaluate the influence of environmental and genotypic sampling on the magnitude of the bias of estimates of p. With CIM based on multiple regression, CV should be computationally feasible on standard personal computers. In addition, for CV only little extra experimental expenditures are required in contrast to independent validation. If only limited computing resources are available and only small G × E interactions are observed, CV/G seems sufficient to assess the prospects of MAS. Accounting for both factors simultaneously, CV/GE is indispensable for obtaining asymptotically unbiased estimates of p for traits with complex genetic architecture and relatively low heritability such as grain yield. Nevertheless, there are still a number of open questions concerning the use of resampling methods for estimation of QTL positions and unbiased estimation of their genetic effects. Further research is needed on the optimum choice of the factors k and u and how to determine the true position of a QTL from the replicated CV runs. As already discussed above, the factors k and u need to be chosen as a function of the population size and the number of environments in the DS. The frequency distributions of detected QTL in estimation appeared to be a good tool for data interpretation and definition of QTL positions, at least for the more highly heritable traits. In addition, CV should be compared with other resampling methods such as bootstrapping with respect to desired properties in QTL analysis.
Conclusions: For MAS to be superior to classical phenotypic selection, QTL positions must be estimated with high precision, estimated QTL effects must reflect their true genetic effects, and a sufficient proportion of the genotypic variance of the trait under study must be explained by the detected QTL. Both cross validation and validation with independent samples revealed a large bias in p when estimated from the same data set that was also used for determining QTL positions by composite interval mapping. These results were also corroborated with data from a study on yield and plant height in barley and two studies on insect resistance in maize. In all three studies genotypic and environmental sampling had a significant effect on the bias of estimates of p. Evidence for a fairly poor precision of estimation of QTL effects was given by the large range of
By the construction of QTL frequency distributions we tested the precision of QTL localization. While for plant height the position of a QTL on chromosome 7 could be fairly well determined, the absence of a welldefined peak in the QTL frequency distribution of GY reflected the poor QTL fidelity in estimation. If localization of the QTL is fairly vague in ES, there is little hope for unbiased estimation of its true genetic effects in TS.
On the basis of these results, we recommend improving interpretation of QTL analyses by (1) using QTL frequency distributions for determining the position of a QTL and (2) using cross validation, accounting for environmental and genotypic sampling (CV/GE), to obtain unbiased estimates of the proportion of the genotypic variance explained by QTL and to draw realistic conclusions on the prospects of MAS.
Footnotes

Communicating editor: ZB. Zeng
 Received April 27, 1999.
 Accepted December 13, 1999.
 Copyright © 2000 by the Genetics Society of America