Abstract
The efficiency of marker-assisted selection (MAS) depends on the power of quantitative trait locus (QTL) detection and unbiased estimation of QTL effects. Two independent samples (N = 344 and 107) of F2 plants were genotyped for 89 RFLP markers. For each sample, testcross (TC) progenies of the corresponding F3 lines with two testers were evaluated in four environments. QTL for grain yield and other agronomically important traits were mapped in both samples. QTL effects were estimated from the same data as used for detection and mapping of QTL (calibration) and, based on QTL positions from calibration, from the second, independent sample (validation). For all traits and both testers we detected a total of 107 QTL with N = 344, and 39 QTL with N = 107, of which only 20 were in common. Consistency of QTL effects across testers was in agreement with corresponding genotypic correlations between the two TC series. Most QTL displayed no significant QTL × environment or epistatic interactions. Estimates of the proportion of the phenotypic and genetic variance explained by QTL were considerably reduced when derived from the independent validation sample as opposed to estimates from the calibration sample. We conclude that, unless QTL effects are estimated from an independent sample, they can be inflated, resulting in an overly optimistic assessment of the efficiency of MAS.
MOLECULAR marker technologies allow plant geneticists to construct high density genetic maps for any species of interest and use them for detecting, mapping, and estimating the effects of quantitative trait loci (QTL). While the basic idea of this approach was published more than 70 years ago (Sax 1923), new interest was generated when studies with maize and tomatoes successfully demonstrated that some markers explained a substantial proportion of the phenotypic variance of complex characters (for review, see Tanksley 1993). As a consequence, vigorous research on QTL mapping for quantitative traits such as yield, quality, maturity, and resistance to biotic and abiotic stress was initiated in many crop species (for review, see Lee 1995). Based on first results, it was anticipated that identification of important QTL regions could enhance plant breeding efficiency by marker-assisted selection (MAS). However, the prospects of this approach depend strongly upon the expenditures required for QTL mapping experiments, because their high costs reduce or even nullify the advantages of MAS schemes in a comprehensive economic assessment.
An important consideration in this context relates to the sample size (N) needed for QTL mapping. Most published experiments with replicated trials have employed between 100 and 200 progenies (for review, see Melchinger 1997), this choice being mainly dictated by the excessive labor and costs required for phenotyping and genotyping large populations. According to theoretical investigations (Lande and Thompson 1990), the proportion, p, of the additive genetic variance explained by the detected QTL is inversely related to the product, h2N, where h2 is the heritability of the trait. Consequently, for traits with moderate or low h2, where MAS should be most efficient, the chances of QTL detection with the above sample sizes are fairly low unless the QTL explains a substantial proportion of the genetic variance. A comparison of QTL detected in large versus small samples from the same population should give some insight into the power of QTL detection. So far the only experimental study that has been published on such a comparison is by Beavis (1994) on the highly heritable trait plant height using a limited data set of 20 markers only.
In view of the high costs of QTL studies, it has been common practice to estimate QTL effects from the same data as used for QTL mapping. With this approach, however, QTL effects generally are overestimated (Lande and Thompson 1990). As demonstrated by computer simulations (Beavis 1994; Utz and Melchinger 1994), the upward bias can be severe for low N and h2. To resolve this problem, Lande and Thompson (1990) suggested obtaining unbiased estimates of QTL effects by mapping QTL with one data set and based on this information estimating QTL effects in an independent data set. No experimental data have been presented so far on this approach even though knowledge about the magnitude of the bias of estimated QTL effects may strongly affect the conclusions concerning the prospects of MAS.
An indication that the bias of estimated QTL effects can be fairly large stems from the comparison of different QTL mapping studies. Beavis (1994) found little congruency of QTL locations and estimated effects for QTL on plant height and grain yield in different samples of progeny from the same cross (B73 × Mo17). However, as he pointed out, the comparison of results was confounded by a number of factors. Different sets of genetic markers were used in the studies and seed sources of parental lines were not the same. The progeny were evaluated in different environments and the level of inbreeding varied. Another confounding aspect was the evaluation of lines per se as opposed to testcross (TC) performance. Most QTL mapping studies have concentrated on line per se performance, even though in hybrid breeding it is essential to test new lines for their TC performance in combination with unrelated testers. Because testers and potential hybrid partners of new lines are often not fixed or may change over time, an important question concerns the consistency of QTL for TC performance with different testers.
In this study, we evaluated TC progenies of 344 F3 lines in combination with two unrelated testers plus additional TC progenies from an independent but smaller sample (N = 107) of F3 lines from the same cross in combination with the same two testers for grain yield and four other important agronomic traits. Objectives of our research were to (i) assess the magnitude of the bias of estimated QTL effects by mapping QTL with one data set (calibration) and, based on this information, estimate QTL effects in an independent data set (validation), (ii) compare the power of QTL detection in samples of different size, (iii) investigate the consistency of QTL across testers, and (iv) assess the importance of epistatic and QTL-by-environment interactions.
MATERIALS AND METHODS
Plant materials: The plant materials used for this study were partly identical to those employed and described in previous studies on kernel weight, protein concentration, plant height (Schön et al. 1994) and forage traits in maize (Lübberstedt et al. 1997). Briefly, two early maturing elite European flint inbreds, KW1265 and D146 (subsequently referred to as P1 and P2), were used as parents. Randomly chosen F2 plants from the cross P1 × P2 were selfed to produce 507 independently derived F3 lines. Subsequently, TC seed was produced in two separate isolation plots by mating each of two inbred testers as pollinators (KW4115 and KW5361, subsequently referred to as T1 and T2) to a random sample of 40 F3 plants from each of the 507 F3 lines as well as parents P1 and P2. Both testers were elite inbreds from two diverse European dent heterotic pools and unrelated by pedigree.
Field experiments: The TC progenies of F3 lines and parents P1 and P2 were evaluated in two series of experiments. Experiment 1 comprised two adjacent subexperiments each with 400 entries (Subexperiment 1T1 = TC with tester T1, Subexperiment 1T2 = TC with tester T2) conducted in 1990 and 1991 at two sites in Germany (Gondelsheim and Grucking) with diverse agroecological conditions and representing two main maize growing areas in Germany, the Upper Rhine valley and Lower Bavaria. Data on plant height were additionally available from forage trials conducted at five environments in Germany described in detail by Lübberstedt et al. (1997).
Experiment 2 also comprised two subexperiments, each with 150 entries (Subexperiment 2T1 = TC with tester T1, Subexperiment 2T2 = TC with tester T2) conducted in four environments. Two of the trials were grown adjacent to each other in the same environments (Eckartsweier 1993, Bad Krozingen 1993) and two environments were only used for one subexperiment (Subexperiment 2T1: Hochburg 1993, Zell 1993; Subexperiment 2T2: Eckartsweier 1992, Bad Krozingen 1992).
The 400 entries in Subexperiments 1T1 and 1T2 comprised 380 TC of F3 lines, TC of P1 and P2 included as quintuple entries, and 10 common check hybrids. The 150 entries in Subexperiments 2T1 and 2T2 comprised TC from a different set of 127 F3 lines, TC of P1 and P2 included as six and seven entries, respectively, and the same set of 10 check hybrids as in Experiment 1. The experimental design was a 40-by-10 alpha design (Patterson and Williams 1976) for Experiment 1 and a 15-by-10 alpha design for Experiment 2 with two replications each. Two-row plots were overplanted and later thinned to 45 plants per row in 1990, 50 plants per row in 1991, and 52 plants per row in 1992 and 1993 to reach a final stand of 10, 11, and 8.7 plants m−2, respectively. All experiments were machine planted and harvested as grain trials with a combine.
Data were collected for the following traits: grain yield (GY) in Mg ha−1, adjusted to 155 g kg−1 grain moisture, grain moisture (GM) in g kg−1 at harvest, kernel weight (KW) in mg kernel−1 determined from four samples of 50 kernels from each plot, protein concentration (PC) in grain (g kg−1) measured by near infrared reflectance spectroscopy as described by Melchinger et al. (1986), and plant height (PH) measured in cm on a plot basis as the distance from the soil level to the lowest tassel branch. In Experiment 1, PC could not be determined in 1991 at Grucking because of technical problems.
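The moisture adjustment of grain yield mentioned above can be sketched as follows. The text does not give the formula; this sketch assumes the usual dry-matter conversion, in which the dry matter harvested is held constant and re-expressed at the target moisture of 155 g kg−1. The function name and arguments are hypothetical.

```python
def adjust_yield(yield_at_harvest, moisture_g_per_kg, target_g_per_kg=155.0):
    """Re-express fresh grain yield at a target grain moisture.

    Assumes the standard dry-matter conversion (not stated in the text):
    dry matter is constant, only the water content is rescaled.
    """
    dry_matter = yield_at_harvest * (1000.0 - moisture_g_per_kg)
    return dry_matter / (1000.0 - target_g_per_kg)


# A plot harvested at 10.0 Mg/ha with 300 g/kg grain moisture
print(round(adjust_yield(10.0, 300.0), 3))
```

A plot harvested exactly at 155 g kg−1 is left unchanged by this conversion, while wetter grain is scaled down.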
RFLP marker genotyping and linkage map construction: The procedures for RFLP assays, segregation analysis of individual markers, and construction of an RFLP linkage map for cross P1 × P2 were described in detail by Schön et al. (1994). A subset of 344 parental F2 plants of the 380 F3 lines employed in Experiment 1 and a second subset of 107 parental F2 plants of the 127 F3 lines employed in Experiment 2, all chosen for good DNA yield and showing no evidence of contamination, were genotyped for a total of 89 RFLP marker loci, 82 of them showing a codominant and seven a dominant inheritance pattern. Observed genotype frequencies at each marker locus were checked for deviations from Mendelian segregation ratios and allele frequency 0.5 by ordinary χ2 tests. Owing to multiple tests, appropriate type I error rates were determined by the sequentially rejective Bonferroni procedure (Holm 1979). A linkage map was constructed for the combined set of 451 F2 plants using MAPMAKER version 3.0b (Lander et al. 1987) and a LOD threshold of 3.0 in two-point analyses. Recombination frequencies between marker loci were estimated by multi-point analyses and transformed into map distances (cM) by Haldane's mapping function.
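Two computations named in this paragraph can be sketched briefly: the sequentially rejective Bonferroni procedure and Haldane's mapping function. This is a minimal illustration, not the code used in the study; the function names, the default alpha of 0.05, and the example p-values are assumptions.

```python
import math


def holm_reject(p_values, alpha=0.05):
    """Holm's sequentially rejective Bonferroni procedure.

    Returns a list of booleans (same order as p_values); True means the
    null hypothesis for that test is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


def haldane_cM(r):
    """Haldane map distance in cM for recombination frequency 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


print(holm_reject([0.001, 0.04, 0.03, 0.5]))
print(round(haldane_cM(0.2), 2))
```

Holm's procedure stops at the first non-significant ordered test, which makes it uniformly at least as powerful as a plain Bonferroni correction at the same familywise error rate.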
Data analyses: Each site-year combination was treated as an environment in the statistical analyses. First, analyses of variance were performed on the data from each subexperiment and environment. Adjusted entry means and effective error mean squares were then used to compute the combined analyses of variance and covariance across environments for each subexperiment. The sums of squares for entries (399 d.f. each in Subexperiments 1T1 and 1T2 and 149 d.f. each in Subexperiments 2T1 and 2T2) were subdivided into the variation among TC of F3 lines (379 d.f. in Subexperiments 1T1 and 1T2 and 126 d.f. in Subexperiments 2T1 and 2T2) and orthogonal contrasts among the TC means of P1, P2, F3 lines and the hybrid checks. A corresponding subdivision was conducted on the entry-by-environment interaction sums of squares.
Components of variance for the TC of F3 lines in each subexperiment were computed considering all effects (environments, F3 lines) in the statistical model as random. Estimates of variance components σ2 (error variance),
All QTL analyses were performed using the linkage information given in Figure 1. While Schön et al. (1994) used interval mapping according to Lander and Botstein (1989) for QTL analyses, in this study, the method of composite interval mapping (CIM) (Jansen and Stam 1994; Zeng 1994) was employed for mapping of QTL and estimation of their effects in each of the four subexperiments. All necessary computations were performed with PLABQTL (Utz and Melchinger 1996), which employs interval mapping by the regression approach (Haley and Knott 1992) in combination with the use of selected markers as cofactors. The underlying model for TC progenies with a given tester can be written as
Cofactors were selected by stepwise regression according to Miller (1990, p. 49) with an “F-to-enter” and an “F-to-delete” value of 3.5. Testing for presence of a putative QTL in an interval by a likelihood ratio (LR) test (yielding so-called LOD scores) was performed as described by Lübberstedt et al. (1997). We chose a LOD (=0.217 LR) threshold of 2.5 for declaring a putative QTL. Given that in our analysis of TC progenies the LR test statistic approximately follows a χ2 distribution with 2 df (1 df for the α-effect and 1 df for the position of the QTL; Zeng 1994), this approximates a comparisonwise type I error Pc < 0.0032 or a genomewise type I error Pg = MPc < 0.25 (M = 78 being the total number of intervals tested). Estimates of QTL positions were obtained at the point where the LOD score assumed its maximum in the region under consideration. Under CIM, computation of confidence intervals for the QTL position is still an unsolved problem (Visscher et al. 1996). Therefore, QTL detected with different testers or in different experiments were regarded as common if their estimated map position was within a 20-cM distance and the estimated α-effects had identical sign. Presence of QTL-by-environment (QTL × E) interactions and digenic epistatic interactions between the detected QTL were tested in combined analyses of variance across environments by F-tests described by Bohn et al. (1996). A detailed list of expected mean squares for the analysis of QTL experiments from multiple environments is given by Melchinger (1998).
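The error rates quoted for the LOD threshold can be checked with a few lines. The sketch below uses only the relations stated in the text (LOD = 0.217 LR, χ2 with 2 df, Pg = M Pc with M = 78); the function name is hypothetical, and the closed form exp(−x/2) is the exact upper tail of a χ2 distribution with 2 df.

```python
import math

LOD_PER_LR = 0.217   # conversion factor given in the text (about 1 / (2 ln 10))
N_INTERVALS = 78     # M, number of marker intervals tested


def comparisonwise_p(lod):
    """Upper-tail probability of the LR statistic under chi-square with 2 df."""
    lr = lod / LOD_PER_LR
    return math.exp(-lr / 2.0)  # exact survival function for 2 df


p_c = comparisonwise_p(2.5)
p_g_bound = N_INTERVALS * p_c  # Bonferroni-type upper bound on the genomewise rate
print(p_c < 0.0032, p_g_bound < 0.25)
```

The product M Pc is an upper bound rather than the exact genomewise rate, because tests in neighboring intervals are correlated.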
The proportion of the phenotypic variance (
Two approaches were applied in calculating estimates of QTL effects: (1) Following common practice, QTL effects were estimated from exactly the same experiment as used for QTL detection; (2) QTL detection was performed in one experiment (subsequently denoted as calibration) and, based on this information, QTL effects were estimated from the data of the other experiment with the same tester (subsequently referred to as validation). In the latter case, the design matrix X in multiple regression was calculated on the basis of (a) the map position of the QTL detected in the calibration and (b) the marker genotype at the flanking markers of the F2 plants in the validation according to described procedures (Haley and Knott 1992; Utz and Melchinger 1996).
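The difference between approaches (1) and (2) can be illustrated with a small Monte Carlo sketch. It is not the analysis performed in this study: a single QTL effect is estimated with normally distributed error, "detection" is reduced to the estimate exceeding a fixed threshold, and all numerical values (true effect, standard errors, threshold) are hypothetical. The ratio of the two standard errors is chosen to mimic sample sizes of 344 and 107.

```python
import random
import statistics


def simulate_bias(true_effect=0.2, se_calibration=0.2, se_validation=0.36,
                  threshold=0.5, n_runs=5000, seed=1):
    """Mean effect estimate among detected QTL, in calibration and validation.

    All parameter values are illustrative assumptions, not taken from the study.
    """
    rng = random.Random(seed)
    calib, valid = [], []
    for _ in range(n_runs):
        est_calibration = rng.gauss(true_effect, se_calibration)
        if est_calibration > threshold:  # QTL declared in the calibration sample
            calib.append(est_calibration)
            # Same QTL position, effect re-estimated in an independent sample
            valid.append(rng.gauss(true_effect, se_validation))
    return statistics.mean(calib), statistics.mean(valid), len(calib)


calib_mean, valid_mean, n_detected = simulate_bias()
print(n_detected, round(calib_mean, 2), round(valid_mean, 2))
```

Because only estimates above the threshold are retained, the calibration mean necessarily exceeds the threshold and hence the true effect, whereas the validation estimates are drawn without selection and scatter around the true effect.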
Finally, for each F2 genotype j the marker index score Mjz of its TC progeny with tester Tz was calculated from its marker genotype and the X matrix from the multiple regression in calibration as outlined by Lande and Thompson (1990). According to standard procedures (Mode and Robinson 1959), the Mjz values were subsequently used to estimate the genotypic correlation rg (Yjz′, Mjz) of the observed TC performance Yjz′ with tester Tz′ (z′ = 1, 2; z′ ≠ z) and the marker index score based on results with tester Tz. Standard errors of
RESULTS
Segregation and linkage of RFLP markers: In the data analysis of the combined set of 451 F2 plants (344 from Experiment 1 and 107 from Experiment 2), observed genotype frequencies were consistent with the expected Mendelian segregation ratios for all 89 RFLP markers assayed (data not shown). The 89 marker loci spanned a map distance of 1647 cM with an average interval length of 24 cM (Figure 1). About 90% of the genome was located within a 20-cM distance to the nearest marker.
Figure 1. The RFLP map with 89 markers constructed from 451 F2 plants of maize cross P1 × P2.
Trait means, variances, heritabilities, and correlations: Climatic conditions were favorable for maize grain production in all 10 test environments. Means and phenotypic variances of the 10 check varieties included in each of the four subexperiments varied considerably between environments for all traits, reflecting rather diverse growing conditions. Average yield of the 10 checks ranged from 7.5 to 12.8 Mg ha−1. Phenotypic correlations based on performance of the 10 check varieties and averaged over traits and subexperiments were medium when calculated separately for the four environments of Experiment 1 (
In Experiment 1, TC means of F3 progenies with tester T1 were significantly (P < 0.01) smaller than with tester T2 for GY and KW but greater for GM (Table 1). For Experiment 2, the respective comparison is not meaningful because the two TC series were not evaluated in the same environments. The TC means of P1 and P2 differed significantly (P < 0.01) for all traits with both testers in both experiments. Parent P1 generally had higher TC means than P2 except for GY and GM (tester T2 in Experiment 2) (Tables 1 and 2). The orthogonal contrast between the average TC performance of the parent lines (
Genotypic variances among TC of F3 lines (
Phenotypic correlations between TC of F3 lines with tester T1 and T2 were greater than 0.53 except for GY (
Identification of QTL: Results from QTL analyses are presented for means across environments. For Experiments 1 and 2, estimates of the QTL position in the genome, the level of significance, the size of the phenotypic variance explained, the substitution effects, and the significance of QTL-by-environment interactions are shown in Tables 3 and 4, respectively. The number of selected cofactors was higher in Experiment 1 (14–28) than in Experiment 2 (6–14), and more significant cofactors were found for traits with higher heritability than for, e.g., GY. A complete list of the number of selected cofactors used for each trait, tester, and experiment can be obtained upon request from the corresponding author.
Comparison of QTL effects between experiments: Grain yield: For GY, seven putative QTL were identified in Experiment 1 in TC with T1 (Table 3). A simultaneous fit accounted for R2 = 30.8% of
There was one common QTL for GY between Experiment 1 and 2 (Table 5). For QTL positions identified in Experiment 1 (calibration),
Grain moisture: In Experiment 1, 12 and 13 QTL influencing GM in TC with tester T1 and T2, respectively, were detected. A simultaneous fit yielded R2 = 45.6% and
In Experiment 2, three QTL were found for GM in TC with T1 (R2 = 17.4% and
Two and six QTL were in common between Experiment 1 and 2 for tester T1 and T2, respectively (Table 5). Estimates of α-effects from validation in Experiment 2 were in four cases larger but otherwise much smaller than those from calibration in Experiment 1 (Table 3). If significant, both estimates of α had identical sign except for one QTL on chromosome 9 for tester T1 and one on chromosome 2 for T2. Collectively, the QTL effects from validation in Experiment 2 accounted for R2 = 24.3% and
Kernel weight: In Experiment 1, 12 QTL in TC with T1 and 11 QTL in TC with T2 were found for KW. The 12 QTL accounted for R2 = 63.7% and
In Experiment 2, four QTL were detected for TC with T1 and five QTL for TC with T2. Collectively, these QTL explained R2 = 41.5% and
Table 1. First and second degree statistics for maize TC progenies from parent lines (P1 and P2) and 380 F3 lines (from cross P1 × P2) with inbred testers T1 and T2 in Experiment 1 for GY, GM, KW (four environments), PC (three environments), and PH (nine environments)
Table 2. First and second degree statistics for maize TC progenies from parent lines (P1 and P2) and 127 F3 lines (from cross P1 × P2) with inbred testers T1 and T2 in Experiment 2 for five agronomic traits measured in four environments
Table 3. QTL affecting GY, GM, KW, PC, and PH detected in maize TC progenies of F3 lines from cross P1 × P2 in Experiment 1 (calibration, N = 344) and respective QTL effects estimated in Experiment 2 (validation, N = 107)
Table 4. QTL affecting GY, GM, KW, PC, and PH detected in maize TC progenies of F3 lines from cross P1 × P2 in Experiment 2 (calibration, N = 107) and respective QTL effects estimated in Experiment 1 (validation, N = 344)
Proportion (p) of the genotypic variance explained by the detected QTL in maize TC progenies of F3 lines from cross P1 × P2. (A) Calibration in Experiment 1 (N = 344), validation in Experiment 2 (N = 107). (B) Calibration in Experiment 2, validation in Experiment 1.
Three QTL for T1 and two QTL for T2 were in common between Experiment 1 and 2 (Table 5), including the largest QTL explaining about 25.5% of
Protein concentration: In Experiment 1, nine and ten QTL influencing PC in TC with T1 and T2, respectively, were mapped. A simultaneous fit yielded R2 = 37.7% and
In Experiment 2, three QTL affected PC in TC with T1 (R2 = 31.6% and
Only one QTL was in common between Experiment 1 and 2 for each tester (Table 5). In several instances,
Plant height: In Experiment 1, 17 and 14 QTL affecting PH in TC with T1 and T2, respectively, were identified on all 10 chromosomes (Table 3). A simultaneous fit with all QTL accounted for R2 = 63.2% and
Table 5. Number of putative QTL detected and in common (∩) for maize TC progenies of F3 lines with testers T1 and T2 in Experiments 1 and 2
In Experiment 2, four QTL were found in TC with T1. Collectively, these QTL explained R2 = 43.6% and
Three QTL for TC with T1 and one QTL for TC with T2 were in common between Experiment 1 and 2, including the largest QTL found for both testers on chromosome 1 (Table 5). Estimates of α-effects from validation in Experiment 2 were largely consistent in sign and magnitude with those from calibration in Experiment 1 (Table 3). Collectively, the former explained R2 = 55.7% and
Digenic epistasis between detected QTL: In Experiment 1, the test for digenic epistatic interactions (αα-effects) among detected QTL was significant (P < 0.05) in few instances. In TC with T1, we found epistasis only for GM between the QTL on chromosome 2 (position 180 cM) and chromosome 8 (position 106 cM) with
Correlation between predicted and observed TC performance: In Experiment 1, estimates of the genotypic correlation rg (Yjz′, Mjz) exceeded 0.70 for KW and PH, and ranged between 0.53 and 0.60 for GM and PC, but were below 0.39 for GY (Table 6). In Experiment 2,
DISCUSSION
Advantages of CIM: A comparison of our results in Experiment 1 for PC and KW with those of Schön et al. (1994) clearly demonstrates the advantages of CIM over simple interval mapping. Both investigations relied on the same data set and employed the same LOD threshold for QTL detection. For both traits, we found about twice the number of QTL and a much better agreement between testers than reported by Schön et al. (1994). This was due to the detection of additional QTL and a better resolution of linked QTL with CIM, as expected from theory and simulation results (Zeng 1994). The R2 values for the simultaneous fit were only marginally increased with CIM. However, this comparison is confounded with the difference in R2 estimates obtained by the regression and maximum likelihood approach (Xu 1995) implemented in software packages PLABQTL and MAPMAKER/QTL, respectively, employed in the two studies.
Estimation of QTL effects from independent samples: Estimates of individual QTL effects were in most cases considerably smaller when estimated from an independent validation experiment in lieu of the calibration experiment (Tables 3 and 4). In some cases effects of opposite sign were found in the validation experiment, suggesting the occurrence of a type III error (i.e., a significant association is correctly declared but the marker allele is associated with the wrong QTL allele; Dudley 1993). For all traits and both testers,
In our opinion, the decrease in
Table 6. Genotypic correlation rg (Yjz′, Mjz) between observed performance (Yjz′) of maize TC progenies of F3 lines j with tester Tz′, and their prediction (Mjz) from QTL mapping results of TC progenies with tester Tz in Experiments 1 and 2
On the other hand, computer simulations (Beavis 1994; Utz and Melchinger 1994; Georges et al. 1995) demonstrated that statistical sampling has a strong impact on QTL analyses and that the bias in QTL effects estimated from calibration can be severe, the most important factors being the sample size N, the magnitude of the QTL effect, and h2. Utz and Melchinger (1994) showed that with CIM the
The inflation in the R2 and
Suggestions in the statistical literature (for review, see Miller 1990) to diminish these problems include (1) model validation with an additional sample as proposed by Lande and Thompson (1990) or (2) cross-validation in the case of larger sample sizes. For validation we calculated the regressors without model selection based on QTL positions identified in the calibration and estimated the partial regression coefficients based on the a priori chosen model. Hence, the estimated regression coefficients are unbiased in the validation based on standard linear model theory. Beavis (1994) proposed the use of resampling strategies for reducing the bias by using results from experiments with multiple independent samples of progeny. Other resampling strategies such as bootstrapping may be a further alternative for eliminating the bias. However, when using CIM, it is not obvious from which pool to draw the bootstrap samples for estimating the QTL parameters (Visscher et al. 1996).
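Option (2), cross-validation, amounts to repeatedly partitioning the genotypes of one experiment into a calibration and a validation subset. A minimal sketch of such a partition is given below; the number of folds, the seed, and the function name are assumptions, and the QTL detection and effect estimation steps that would be run on each subset are deliberately left out.

```python
import random


def k_fold_splits(n_genotypes, k=5, seed=0):
    """Yield (calibration, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_genotypes))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint subsets
    for i in range(k):
        validation = sorted(folds[i])
        calibration = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield calibration, validation


# Example with the larger sample size of this study (N = 344)
for calibration, validation in k_fold_splits(344, k=5):
    print(len(calibration), len(validation))
```

For the estimates to be free of selection bias, the full model selection (cofactor choice and QTL detection) would have to be repeated within each calibration subset, with the validation subset used only for re-estimating effects at the positions found.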
While the absolute proportion of
The lack of consistency between QTL effect estimates obtained from calibration and validation has several important consequences for QTL mapping and MAS for polygenic traits: (1) It demonstrates that, due to sampling and QTL × E interactions, individual QTL effects estimated directly from calibration can be inflated, especially for smaller values of N and complexly inherited traits such as GY. Inferences about the relative magnitude of QTL effects estimated from previous experimental studies should be reexamined under this aspect. (2) The distribution of estimated QTL effects may not reflect the distribution of true QTL effects. A large estimate may reflect either a large QTL or a small QTL estimated with a large bias. (3) The decision of which QTL regions to transfer with MAS and/or to consider in a selection index should be based on QTL effects verified in an independent validation sample. (4) For a correct assessment of the prospects of MAS, the key parameter p must not be determined from calibration, but from an independent validation sample or by using cross-validation.
Comparison of QTL detected in samples of different size: We evaluated the power of QTL detection by comparing results from QTL mapping in two independent samples of different size from the same population. The smaller sample size (N = 107) in Experiment 2 was chosen in accordance with (1) most experimental QTL studies reported in the literature and (2) the maximum number of progenies generally employed per cross for early testing in recycling breeding (Beavis et al. 1994). In Experiment 1 we chose, from a breeder's point of view, a large sample size (N = 344) to meet the minimum requirements for the detection of smaller QTL, as suggested by theory (Lander and Botstein 1989; Lande and Thompson 1990). From Equation 2 it can be shown that with a LOD threshold of 2.5 we were able to detect a QTL accounting for at least 10.2% of
Only about half (20) of the putative QTL detected in Experiment 2 were in common with QTL identified in Experiment 1 and the poorest agreement was observed for GY (Table 5). As pointed out earlier, the comparison of results between Experiments 1 and 2 is confounded by the different test environments and very likely both factors, sampling and QTL × E interactions, contributed to the lack of congruency of QTL found for the two experiments. In a comparison of QTL mapping results from two independent studies with elite cross B73 × Mo17, Beavis et al. (1994) found similar results for GY. These authors concluded that the lack of congruency was mainly attributable to sampling of progeny because the sample sizes used in their study were small. However, even with large sample sizes the statistical power of QTL detection is only moderate for QTL with smaller effects as demonstrated by various simulation studies (Van Ooijen 1992; Beavis 1994; Utz and Melchinger 1994). For example, the power for detecting a QTL explaining 3.5% of
This apparent gap in explanation can be closed by considering that many of the QTL effects estimated in Experiment 2 had a large upward bias, as discussed earlier. Assuming their true effects were often much smaller, it follows in combination with the previous argument that there was only a moderate chance of detecting them simultaneously in Experiment 1. In addition, we cannot rule out that a few of the putative QTL were either environment-specific or “false positives” and therefore occurred only in one experiment, given that a LOD threshold of 2.5 corresponds in our study to a genomewise type I error rate Pg ≤ 0.25.
For testing congruency of QTL it was not possible to adopt a criterion based on overlapping confidence intervals, because with CIM their computation is still an unsolved problem (Visscher et al. 1996). We declared two QTL as being in common if they had the same sign and were within a 20-cM distance. Using a wider interval length (e.g., 40 cM) would have increased the proportion of common QTL only marginally but entailed a high risk that well-separated different QTL are declared as common, if they have by chance the same sign. This applies particularly for such traits as PH, where a large number of QTL was detected.
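The congruency criterion described above is simple enough to state as a predicate. The sketch below assumes each QTL is represented by a chromosome, a map position in cM, and an estimated α-effect; the dictionary keys and the example values are hypothetical, and the requirement that both QTL lie on the same chromosome is taken to be implicit in the distance criterion.

```python
def is_common(qtl_a, qtl_b, max_distance_cM=20.0):
    """True if two QTL are declared common: same chromosome, estimated
    positions within max_distance_cM, and effects of identical sign."""
    same_chrom = qtl_a["chrom"] == qtl_b["chrom"]
    close = abs(qtl_a["pos_cM"] - qtl_b["pos_cM"]) <= max_distance_cM
    same_sign = qtl_a["effect"] * qtl_b["effect"] > 0
    return same_chrom and close and same_sign


# Hypothetical QTL from two experiments
qtl_exp1 = {"chrom": 1, "pos_cM": 100.0, "effect": 2.1}
qtl_exp2 = {"chrom": 1, "pos_cM": 112.0, "effect": 0.8}
print(is_common(qtl_exp1, qtl_exp2))
print(is_common(qtl_exp1, qtl_exp2, max_distance_cM=10.0))
```

Passing a different `max_distance_cM` (for example 40.0) shows how the window width changes which pairs are counted as common.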
The “genetic architecture” of a trait characterized by the number of effective factors (Wright 1968) has an impact on both the power of QTL detection and the magnitude of the bias when estimating QTL effects. With a large number of minor QTL influencing a quantitative trait, the power of QTL detection and consequently the number of common QTL should be smaller than for a trait governed by a small number of major QTL. Likewise, the relative bias in R2 and
The lack of congruency among QTL detected in two samples from the same cross provides the baseline for comparisons of QTL detected in populations derived from different crosses. Therefore, it was not surprising that most QTL regions reported here were either unique or found in just one or two comparable studies in the literature. Only one QTL region adjacent to marker umc89 on chromosome 8 affecting both GY and GM was identified in several other investigations (Stuber et al. 1992; Beavis et al. 1994; Bohn et al. 1996). Likewise, the QTL region adjacent to umc140 on chromosome 9 was repeatedly shown to have a large effect on KW (Beavis et al. 1994; Austin and Lee 1996). Three of the seven QTL detected for PC in cross IHP × ILP (Goldman et al. 1993) were also active in Experiments 1 and 2 with both testers.
In addition to the confounding factor of sampling, two features of our experimental materials could explain the singular set of QTL reported here: (1) Our mapping population was generated from a cross of two elite European flint lines, whereas all other QTL studies in maize have employed wide crosses between North American dent lines or tropical germplasm. Because flint and dent are fairly distinct germplasm groups, we hypothesize that they have only a small subset of polymorphic QTL in common. (2) In practical breeding programs, elite lines from the same heterotic group are crossed and early selfing generations (F2 plants or F3 lines) are evaluated for their TC performance in combination with testers from the opposite heterotic group. We therefore mapped QTL for TC performance as opposed to line per se performance commonly determined in most previous QTL studies. This can result in largely different sets of QTL as demonstrated in a comparison of both features in cross B73 × Mo17 (Beavis et al. 1994) and expected from theory (see appendix b).
Comparison among testers: According to theory (see appendix b), consistent QTL mapping results across testers are expected in the absence of epistasis if both testers have identical alleles at the QTL, or additive gene action prevails, or the dominance effects satisfy the condition
With the exception of GY, our QTL mapping results in Experiment 1 agreed well across testers for all traits: more than half of the QTL detected with one tester were also found with the other tester and the proportion of common QTL was in close agreement with the magnitude of
The absence of common QTL between both testers observed for GY can be explained by several causes, the most important being related to gene action. Studies on GY have revealed a high degree of dominance (Hallauer and Miranda 1981, Chapter 5) and a large proportion of QTL with dominance and overdominance (or pseudooverdominance) (Stuber et al. 1992; Beavis et al. 1994; Bohn et al. 1996; Cockerham and Zeng 1996). Under this supposition, inconsistent QTL results among testers can be explained by masking effects of the tester allele. If a QTL is detected for tester T1, but tester T2 carries an allele fully dominant over the alleles carried by P1 and P2, no QTL will be detected in its TC progenies. Epistasis between unlinked QTL and QTL × E interactions were presumably not important causes for the inconsistencies between testers, as they were generally of minor importance.
Given that the tester may change over time in a hybrid breeding program, we examined whether a marker index score Mjz based on QTL mapping results with tester Tz would be effective for improving TC performance Yjz′ with a different tester Tz′. For this purpose, we estimated the genotypic correlation rg (Yjz′, Mjz), which represents the key parameter in the formula for the selection response in Yj from indirect selection for Mj (Falconer and Mackay 1996). For all traits except GY, estimates of rg were high enough that, for a given sample, QTL-marker associations determined for one tester would be effective for improving TC performance with other testers. For GY, however, separate QTL mapping experiments would be required for each tester. In Experiment 2,
Epistasis among QTL: The comparison of TC generation means for parents (
One reason for the absence of significant epistasis in our study could be that we investigated a genetically narrow cross between elite lines from the same germplasm group. In this case, there should be less opportunity to disrupt coadapted epistatic gene complexes in the parents as might be expected for wide or interspecific crosses oftentimes employed in QTL mapping studies. Furthermore, the power for detecting epistatic interactions among QTL is lower for TC performance than line per se performance due to masking effects of the tester (Gallais and Rives 1993).
In our analysis, those QTL with significant epistatic but insignificant main effects would remain undetected. A recent QTL study on grain yield components in rice identified a large number of QTL regions of this type (Li et al. 1997). However, the genome-wide search for epistatic effects among QTL employed by these authors is expected to aggravate the problems associated with model selection discussed earlier, because the number of regressor variables and multicollinearity among them increase tremendously. Thus, the need for validation with an independent sample is even more compelling for epistatic than for main effects of QTL.
QTL × environment interactions: In Experiment 1, about one third of the detected QTL displayed significant QTL × E interactions. The smallest fraction (one out of nine) was observed for GY, although estimates of
Most QTL studies reported in the literature (e.g., Stuber et al. 1992; Ragot et al. 1995; Cockerham and Zeng 1996), including ours, rarely found significant QTL × E interactions despite the presence of significant G × E interactions at the phenotypic level. The reasons for this apparent discrepancy are not clear but two possible explanations include: (1) the detected (major) QTL display smaller QTL × E interactions than the smaller undetected (minor) QTL (Tanksley 1993), and (2) the test procedure for detection of QTL × E interactions is less powerful than that for detection of G × E interactions. Our results did not support the first hypothesis because, among the QTL detected in Experiment 1, presence or absence of QTL × E interactions was not associated with the magnitude of
In general, varying the statistical analysis for CIM had little impact on our findings. A comparison of our method used for QTL and QTL × E analysis with that of Jiang and Zeng (1995) applied to GY, GM and PH showed that a larger number of QTL × E interactions were detected with the latter approach, but results from both types of analyses hardly differed with respect to agreement of results between the two experiments. Likewise, only marginal deviations from original results were found when varying the strategy or threshold for cofactor selection in the QTL analysis. We infer from these findings that the particular choice of statistical analysis used for CIM has little influence on the congruency between calibration and validation.
Conclusions: Identification of QTL affecting TC performance of agronomically important traits and accurate estimation of their genetic effects, including epistasis and QTL × E interactions, are essential requirements for application of MAS in hybrid breeding of maize. Here, we used independent samples of TC progenies from the same population to (1) assess the magnitude of the bias of estimated QTL effects and (2) compare the power of QTL detection in samples of different size.
Our results suggest that inferences drawn from QTL mapping studies about the efficiency of MAS should be verified in an independent validation sample. When QTL effects are estimated from the same data as used for detection and mapping of QTL positions, they can be inflated due to statistical sampling and G × E interactions. The relative magnitude of the bias can be substantial for sample sizes typically used in QTL mapping experiments (N < 200) especially for traits with moderate heritability and a complex genetic architecture such as grain yield. As a consequence, the key factor determining the efficiency of MAS in comparison with classical phenotypic selection, the proportion, p, of the genotypic variance explained by QTL-marker associations, is overestimated. Moreover, if in this study the magnitude of estimated QTL effects had been used as a criterion for the choice of important QTL regions to be transferred by MAS or to be considered in a selection index, selection response would have been smaller than expected, because QTL effects estimated from calibration were biased.
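The inflation that arises when the same sample is used for detection and estimation can be illustrated with a small simulation. The sketch below is a deliberate simplification and not the CIM analysis used in this study: it assumes unlinked loci that are scored directly (no marker intervals or cofactors), equal additive effects, a single environment, and a fixed r² threshold for declaring a locus detected. All parameter values and function names are hypothetical.

```python
import random


def simulate_sample(n_plants, effects, noise_sd, rng):
    """Simulate F2-type genotypes (-1, 0, 1 in a 1:2:1 ratio) and phenotypes."""
    genotypes, phenotypes = [], []
    for _ in range(n_plants):
        g = [rng.choice((-1, 0, 0, 1)) for _ in effects]
        y = sum(a * x for a, x in zip(effects, g)) + rng.gauss(0.0, noise_sd)
        genotypes.append(g)
        phenotypes.append(y)
    return genotypes, phenotypes


def r_squared(x, y):
    """Squared Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def calibration_vs_validation(n_cal, n_val, n_loci=40, effect=0.25,
                              noise_sd=1.0, threshold=0.04, seed=1):
    """Detect loci in the calibration sample, then re-estimate them in validation.

    Returns (number detected, summed r2 in calibration, summed r2 in validation),
    where both sums run over the loci detected in the calibration sample.
    """
    rng = random.Random(seed)
    effects = [effect] * n_loci
    g_cal, y_cal = simulate_sample(n_cal, effects, noise_sd, rng)
    g_val, y_val = simulate_sample(n_val, effects, noise_sd, rng)
    sum_cal = sum_val = 0.0
    n_detected = 0
    for j in range(n_loci):
        r2_cal = r_squared([g[j] for g in g_cal], y_cal)
        if r2_cal >= threshold:  # locus declared "detected"
            n_detected += 1
            sum_cal += r2_cal
            sum_val += r_squared([g[j] for g in g_val], y_val)
    return n_detected, sum_cal, sum_val


# Accumulate over replicates so that the comparison is not driven by one draw.
total_cal = total_val = 0.0
for replicate in range(20):
    _, cal, val = calibration_vs_validation(100, 100, seed=replicate)
    total_cal += cal
    total_val += val
print(f"summed r2, calibration: {total_cal:.2f}; validation: {total_val:.2f}")
```

Because a locus enters the sum only if its calibration estimate exceeds the threshold, the calibration estimates are selected upward, whereas the validation estimates of the same loci are not subject to this selection.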
With currently available statistical methods it was not possible to separate the effects of statistical sampling and QTL × E interactions in this study, but we believe that at least the bias owing to sampling effects can be reduced by validation or cross-validation. For a correct assessment of the prospects of MAS as compared to classical phenotypic selection, more research efforts need to be dedicated to the analysis of the different factors leading to the inflation of QTL effect estimates.
The moderate agreement among the QTL detected in each sample provides evidence for a low power of QTL detection for most traits, especially GY. Only a small fraction of the detected QTL showed significant QTL × E interactions for all traits except GM, suggesting that field testing of experimental materials could be limited to a few environments known to provide good differentiation.
The consistency of QTL mapping results across testers largely reflected the genotypic correlations among testers and the predominant type of gene action for each trait. Thus, for a given sample, selection response from MAS for TC performance of traits with mainly additive gene action should be comparable for TC progenies with the tester used in QTL mapping and TC progenies with other unrelated testers. For all traits, we found little evidence for digenic epistasis among the detected QTL, particularly when reexamined in an independent sample. On the contrary, differences in the TC performance of F3 lines with each tester were due to the presence or absence of common QTL. This suggests that nonepistatic gene effects are major determinants of general and specific combining ability in hybrid performance, as was also concluded from numerous classic quantitative-genetic experiments (Hallauer and Miranda 1981, Chapter 8).
Acknowledgments
The RFLP data of this study were produced in the lab of R. G. Herrmann, Ludwig-Maximilians-Universität, Munich, by E. Brunklaus-Jung and J. Boppenmaier. The contributions of D. Klein and W. Schmidt in conducting the field trials are gratefully acknowledged. We thank T. Lübberstedt, S. Groh and two anonymous reviewers for helpful suggestions and comments on the manuscript. The present study was part of EUREKA project 290. This research was supported by grants from the German Ministry of Research and Technology (BMFT) and KWS Kleinwanzlebener Saatzucht AG, grant 0319233A. This article is dedicated to F. W. Schnell on the occasion of his 85th birthday.
APPENDIX A: RELATIONSHIP BETWEEN R², ℜ², AND THE LOD SCORE
According to Haley and Knott (1992, p. 317) and Searle (1971, p. 125), the likelihood ratio test (LR) for presence of a putative QTL in composite interval mapping (Zeng 1994) can be written in terms of the residual sum of squares of the full model (fitting the cofactors plus the α-effect of the putative QTL in Equation 1) and the reduced model (fitting only the cofactors), and the number of observations (SSE, SSER, and N, respectively)
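As a numerical illustration, the sketch below assumes the standard regression form of the likelihood ratio, LR = N ln(SSER/SSE), together with LOD = LR/(2 ln 10) and a partial R² = 1 - SSE/SSER for the putative QTL. These expressions and the function names are assumptions made for illustration and are not quoted from this appendix.

```python
import math


def likelihood_ratio(sse_full, sse_reduced, n_obs):
    """LR statistic from residual sums of squares (assumed form: N * ln(SSER / SSE))."""
    return n_obs * math.log(sse_reduced / sse_full)


def lod_score(sse_full, sse_reduced, n_obs):
    """LOD score obtained by converting the natural-log LR to a base-10 log odds."""
    return likelihood_ratio(sse_full, sse_reduced, n_obs) / (2.0 * math.log(10.0))


def partial_r_squared(sse_full, sse_reduced):
    """Share of the reduced-model residual variation removed by adding the QTL."""
    return 1.0 - sse_full / sse_reduced


def lod_from_partial_r_squared(r2, n_obs):
    """Same LOD score written in terms of the partial R2: -(N / 2) * log10(1 - R2)."""
    return -(n_obs / 2.0) * math.log10(1.0 - r2)


# Hypothetical residual sums of squares for a sample of N = 344.
sse, sser, n = 95.0, 100.0, 344
print(lod_score(sse, sser, n))
print(lod_from_partial_r_squared(partial_r_squared(sse, sser), n))
```

Under these assumptions, a fixed LOD threshold corresponds to a smaller partial R² as N increases, which is one way to see why larger samples can detect QTL with smaller effects.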
APPENDIX B: AVERAGE EFFECT OF ALLELE SUBSTITUTION IN TESTCROSSES
For a given QTL and tester Tz, the average effect of substituting the allele from parent P1 by the allele from parent P2 can be expressed as (Melchinger 1988)
Footnotes
- Communicating editor: Z-B. Zeng
- Received August 7, 1997.
- Accepted February 2, 1998.
- Copyright © 1998 by the Genetics Society of America