Quantitative Trait Locus (QTL) Mapping Using Different Testers and Independent Population Samples in Maize Reveals Low Power of QTL Detection and Large Bias in Estimates of QTL Effects
Albrecht E. Melchinger^a, H. Friedrich Utz^a, and Chris C. Schön^b
a Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany
b State Plant Breeding Institute, University of Hohenheim, 70593 Stuttgart, Germany
Corresponding author: Albrecht E. Melchinger, Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany, melchinger{at}uni-hohenheim.de (E-mail).
Communicating editor: Z-B. ZENG
| ABSTRACT |
|---|
The efficiency of marker-assisted selection (MAS) depends on the power of quantitative trait locus (QTL) detection and unbiased estimation of QTL effects. Two independent samples (N = 344 and 107) of F2 plants were genotyped for 89 RFLP markers. For each sample, testcross (TC) progenies of the corresponding F3 lines with two testers were evaluated in four environments. QTL for grain yield and other agronomically important traits were mapped in both samples. QTL effects were estimated from the same data as used for detection and mapping of QTL (calibration) and, based on QTL positions from calibration, from the second, independent sample (validation). For all traits and both testers we detected a total of 107 QTL with N = 344, and 39 QTL with N = 107, of which only 20 were in common. Consistency of QTL effects across testers was in agreement with corresponding genotypic correlations between the two TC series. Most QTL displayed no significant QTL x environment nor epistatic interactions. Estimates of the proportion of the phenotypic and genetic variance explained by QTL were considerably reduced when derived from the independent validation sample as opposed to estimates from the calibration sample. We conclude that, unless QTL effects are estimated from an independent sample, they can be inflated, resulting in an overly optimistic assessment of the efficiency of MAS.
MOLECULAR marker technologies allow plant geneticists to construct high density genetic maps for any species of interest and use them for detecting, mapping, and estimating the effects of quantitative trait loci (QTL). While the basic idea of this approach was published more than 70 years ago (![]()
An important consideration in this context relates to the sample size (N) needed for QTL mapping. Most published experiments with replicated trials have employed between 100 and 200 progenies (for review, see MELCHINGER 1997), this choice being mainly dictated by the excessive labor and costs required for phenotyping and genotyping large populations. According to theoretical investigations (![]()
In view of the high costs of QTL studies, it has been common practice to estimate QTL effects from the same data as used for QTL mapping. With this approach, however, QTL effects generally are overestimated (![]()
An indication that the bias of estimated QTL effects can be fairly large stems from the comparison of different QTL mapping studies. ![]()
In this study, we evaluated TC progenies of 344 F3 lines in combination with two unrelated testers plus additional TC progenies from an independent but smaller sample (N = 107) of F3 lines from the same cross in combination with the same two testers for grain yield and four other important agronomic traits. Objectives of our research were to (i) assess the magnitude of the bias of estimated QTL effects by mapping QTL with one data set (calibration) and, based on this information, estimate QTL effects in an independent data set (validation), (ii) compare the power of QTL detection in samples of different size, (iii) investigate the consistency of QTL across testers, and (iv) assess the importance of epistatic and QTL-by-environment interactions.
| MATERIALS AND METHODS |
|---|
Plant materials:
The plant materials used for this study were partly identical to those employed and described in previous studies on kernel weight, protein concentration, plant height (![]()
Field experiments:
The TC progenies of F3 lines and parents P1 and P2 were evaluated in two series of experiments. Experiment 1 comprised two adjacent subexperiments each with 400 entries (Subexperiment 1T1 = TC with tester T1, Subexperiment 1T2 = TC with tester T2) conducted in 1990 and 1991 at two sites in Germany (Gondelsheim and Grucking) with diverse agroecological conditions and representing two main maize growing areas in Germany, the Upper Rhine valley and Lower Bavaria. Data on plant height were additionally available from forage trials conducted at five environments in Germany described in detail by ![]()
Experiment 2 also comprised two subexperiments, each with 150 entries (Subexperiment 2T1 = TC with tester T1, Subexperiment 2T2 = TC with tester T2) conducted in four environments. Two of the trials were grown adjacent to each other in the same environments (Eckartsweier 1993, Bad Krozingen 1993) and two environments were only used for one subexperiment (Subexperiment 2T1: Hochburg 1993, Zell 1993; Subexperiment 2T2: Eckartsweier 1992, Bad Krozingen 1992).
The 400 entries in Subexperiments 1T1 and 1T2 comprised 380 TC of F3 lines, TC of P1 and P2 included as quintuple entries, and 10 common check hybrids. The 150 entries in Subexperiments 2T1 and 2T2 comprised TC from a different set of 127 F3 lines, TC of P1 and P2 included as six and seven entries, respectively, and the same set of 10 check hybrids as in Experiment 1. The experimental design was a 40-by-10 alpha design (![]()
Data were collected for the following traits: grain yield (GY) in Mg ha-1, adjusted to 155 g kg-1 grain moisture, grain moisture (GM) in g kg-1 at harvest, kernel weight (KW) in mg kernel-1 determined from four samples of 50 kernels from each plot, protein concentration (PC) in grain (g kg-1) measured by near infrared reflectance spectroscopy as described by ![]()
RFLP marker genotyping and linkage map construction:
The procedures for RFLP assays, segregation analysis of individual markers, and construction of an RFLP linkage map for cross P1 x P2 were described in detail by ![]()
$\chi^2$ tests. Owing to multiple tests, appropriate type I error rates were determined by the sequentially rejective Bonferroni procedure (![]()
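The sequentially rejective Bonferroni procedure named above is commonly known as Holm's step-down method. The following is a minimal Python sketch of that method; the function name `holm_reject` and the example P values are illustrative assumptions and do not come from this study.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Holm's sequentially rejective Bonferroni procedure: order the P values
    from smallest to largest and compare the one at rank i (i = 0, 1, ...)
    with alpha / (m - i); stop at the first P value that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger P value is retained as well
    return reject


# Hypothetical P values from four marker segregation tests.
example = [0.010, 0.040, 0.030, 0.005]
print(holm_reject(example))
```

Compared with a plain Bonferroni correction, the threshold is relaxed after each rejection, so the procedure is never less powerful while still controlling the familywise error rate.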
Data analyses:
Each site-year combination was treated as an environment in the statistical analyses. First, analyses of variance were performed on the data from each subexperiment and environment. Adjusted entry means and effective error mean squares were then used to compute the combined analyses of variance and covariance across environments for each subexperiment. The sums of squares for entries (399 d.f. each in Subexperiments 1T1 and 1T2 and 149 d.f. each in Subexperiments 2T1 and 2T2) were subdivided into the variation among TC of F3 lines (379 d.f. in Subexperiments 1T1 and 1T2 and 126 d.f. in Subexperiments 2T1 and 2T2) and orthogonal contrasts among the TC means of P1, P2, F3 lines and the hybrid checks. A corresponding subdivision was conducted on the entry-by-environment interaction sums of squares.
Components of variance for the TC of F3 lines in each subexperiment were computed considering all effects (environments, F3 lines) in the statistical model as random. Estimates of variance components $\sigma^2$ (error variance), $\sigma^2_{ge}$ (genotype-by-environment (G x E) interaction variance), and $\sigma^2_g$ (genotypic variance) of F3 TC progenies and their standard errors (SE) were calculated as described by ![]()
2 were calculated according to
$\sigma^2_g$ between (1) the two TC series in each experiment and (2) the two experiments for each tester according to the approximation given by
Phenotypic ($r_p$) and genotypic ($r_g$) correlations were calculated between TC of F3 lines with T1 and T2 for each trait in Experiments 1 and 2 by standard procedures (
All QTL analyses were performed using the linkage information given in Figure 1. While ![]()
$$Y_{jz} = \mu_{P1} + \alpha_l x^*_{jl} + \sum_k b_k x_{jk} + \varepsilon_{jz} \qquad (1)$$
Here, $Y_{jz}$ denotes the mean phenotypic trait value of the TC progeny of line j with tester z (z = 1, 2) averaged across all environments; $\mu_{P1}$ is the mean phenotypic trait value of TC progeny carrying the allele from P1 at the QTL; $\alpha_l$ is the average effect of substituting allele q in P1 by allele Q in P2 at the putative QTL in the marker interval (l, l + 1) under consideration; $x^*_{jl}$ is the conditional expectation, given the observed genotypes at the flanking marker loci, of a dummy variable that assumes values 0, 0.5, or 1 if the genotype of the F2 plant at the putative QTL is qq, Qq, or QQ, respectively; $b_k$ is the partial regression coefficient of phenotype $Y_{jz}$ on the kth (selected) marker; $x_{jk}$ is a dummy variable (cofactor) taking values 0, 0.5, or 1 depending on whether the marker genotype of the parental F2 individual j at marker locus k is homozygous P1, heterozygous, or homozygous P2, respectively; $\varepsilon_{jz}$ is a residual variable for the TC progeny of the jth F3 line with tester z.
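To make the coding of the regressors concrete, the sketch below evaluates the fitted part of the model just defined (everything except the residual) for one F3 line. It is only an illustration: the function name, the genotype labels "P1", "H", "P2", and all numbers are assumptions chosen for easy arithmetic, and the conditional expectation for the putative QTL is passed in directly rather than derived from flanking markers.

```python
# Dummy coding of an F2 marker genotype, as described for the cofactors:
# homozygous P1 -> 0, heterozygous -> 0.5, homozygous P2 -> 1.
MARKER_CODE = {"P1": 0.0, "H": 0.5, "P2": 1.0}


def expected_testcross_mean(mu_p1, alpha, x_star, cofactor_coefs, cofactor_genotypes):
    """Fitted value of the regression model without the residual term.

    mu_p1              -- mean of TC progeny carrying the P1 allele at the QTL
    alpha              -- allele substitution effect at the putative QTL
    x_star             -- conditional expectation of the QTL dummy variable (0..1)
    cofactor_coefs     -- partial regression coefficients, one per cofactor
    cofactor_genotypes -- genotype labels ("P1", "H", "P2") at the cofactor markers
    """
    if len(cofactor_coefs) != len(cofactor_genotypes):
        raise ValueError("one coefficient is needed per cofactor marker")
    cofactor_part = sum(
        b * MARKER_CODE[g] for b, g in zip(cofactor_coefs, cofactor_genotypes)
    )
    return mu_p1 + alpha * x_star + cofactor_part


# Hypothetical numbers: 10.0 + 2.0 * 0.5 + (1.0 * 1.0 - 0.5 * 0.5)
print(expected_testcross_mean(10.0, 2.0, 0.5, [1.0, -0.5], ["P2", "H"]))
```

Because the heterozygote is coded 0.5, the coefficient of each regressor is the difference between the two homozygous classes, which matches the definition of the substitution effect given above.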
Cofactors were selected by stepwise regression according to ![]()
$\chi^2$ distribution with 2 df (1 df for the $\alpha$-effect and 1 df for the position of the QTL; ![]()
$\alpha$-effects had identical sign. Presence of QTL-by-environment (QTL x E) interactions and digenic epistatic interactions between the detected QTL were tested in combined analyses of variance across environments by F-tests described by ![]()
The proportion of the phenotypic variance ($\sigma^2_p$) explained by a single QTL was determined as the square of the partial correlation coefficient (partial $R^2$). Estimates of the allele substitution ($\alpha_l$) effect of each putative QTL, the total LOD score as well as the total proportion (R2) of $\sigma^2_p$ explained were obtained by fitting a model including all QTL for the respective trait simultaneously. This model was also used to estimate p, the proportion of the genotypic variance ($\sigma^2_g$) explained by all detected QTL, according to the procedures described by ![]()
Two approaches were applied in calculating estimates of QTL effects: (1) Following common practice, QTL effects were estimated from exactly the same experiment as used for QTL detection; (2) QTL detection was performed in one experiment (subsequently denoted as calibration) and, based on this information, QTL effects were estimated from the data of the other experiment with the same tester (subsequently referred to as validation). In the latter case, the design matrix X in multiple regression was calculated on the basis of (a) the map position of the QTL detected in the calibration and (b) the marker genotype at the flanking markers of the F2 plants in the validation according to described procedures (![]()
Finally, for each F2 genotype j the marker index score Mjz of its TC progeny with tester Tz was calculated from its marker genotype and the X matrix from the multiple regression in calibration as outlined by ![]()
z) and the marker index score based on results with tester Tz. Standard errors of $\hat{r}_g(Y_{jz'}, M_{jz})$ were determined according to ![]()
| RESULTS |
|---|
Segregation and linkage of RFLP markers:
In the data analysis of the combined set of 451 F2 plants (344 from Experiment 1 and 107 from Experiment 2), observed genotype frequencies were consistent with the expected Mendelian segregation ratios for all 89 RFLP markers assayed (data not shown). The 89 marker loci spanned a map distance of 1647 cM with an average interval length of 24 cM (Figure 1). About 90% of the genome was located within a 20-cM distance to the nearest marker.
Trait means, variances, heritabilities, and correlations:
Climatic conditions were favorable for maize grain production in all 10 test environments. Means and phenotypic variances of the 10 check varieties included in each of the four subexperiments varied considerably between environments for all traits, indicating rather diverse growing conditions. Average yield of the 10 checks ranged from 7.5 to 12.8 Mg ha-1. Phenotypic correlations based on performance of the 10 check varieties and averaged over traits and subexperiments were medium when calculated separately for the four environments of Experiment 1 ($r_p$ = 0.65) and also for the six environments of Experiment 2 ($r_p$ = 0.79). Using performance of check varieties in environments of Experiment 1 and correlating it with their performance in environments of Experiment 2 resulted in slightly lower phenotypic correlations ($r_p$ = 0.53).
In Experiment 1, TC means of F3 progenies with tester T1 were significantly (P < 0.01) smaller than with tester T2 for GY and KW but greater for GM (Table 1). For Experiment 2, the respective comparison is not meaningful because the two TC series were not evaluated in the same environments. The TC means of P1 and P2 differed significantly (P < 0.01) for all traits with both testers in both experiments. Parent P1 generally had higher TC means than P2 except for GY and GM (tester T2 in Experiment 2) (Table 1 and Table 2). The orthogonal contrast between the average TC performance of the parent lines and the TC mean of the F3 lines was significant (P < 0.05) only for KW in Experiment 1 and PH in both experiments. The range in TC performance of F3 lines considerably transgressed the TC means of the parents for all traits but KW.
Genotypic variances among TC of F3 lines ($\sigma^2_g$) were highly significant (P < 0.01) for all traits with both testers in both experiments (Table 1 and Table 2). Estimates of $\sigma^2_g$ for TC with T1 and T2 were heterogeneous (P < 0.01) only for GM in Experiment 1. Estimates of $\sigma^2_{ge}$ were significantly greater than zero (P < 0.01) except in Experiment 2 for GY and PH (both testers) and GM (tester T1). Estimates of $\sigma^2_g$ and $\sigma^2_{ge}$ for both testers were significantly (P < 0.01) greater in Experiment 1 than Experiment 2 for GY and GM. Heritability was medium for GY (0.48 < $h^2$ < 0.74) but relatively high for the other traits (0.64 < $h^2$ < 0.91) with similar estimates for both testers and mostly overlapping confidence intervals for the two experiments (Table 1 and Table 2).
Phenotypic correlations between TC of F3 lines with tester T1 and T2 were greater than 0.53 except for GY ($r_p$ < 0.39) yet highly significant (P < 0.01) for all traits in both experiments (Table 1 and Table 2). Genotypic correlations ($r_g$) varied between 0.60 and 0.88 and were in good agreement between both experiments for all traits.
Identification of QTL:
Results from QTL analyses are presented for means across environments. For Experiments 1 and 2, estimates of the QTL position in the genome, the level of significance, the size of the phenotypic variance explained, the substitution effects, and the significance of QTL-by-environment interactions are shown in Table 3 and Table 4, respectively. The number of selected cofactors was higher in Experiment 1 (14 to 28) than in Experiment 2 (6 to 14), and more significant cofactors were found for traits with higher heritability than, e.g., for GY. A complete list of the number of selected cofactors used for each trait, tester, and experiment can be obtained upon request from the corresponding author.
Comparison of QTL effects between experiments:
Grain yield:
For GY, seven putative QTL were identified in Experiment 1 in TC with T1 (Table 3). A simultaneous fit accounted for R2 = 30.8% of $\sigma^2_p$ and $\hat{p}$ = 51.5% of $\sigma^2_g$ (Figure 2). Only two QTL were detected in TC with T2. Collectively, they accounted for R2 = 14.8% and $\hat{p}$ = 30.6%. In Experiment 2, one QTL on chromosome 3 explaining 4.1% of $\sigma^2_p$ and 4.0% of $\sigma^2_g$ was detected in TC with T1 (Table 4 and Figure 2). In contrast, four QTL were found in TC with T2. They explained collectively R2 = 32.1% and $\hat{p}$ = 40.6%. In each experiment, none of the QTL were in common between testers or displayed significant (P < 0.05) QTL x E interactions.
There was one common QTL for GY between Experiment 1 and 2 (Table 5). For QTL positions identified in Experiment 1 (calibration), $\alpha$-effects estimated from Experiment 2 (validation) were on average about half as large yet of the same sign as those obtained from calibration (Table 3). An exception was the QTL on chromosome 1 with similar $\alpha$-effects of opposite sign in calibration and validation. Collectively, the QTL effects from validation accounted for R2 = 12.2% and $\hat{p}$ = 7.6% for T1 and R2 = 3.8% and $\hat{p}$ = 5.1% for T2. When calibration was performed with Experiment 2 and validation with Experiment 1, the estimates dropped to R2 = 0.7% and $\hat{p}$ = 0.7% for TC with T1 and R2 = 11.6% and $\hat{p}$ = 22.6% for TC with T2 (Table 4).
Grain moisture:
In Experiment 1, 12 and 13 QTL influencing GM in TC with tester T1 and T2, respectively, were detected. A simultaneous fit yielded R2 = 45.6% and $\hat{p}$ = 58.2% for TC with T1 and R2 = 55.1% and $\hat{p}$ = 61.3% for TC with T2. About half of the detected QTL displayed significant (P < 0.05) QTL x E interactions. Seven QTL were in common for both testers with similar $\alpha$-effects.
In Experiment 2, three QTL were found for GM in TC with T1 (R2 = 17.4% and $\hat{p}$ = 17.6%) and nine in TC with T2 (R2 = 57.9% and $\hat{p}$ = 63.4%). One of the QTL was in common between testers, and none displayed significant QTL x E interactions.
Two and six QTL were in common between Experiment 1 and 2 for tester T1 and T2, respectively (Table 5). Estimates of $\alpha$-effects from validation in Experiment 2 were in four cases larger but otherwise much smaller than those from calibration in Experiment 1 (Table 3). If significant, both estimates of $\alpha$ had identical sign except for one QTL on chromosome 9 for tester T1 and one on chromosome 2 for T2. Collectively, the QTL effects from validation in Experiment 2 accounted for R2 = 24.3% and $\hat{p}$ = 15.5% for TC with T1 and R2 = 46.4% and $\hat{p}$ = 44.8% for TC with T2. Estimates of $\alpha$-effects from calibration in Experiment 2 generally agreed well with those from validation in Experiment 1 (Table 4), where a simultaneous fit explained R2 = 6.2% and $\hat{p}$ = 10.1% for TC with T1 and R2 = 31.3% and $\hat{p}$ = 34.7% for TC with T2.
Kernel weight:
In Experiment 1, 12 QTL in TC with T1 and 11 QTL in TC with T2 were found for KW. Collectively, these QTL accounted for R2 = 63.7% and $\hat{p}$ = 71.2% for TC with T1 and R2 = 52.5% and $\hat{p}$ = 58.9% for TC with T2. About one quarter of the QTL showed significant (P < 0.05) QTL x E interactions. Ten QTL were in common and had similar $\alpha$-effects for both testers.
In Experiment 2, four QTL were detected for TC with T1 and five QTL for TC with T2. Collectively, these QTL explained R2 = 41.5% and $\hat{p}$ = 47.5% for TC with T1 and R2 = 43.7% and $\hat{p}$ = 49.6% for TC with T2. Three QTL were in common between testers and none showed significant QTL x E interactions.
Three QTL for T1 and two QTL for T2 were in common between Experiment 1 and 2 (Table 5), including the largest QTL explaining about 25.5% of $\sigma^2_p$ in both experiments (Table 3 and Table 4). Estimates of $\alpha$-effects from validation in Experiment 2 were on average almost as large as those from calibration in Experiment 1 and agreed in sign except for one QTL on chromosome 2 (Table 3). Collectively, the QTL from validation in Experiment 2 explained R2 = 47.7% and $\hat{p}$ = 47.3% for TC with T1 and R2 = 39.8% and $\hat{p}$ = 43.1% for TC with T2. Likewise, $\alpha$-effects from calibration in Experiment 2 were mostly in close agreement with those obtained from validation in Experiment 1 (Table 4). Collectively, the QTL from validation in Experiment 1 explained R2 = 35.9% and $\hat{p}$ = 40.2% for TC with T1 and R2 = 33.0% and $\hat{p}$ = 37.1% for TC with T2.
Protein concentration:
In Experiment 1, nine and ten QTL influencing PC in TC with T1 and T2, respectively, were mapped. A simultaneous fit yielded R2 = 37.7% and $\hat{p}$ = 48.8% for TC with T1 and R2 = 43.0% and $\hat{p}$ = 52.3% for TC with T2. Altogether, five QTL showed significant (P < 0.05) QTL x E interactions. Seven QTL were in common for both testers with $\alpha$-effects of similar size and same sign.
In Experiment 2, three QTL affected PC in TC with T1 (R2 = 31.6% and $\hat{p}$ = 46.6%) and four QTL with T2 (R2 = 34.8% and $\hat{p}$ = 44.4%). Two QTL were in common between both testers. None of the QTL displayed significant QTL x E interactions.
Only one QTL was in common between Experiment 1 and 2 for each tester (Table 5). In several instances, $\alpha$-effects from calibration in Experiment 1 differed in sign (six QTL) or deviated in magnitude from those estimated from validation in Experiment 2 (Table 3). In the latter analysis, we obtained R2 = 19.5% and $\hat{p}$ = 18.0% for TC with T1 and R2 = 25.7% and $\hat{p}$ = 27.7% for TC with T2. Likewise, $\alpha$-effects from validation in Experiment 1 were consistently smaller than those obtained from calibration in Experiment 2 and resulted in reduced estimates of R2 = 15.3% and $\hat{p}$ = 19.5% for TC with T1 and R2 = 9.3% and $\hat{p}$ = 9.6% for TC with T2 (Table 4).
Plant height:
In Experiment 1, 17 and 14 QTL affecting PH in TC with T1 and T2, respectively, were identified on all 10 chromosomes (Table 3). A simultaneous fit with all QTL accounted for R2 = 63.2% and $\hat{p}$ = 68.2% in TC with T1 and R2 = 63.6% and $\hat{p}$ = 73.6% in TC with T2. Ten QTL were in common with similar $\alpha$-effects for both testers. Nine QTL displayed significant (P < 0.05) QTL x E interactions.
In Experiment 2, four QTL were found in TC with T1. Collectively, these QTL explained R2 = 43.6% and $\hat{p}$ = 51.9%. Two QTL were found in TC with T2 with R2 = 26.5% and $\hat{p}$ = 28.7% in a simultaneous fit. The largest QTL on chromosome 1 explaining more than 23% of $\sigma^2_p$ was in common between both testers.
Three QTL for TC with T1 and one QTL for TC with T2 were in common between Experiment 1 and 2, including the largest QTL found for both testers on chromosome 1 (Table 5). Estimates of $\alpha$-effects from validation in Experiment 2 were largely consistent in sign and magnitude with those from calibration in Experiment 1 (Table 3). Collectively, the former explained R2 = 55.7% and $\hat{p}$ = 57.6% for TC with T1 and R2 = 37.5% and $\hat{p}$ = 36.6% for TC with T2. Validation in Experiment 1 based on calibration in Experiment 2 yielded reduced $\alpha$-effects with R2 = 16.1% and $\hat{p}$ = 16.8% for TC with T1 and R2 = 20.8% and $\hat{p}$ = 22.1% for TC with T2.
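The per-trait counts reported in the five subsections above can be tallied to check them against the totals given in the abstract (107 QTL in Experiment 1, 39 in Experiment 2, 20 in common). The short Python sketch below does only that bookkeeping; the variable names are illustrative, and the numbers are copied from the text.

```python
# Number of QTL detected per trait as (tester T1, tester T2).
detected_exp1 = {"GY": (7, 2), "GM": (12, 13), "KW": (12, 11), "PC": (9, 10), "PH": (17, 14)}
detected_exp2 = {"GY": (1, 4), "GM": (3, 9), "KW": (4, 5), "PC": (3, 4), "PH": (4, 2)}
# QTL in common between Experiments 1 and 2, summed over both testers.
common = {"GY": 1, "GM": 2 + 6, "KW": 3 + 2, "PC": 1 + 1, "PH": 3 + 1}


def total(counts):
    """Sum the QTL counts over traits and testers."""
    return sum(t1 + t2 for t1, t2 in counts.values())


total_exp1 = total(detected_exp1)
total_exp2 = total(detected_exp2)
total_common = sum(common.values())

print(total_exp1, total_exp2, total_common)
# Ratio of detections and share of Experiment 2 QTL recovered in Experiment 1.
print(round(total_exp1 / total_exp2, 2), round(total_common / total_exp2, 2))
```

The ratio of roughly 2.7 and the recovery share of roughly one half are the figures that matter when the two sample sizes are compared.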
Digenic epistasis between detected QTL:
In Experiment 1, the test for digenic epistatic interactions among detected QTL was significant (P < 0.05) in few instances. In TC with T1, we found epistasis only for GM: between the QTL on chromosome 2 (position 180 cM) and chromosome 8 (position 106 cM), with an estimated epistatic effect of -2.0 g kg-1, and between the QTL on chromosome 7 (position 2 cM) and chromosome 8 (position 52 cM), with an effect of -1.7 g kg-1. In TC with T2, epistasis was indicated for GM between the QTL on chromosome 4 (position 128 cM) and chromosome 8 (position 114 cM), with an effect of -3.9 g kg-1, and for KW between two linked QTL on chromosome 2 (position 122 cM and position 156 cM), with an effect of -2.30 mg, and between the QTL on chromosome 5 (position 56 cM) and chromosome 8 (position 48 cM), with an effect of 2.35 mg. Including the epistatic effects for these pairs of QTL in the model for the simultaneous fit increased the R2 values only by 2 to 3% compared to the model without epistasis. None of the epistatic interactions were confirmed by validation in Experiment 2, and the R2 values for the epistatic model usually decreased by 1 to 2% in comparison to the model without epistasis. In Experiment 2, we found no significant (P < 0.05) digenic epistasis between any of the detected QTL.
Correlation between predicted and observed TC performance:
In Experiment 1, estimates of the genotypic correlation $r_g(Y_{jz'}, M_{jz})$ exceeded 0.70 for KW and PH, ranged between 0.53 and 0.60 for GM and PC, and were below 0.39 for GY (Table 6). In Experiment 2, $\hat{r}_g(Y_{jz'}, M_{jz})$ was below 0.50 in most cases, except for KW where it ranged from 0.62 to 0.69.
| DISCUSSION |
|---|
Advantages of CIM:
A comparison of our results in Experiment 1 for PC and KW with those of ![]()
Estimation of QTL effects from independent samples:
Estimates of individual QTL effects were in most cases considerably smaller when estimated from an independent validation experiment in lieu of the calibration experiment (Table 3 and Table 4). In some cases effects of opposite sign were found in the validation experiment, suggesting the occurrence of a type III error (i.e., a significant association is correctly declared but the marker allele is associated with the wrong QTL allele; ![]()
$\hat{p}$ from the simultaneous fit with all detected QTL was considerably smaller for validation than for calibration (Figure 2). Averaged over all traits and both testers, $\hat{p}$ dropped from 57.5% for calibration in Experiment 1 to 30.3% for validation in Experiment 2, and from 39.4% for calibration in Experiment 2 to 21.3% for validation in Experiment 1. The decrease in $\hat{p}$ was particularly pronounced for GY, probably due to its complex genetic architecture.
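The four averages quoted above can be reproduced from the per-trait proportions of genotypic variance explained that are listed in the Results. The sketch below copies those percentages from the text (T1 first, T2 second); the dictionary and function names are illustrative only.

```python
# Proportion of genotypic variance explained (%), per trait as (T1, T2).
cal_exp1 = {"GY": (51.5, 30.6), "GM": (58.2, 61.3), "KW": (71.2, 58.9),
            "PC": (48.8, 52.3), "PH": (68.2, 73.6)}   # calibration, Experiment 1
val_exp2 = {"GY": (7.6, 5.1), "GM": (15.5, 44.8), "KW": (47.3, 43.1),
            "PC": (18.0, 27.7), "PH": (57.6, 36.6)}   # validation, Experiment 2
cal_exp2 = {"GY": (4.0, 40.6), "GM": (17.6, 63.4), "KW": (47.5, 49.6),
            "PC": (46.6, 44.4), "PH": (51.9, 28.7)}   # calibration, Experiment 2
val_exp1 = {"GY": (0.7, 22.6), "GM": (10.1, 34.7), "KW": (40.2, 37.1),
            "PC": (19.5, 9.6), "PH": (16.8, 22.1)}    # validation, Experiment 1


def mean_over_traits_and_testers(table):
    """Unweighted mean over the ten trait-by-tester combinations."""
    values = [v for pair in table.values() for v in pair]
    return sum(values) / len(values)


def relative_drop(calibration, validation):
    """Fraction of the calibration estimate that is lost in validation."""
    return 1.0 - mean_over_traits_and_testers(validation) / mean_over_traits_and_testers(calibration)


for label, cal, val in (("Exp. 1 -> Exp. 2", cal_exp1, val_exp2),
                        ("Exp. 2 -> Exp. 1", cal_exp2, val_exp1)):
    print(label,
          round(mean_over_traits_and_testers(cal), 1),
          round(mean_over_traits_and_testers(val), 1),
          round(relative_drop(cal, val), 2))
```

In both directions a little less than half of the calibration estimate is lost, even though the calibration samples differ more than threefold in size.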
In our opinion, the decrease in $\hat{p}$ can mainly be attributed to two factors: (i) the effect of different samples and (ii) the effect of different environments. Both factors are confounded and with currently available statistical models it is not possible to separate them. However, the majority of the detected QTL showed no significant QTL x E interactions, suggesting that different test environments for the two experiments were not the major cause for identification of different QTL. Additionally, the environments used in Experiments 1 and 2 were assumed a random sample of environments available for testing performance of maize in Germany. This assumption was corroborated by the similar magnitude of phenotypic correlations for performance of the 10 check varieties when calculated for environments within Experiments 1 and 2 as compared to correlations between environments of Experiments 1 and 2. Furthermore, when practicing MAS, gain from selection will usually be assessed in environments (years) different from those in which calibration was performed.
On the other hand, computer simulations (![]()
partial $R^2$ value of a QTL explaining 8% of $\sigma^2_g$ for a trait with h2 = 0.4 can be overestimated up to 388% for N = 100 and up to 44% for N = 300. With experimental data the bias in QTL effects is expected to be even greater than in computer simulations given the uncertainties in the selection of cofactors and the obscuring effects of missing marker data and QTL x E interactions.
The inflation in the R2 and $\hat{p}$ values of QTL estimated directly from calibration can be attributed to several reasons. All are related to the fact that QTL mapping can be considered as a problem of model selection in multiple linear regression (![]()
The partial $R^2$ value of a putative QTL in regression according to Equation 1 can be linked with its LOD score and the sample size N (see APPENDIX A)
$$R^2 = 1 - 10^{-2\,\mathrm{LOD}/N} \qquad (2)$$
Therefore, a QTL search based on the LOD score criterion is, according to Equation 2, equivalent to selecting those regressor variables that account for the largest proportion (partial $R^2$) of the variance in the response variable ($\sigma^2_p$) and consequently faces the same problems of model selection as does multiple linear regression. As is well-known from the statistical literature (see e.g., ![]()
$R^2$ values of the selected explanatory variables, the bias being very severe when the number of observations is small and close to the number of predictor variables. Furthermore, the presence of closely linked markers can introduce multicollinearity among the regressors with negative impacts on the quality and stability of the selected model. In particular, it increases the variance of the estimated regression coefficients and can also strongly affect their magnitude (![]()
Suggestions in the statistical literature (for review, see ![]()
While the absolute proportion of $\sigma^2_g$ explained by QTL in validation differed substantially, depending upon whether Experiment 1 or 2 was used for calibration, the relative decrease in $\hat{p}$ from calibration to validation was largely independent of the sample size used for calibration. This could be attributable to the fact that with a larger sample size additional QTL with smaller effects are detected in calibration. Estimates of the effects of these QTL are very likely subject to considerable sampling bias and, therefore, contribute substantially to the inflation in $\hat{p}$ values estimated from calibration even with large sample sizes.
The lack of consistency between QTL effect estimates obtained from calibration and validation has several important consequences for QTL mapping and MAS for polygenic traits: (1) It demonstrates that, due to sampling and QTL x E interactions, individual QTL effects estimated directly from calibration can be inflated, especially for smaller values of N and complexly inherited traits such as GY. Inferences about the relative magnitude of QTL effects estimated from previous experimental studies should be reexamined under this aspect. (2) The distribution of estimated QTL effects may not reflect the distribution of true QTL effects. A large estimate may reflect either a large QTL or a small QTL estimated with a large bias. (3) The decision of which QTL regions to transfer with MAS and/or to consider in a selection index should be based on QTL effects verified in an independent validation sample. (4) For a correct assessment of the prospects of MAS, the key parameter p must not be determined from calibration, but from an independent validation sample or by using cross-validation.
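The selection effect behind points (1) and (2) can be illustrated with a deliberately extreme toy simulation. This is not the authors' analysis: it assumes unlinked markers, single-marker regression instead of CIM, and a trait with no QTL at all, so any variance "explained" in calibration is pure selection bias. Only the sample sizes (107 and 344) and the marker number (89) are borrowed from the study; all function names are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 0.0 if sxx == 0.0 or syy == 0.0 else sxy * sxy / (sxx * syy)


def simulate_sample(rng, n_lines, n_markers):
    """Unlinked markers coded 0, 0.5, 1 in a 1:2:1 ratio; the trait is pure noise."""
    markers = [
        [rng.choice((0.0, 0.5, 0.5, 1.0)) for _ in range(n_lines)]
        for _ in range(n_markers)
    ]
    trait = [rng.gauss(0.0, 1.0) for _ in range(n_lines)]
    return markers, trait


def selection_bias(n_cal=107, n_val=344, n_markers=89, n_reps=20, seed=1):
    """Mean R^2 of the best marker in calibration and of that marker in validation."""
    rng = random.Random(seed)
    cal_total = val_total = 0.0
    for _ in range(n_reps):
        cal_markers, cal_trait = simulate_sample(rng, n_cal, n_markers)
        val_markers, val_trait = simulate_sample(rng, n_val, n_markers)
        cal_r2 = [r_squared(m, cal_trait) for m in cal_markers]
        # The marker with the largest R^2 plays the role of the "detected QTL".
        best = max(range(n_markers), key=cal_r2.__getitem__)
        cal_total += cal_r2[best]
        val_total += r_squared(val_markers[best], val_trait)
    return cal_total / n_reps, val_total / n_reps


cal_mean, val_mean = selection_bias()
print(f"calibration R2: {cal_mean:.3f}, validation R2: {val_mean:.3f}")
```

Picking the best of many regressors and then reporting its fit from the same data is what inflates the calibration value; re-estimating the chosen marker in an independent sample removes that selection step.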
Comparison of QTL detected in samples of different size:
We evaluated the power of QTL detection by comparing results from QTL mapping in two independent samples of different size from the same population. The smaller sample size (N = 107) in Experiment 2 was chosen in accordance with (1) most experimental QTL studies reported in the literature and (2) the maximum number of progenies generally employed per cross for early testing in recycling breeding (![]()
$\sigma^2_p$ in Experiment 2 (N = 107), but as little as 3.3% of $\sigma^2_p$ in Experiment 1 (N = 344). (Smaller values found in Table 3 and Table 4 are due to the fact that these estimates refer to partial $R^2$ values from a simultaneous fit of all detected QTL, which can deviate from the $R^2$ values calculated in multiple regression according to Equation 1 due to confounding effects of undetected minor QTL linked in repulsion phase.) As a consequence, the total number of QTL detected for all traits and both testers in Experiment 1 was almost triple the number detected in Experiment 2.
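The 3.3% figure for N = 344 is consistent with the usual likelihood-ratio relation for adding one regressor to a linear model, LOD = -(N/2) log10(1 - R2), evaluated at the LOD threshold of 2.5 used in this study. The sketch below assumes that relation; the function name is illustrative, and the value it returns for N = 107 is computed here rather than quoted from the text.

```python
def min_detectable_r2(lod_threshold, n):
    """Smallest partial R^2 whose LOD score reaches the threshold.

    Assumes LOD = -(n / 2) * log10(1 - R^2), the standard likelihood-ratio
    relation for one added regressor in a linear model with normal errors.
    """
    return 1.0 - 10.0 ** (-2.0 * lod_threshold / n)


# Sample sizes of Experiment 2 and Experiment 1 with a LOD threshold of 2.5.
for n in (107, 344):
    print(n, round(100.0 * min_detectable_r2(2.5, n), 1))
```

Under this assumption, a roughly threefold larger sample lowers the smallest detectable effect by about the same factor, which is in line with the larger number of QTL found in Experiment 1.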
Only about half (20) of the putative QTL detected in Experiment 2 were in common with QTL identified in Experiment 1 and the poorest agreement was observed for GY (Table 5). As pointed out earlier, the comparison of results between Experiments 1 and 2 is confounded by the different test environments and very likely both factors, sampling and QTL x E interactions, contributed to the lack of congruency of QTL found for the two experiments. In a comparison of QTL mapping results from two independent studies with elite cross B73 x Mo17, ![]()
$\sigma^2_p$ in Experiment 1 is only about 0.5. Consequently, if such a QTL is detected in one experiment, it has only an even chance of being identified in an independent set of progeny. This argument applies to (1) small QTL and large values of N or (2) large QTL and small values of N, but it does not suffice to explain why half of the large QTL detected in Experiment 2 were not recovered in Experiment 1. This is because with N = 344, h2 > 0.5, and LOD > 2.5, the power for detecting a QTL which supposedly accounts for 10% or more of $\sigma^2_p$ exceeds 0.90 (H. F. UTZ, unpublished results).
This apparent gap of explanation can be closed by considering that many of the QTL effects estimated in Experiment 2 had a large upward bias, as discussed earlier. Assuming their true effects were often much smaller, it follows in combination with the previous argument that there was only a moderate chance of detecting them simultaneously in Experiment 1. In addition, we cannot rule out that a few of the putative QTL were either environment-specific or "false positives" and therefore occurred only in one experiment, given that a LOD threshold of 2.5 corresponds in our study to a genomewise Type I error rate Pg 0.25.
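The upward bias invoked here can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not part of the study: a single marker coincides with the QTL, gene action is additive with F2 genotype frequencies 1:2:1, and R² and LOD are linked by LOD = -(N / 2) log10(1 - R²). The function names and the true R² of 0.03 are hypothetical.

```python
import math
import random


def sample_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def significant_r2_estimates(true_r2, n, lod_threshold=2.5, reps=1000, seed=1):
    """Simulate one marker regression `reps` times and return the R^2
    estimates from the runs whose LOD passes the threshold."""
    rng = random.Random(seed)
    # Additive marker codes -1/0/1 at F2 frequencies 1:2:1 have variance 0.5.
    beta = math.sqrt(true_r2 / 0.5)
    noise_sd = math.sqrt(1.0 - true_r2)
    kept = []
    for _ in range(reps):
        xs = rng.choices((-1, 0, 1), weights=(1, 2, 1), k=n)
        ys = [beta * x + rng.gauss(0.0, noise_sd) for x in xs]
        r2 = sample_r2(xs, ys)
        lod = -(n / 2.0) * math.log10(1.0 - r2)
        if lod >= lod_threshold:
            kept.append(r2)
    return kept


true_r2 = 0.03  # hypothetical small QTL
for n in (107, 344):
    kept = significant_r2_estimates(true_r2, n)
    mean_kept = sum(kept) / len(kept) if kept else float("nan")
    # Columns: sample size, number of significant runs, mean estimated R^2.
    print(n, len(kept), round(mean_kept, 3))
```

Because only runs that pass the threshold are retained, every retained estimate with N = 107 must exceed the minimum R² implied by LOD = 2.5, so the mean of the significant estimates lies well above the simulated true value.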
For testing congruency of QTL it was not possible to adopt a criterion based on overlapping confidence intervals, because with CIM their computation is still an unsolved problem.
The "genetic architecture" of a trait characterized by the number of effective factors [...] is expected to be smaller with a small number of major QTL explaining a substantial proportion of σ²g than for a trait with a large number of minor QTL. A comparison of our results for GY and the other traits supports this hypothesis. It was interesting to observe, however, that the highest number of QTL was found for PH, a trait with presumably oligogenic inheritance. Before the advent of molecular markers, estimates of the number of genes involved in the expression of quantitative traits were mainly based on WRIGHT's (1968) formula, which severely underestimates the number of effective factors if assumptions such as purely additive gene action and independent segregation of genes with equal effects are not met. Based on results from QTL mapping studies, it seems very likely that even highly heritable traits like PH are regulated by a large number of genes and that assumptions about the inheritance of these traits need to be revised. Similar findings have recently been reported.
The lack of congruency among QTL detected in two samples from the same cross provides the baseline for comparisons of QTL detected in populations derived from different crosses. Therefore, it was not surprising that most QTL regions reported here were either unique or found in just one or two comparable studies in the literature. Only one QTL region adjacent to marker umc89 on chromosome 8 affecting both GY and GM was identified in several other investigations.
In addition to the confounding factor of sampling, two features of our experimental materials could explain the singular set of QTL reported here: (1) Our mapping population was generated from a cross of two elite European flint lines, whereas all other QTL studies in maize have employed wide crosses between North American dent lines or tropical germplasm. Because flint and dent are fairly distinct germplasm groups, we hypothesize that they have only a small subset of polymorphic QTL in common. (2) In practical breeding programs, elite lines from the same heterotic group are crossed and early selfing generations (F2 plants or F3 lines) are evaluated for their TC performance in combination with testers from the opposite heterotic group. We therefore mapped QTL for TC performance as opposed to line per se performance commonly determined in most previous QTL studies. This can result in largely different sets of QTL as demonstrated in a comparison of both features in cross B73 x Mo17.
Comparison among testers:
According to theory (see APPENDIX B), consistent QTL mapping results across testers are expected in the absence of epistasis if both testers have identical alleles at the QTL, or additive gene action prevails, or the dominance effects satisfy the condition of Equation 3.
In all these cases, interactions in the two-way table of TC means with parents representing one factor and testers the second factor are absent and rg between different TC series is 1.0. Conversely, inconsistent results can arise when (a) the two testers have different alleles at the QTL (T1 ≠ T2) and Equation 3 is not satisfied (i.e., the alleles in P1 and P2 show different dominance relationships with each tester allele) or (b) the QTL alleles of P1 and P2 display different epistatic interactions with the tester alleles at other loci. Obviously, both cases also result in lower estimates for rg.
With the exception of GY, our QTL mapping results in Experiment 1 agreed well across testers for all traits: more than half of the QTL detected with one tester were also found with the other tester, and the proportion of common QTL was in close agreement with the magnitude of rg. This is consistent with the preponderance of additive gene action found for these traits in classic quantitative-genetic experiments.
The absence of common QTL between both testers observed for GY can be explained by several causes, the most important being related to gene action. Studies on GY have revealed a high degree of dominance.
Given that the tester may change over time in a hybrid breeding program, we examined whether a marker index score Mjz based on QTL mapping results with tester Tz would be effective for improving TC performance Yjz' with tester Tz'. For this purpose, we estimated the genotypic correlation rg (Yjz', Mjz), which represents the key parameter in the formula for the selection response in Yj from indirect selection for Mj. [...] rg (Yjz', Mjz) was relatively small in most cases. In combination with the low proportion of σ²g explained by the detected QTL in this experiment, these results corroborate that for complex polygenic traits a sample size N ≈ 100 is not sufficient to obtain reliable QTL estimates for MAS.
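The role of rg in indirect selection can be sketched with the standard correlated-response formula from quantitative genetics: at equal selection intensity, the response in Y from selecting on M, relative to direct selection on Y, is rg multiplied by hM / hY. The sketch below treats the marker index as having heritability 1 by default, which is a simplifying assumption; the function name and the numerical values are hypothetical and are not estimates from this study.

```python
import math


def relative_efficiency(r_g, h2_trait, h2_index=1.0):
    """Correlated response in Y from selecting on M, relative to direct
    phenotypic selection on Y at equal selection intensity:
    r_g * h_M / h_Y."""
    return r_g * math.sqrt(h2_index) / math.sqrt(h2_trait)


# Hypothetical values: genotypic correlation 0.4, trait heritability 0.64.
print(round(relative_efficiency(0.4, 0.64), 3))
```

A value below 1 means that selection on the marker index alone would be expected to yield less gain than phenotypic selection under these assumptions.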
Epistasis among QTL:
The comparison of TC generation means for parents (P̄) and F3 lines (F̄3) provides a test for the net effect of epistasis across the entire genome.
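The generation-means comparison can be sketched as a simple contrast. In the absence of epistasis, the TC mean of unselected F3 lines is expected to equal the mean of the two parental TC means, so a contrast that differs from zero points to net epistasis. The sketch assumes independent generation means with known standard errors; the function name and the numbers are hypothetical and are not data from this study.

```python
import math


def epistasis_contrast(mean_p1, mean_p2, mean_f3, se_p1, se_p2, se_f3):
    """Return the contrast (F3 mean - parental mean) and its ratio to the
    standard error, assuming the three means are independent."""
    parental_mean = (mean_p1 + mean_p2) / 2.0
    contrast = mean_f3 - parental_mean
    se = math.sqrt(se_f3 ** 2 + (se_p1 ** 2 + se_p2 ** 2) / 4.0)
    return contrast, contrast / se


# Hypothetical TC means and standard errors for P1, P2, and the F3 lines.
contrast, ratio = epistasis_contrast(90.0, 100.0, 95.5, 1.0, 1.0, 0.5)
print(contrast, round(ratio, 2))
```

A ratio that is small in absolute value, as in this made-up example, would give no evidence for net epistasis; opposing epistatic effects at different loci can still cancel in such a genome-wide test.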
One reason for the absence of significant epistasis in our study could be that we investigated a genetically narrow cross between elite lines from the same germplasm group. In this case, there should be less opportunity to disrupt coadapted epistatic gene complexes in the parents than might be expected for the wide or interspecific crosses often employed in QTL mapping studies. Furthermore, the power for detecting epistatic interactions among QTL is lower for TC performance than for line per se performance because of masking effects of the tester.
In our analysis, those QTL with significant epistatic but insignificant main effects would remain undetected. A recent QTL study on grain yield components in rice identified a large number of QTL regions of this type.
QTL x environment interactions:
In Experiment 1, about one third of the detected QTL displayed significant QTL x E interactions. The smallest fraction (one out of nine) was observed for GY, although estimates of σ²ge for this trait were highly significant and of the same magnitude as σ²g (Table 1). For the other traits, the proportion of QTL with significant QTL x E interactions was approximately proportional to the ratio σ²ge : σ²g and by far greatest for GM. Interestingly, reducing the number of test environments for PH in Experiment 1 from nine to four neither altered the number of detected QTL nor reduced the number of significant QTL x E interactions (data not shown). In Experiment 2, the ratio σ²ge : σ²g was generally smaller than in Experiment 1, which was reflected in fewer QTL showing significant QTL x E interactions.
Most QTL studies reported in the literature [...]-effects. However, the second hypothesis cannot be ruled out with the statistical analysis followed here because our search for QTL started with an analysis of means across environments, which favors the detection of QTL with large main effects over those with small main effects and large QTL x E interactions. Only after all putative QTL had been mapped did we apply a combined analysis across environments for testing the presence of QTL x E interactions. In contrast, the new method of multi-trait analysis devised by [...]
In general, varying the statistical analysis for CIM had only little impact on our findings. A comparison of our method used for QTL and QTL x E analysis with that of [...]
Conclusions:
Identification of QTL affecting TC performance of agronomically important traits and accurate estimation of their genetic effects, including epistasis and QTL x E interactions, are essential requirements for application of MAS in hybrid breeding of maize. Here, we used independent samples of TC progenies from the same population to (1) assess the magnitude of the bias of estimated QTL effects and (2) compare the power of QTL detection in samples of different size.
Our results suggest that inferences drawn from QTL mapping studies about the efficiency of MAS should be verified in an independent validation sample. When QTL effects are estimated from the same data as used for detection and mapping of QTL positions, they can be inflated due to statistical sampling and G x E interactions. The relative magnitude of the bias can be substantial for sample sizes typically used in QTL mapping experiments (N < 200) especially for traits with moderate heritability and a complex genetic architecture such as grain yield. As a consequence, the key factor determining the efficiency of MAS in comparison with classical phenotypic selection, the proportion, p, of the genotypic variance explained by QTL-marker associations, is overestimated. Moreover, if in this study the magnitude of estimated QTL effects had been used as a criterion for the choice of important QTL regions to be transferred by MAS or to be considered in a selection index, selection response would have been smaller than expected, because QTL effects estimated from calibration were biased.
With currently available statistical methods it was not possible to separate the effects of statistical sampling and QTL x E interactions in this study, but we believe that at least the bias owing to sampling effects can be reduced by validation or cross-validation. For a correct assessment of the prospects of MAS as compared to classical phenotypic selection, more research efforts need to be dedicated to the analysis of the different factors leading to the inflation of QTL effect estimates.
The moderate agreement among the QTL detected in each sample provides evidence for a low power of QTL detection for most traits, especially GY. Only a small fraction of the detected QTL showed significant QTL x E interactions for all traits except GM, suggesting that field testing of experimental materials could be limited to a few environments known to provide good differentiation.
The consistency of QT


