Quantitative Trait Locus Mapping Based on Resampling in a Vast Maize Testcross Experiment and Its Relevance to Quantitative Genetics for Complex Traits
Chris C. Schön^a, H. Friedrich Utz^b, Susanne Groh^c, Bernd Truberg^d, Steve Openshaw^e, and Albrecht E. Melchinger^b
a State Plant Breeding Institute, Seed Science and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany,
b Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany,
c Pioneer Génétique, 68740 Nambsheim, France,
d Pioneer Hi-Bred Northern Europe GmbH, 48268 Greven, Germany
e Syngenta Seeds, Stanton, Minnesota 55018
Corresponding author: Albrecht E. Melchinger, Seed Science and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany. E-mail: melchinger@uni-hohenheim.de
Communicating editor: R. W. DOERGE
| ABSTRACT |
|---|
From simulation studies it is known that the allocation of experimental resources has a crucial effect on power of QTL detection as well as on accuracy and precision of QTL estimates. In this study, we used a very large experimental data set composed of 976 F5 maize testcross progenies evaluated in 19 environments and cross-validation to assess the effect of sample size (N), number of test environments (E), and significance threshold on the number of detected QTL, the proportion of the genotypic variance explained by them, and the corresponding bias of estimates for grain yield, grain moisture, and plant height. In addition, we used computer simulations to compare the usefulness of two cross-validation schemes for obtaining unbiased estimates of QTL effects. The maximum, validated genotypic variance explained by QTL in this study was 52.3% for grain moisture despite the large number of detected QTL, thus confirming the infinitesimal model of quantitative genetics. In both simulated and experimental data, the effect of sample size on power of QTL detection as well as on accuracy and precision of QTL estimates was large. The number of detected QTL and the proportion of genotypic variance explained by QTL generally increased more with increasing N than with increasing E. The average bias of QTL estimates and its range were reduced by increasing N and E. Cross-validation performed well with respect to yielding asymptotically unbiased estimates of the genotypic variance explained by QTL. On the basis of our findings, recommendations for planning of QTL mapping experiments and allocation of experimental resources are given.
DURING the past 15 years a large number of studies have identified molecular markers linked to quantitative trait loci (QTL) involved in the inheritance of agronomically important traits. These QTL generally explained a significant proportion of the phenotypic variance of the respective trait and, therefore, gave rise to an optimistic assessment of the prospects of marker-assisted selection (MAS).
An explanation for the latter results could be found in theoretical studies […]
The effect of experimental dimensions such as sample size and number of test environments on the power of QTL detection as well as accuracy and precision of QTL estimates has been investigated in simulation studies, generally with the assumption of few (≤10) segregating QTL. On the basis of simulations with 40 segregating QTL, […]
| MATERIALS AND METHODS |
|---|
Plant materials:
Two elite dent inbred lines, subsequently referred to as P1 and P2, were used as parents. They belonged to the same heterotic pool but were known to be genetically diverse, with a coefficient of coancestry […]
Field experiments:
The testcross progenies were evaluated in 1994 and 1995 in 7 and 12 locations, respectively. The experiments were located in Illinois (3 locations), Indiana (2), Iowa (3), Kansas (1), Nebraska (2), and Italy (1). In each of the 19 environments the experimental design consisted of 18 blocks with 60 entries. Each block contained testcrosses of 55 F5 lines, P1, P2, their F1, and two checks. Trials were performed with one replication per environment. Two-row plots (8.2 m²) were machine planted (5.5–7.0 plants m⁻²) and harvested as grain trials with a combine.
Data were recorded for grain yield in megagrams per hectare, adjusted to 155 g kg⁻¹ grain moisture, and for grain moisture in grams per kilogram at harvest. Plant height was measured in centimeters on a plot basis as the distance from the soil level to the uppermost leaf in 16 of the 19 environments.
RFLP marker genotyping and linkage map construction:
DNA extraction, restriction enzyme digestions, gel electrophoresis, transfer of DNA to nylon membranes, and DNA hybridizations were performed by standard procedures. […] χ² test. Appropriate type I error rates were determined by the sequentially rejective Bonferroni test.
Agronomic data analyses:
All quantitative genetic parameters were estimated on the basis of the 976 testcross progenies of F5 lines for which high-quality molecular data were available. Each site-year combination was treated as an environment in the analysis. Trait values were adjusted for block effects. For each environment, block effects were calculated as the deviation of the mean of the 55 F5 testcrosses in that block from the mean of all F5 testcrosses. An analysis of variance (ANOVA) combined across environments was computed. Components of variance were estimated considering all effects in the statistical model as random. Estimates of the variance components $\hat{\sigma}^2_{ge}$ [genotype-by-environment (G × E) interaction variance confounded with experimental error] and $\hat{\sigma}^2_g$ (genotypic variance) of testcross progenies of F5 lines and their standard errors (SE) were calculated. Heritability on a progeny-mean basis was estimated as

$$\hat{h}^2 = \frac{\hat{\sigma}^2_g}{\hat{\sigma}^2_g + \hat{\sigma}^2_{ge}/E} \qquad (1)$$

where E is the number of environments. Exact 95% confidence intervals (C.I.) of $\hat{h}^2$ were calculated.
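As a minimal sketch of the progeny-mean heritability used here, the function below takes the genotypic variance and the G × E variance (confounded with error) and divides the latter by the number of environments E, that is, it computes σ²g / (σ²g + σ²ge / E). The function name and the numeric inputs are illustrative assumptions, not values from the study.

```python
def progeny_mean_heritability(var_g: float, var_ge: float, n_env: int) -> float:
    """Heritability of progeny means across n_env environments.

    var_g  : genotypic variance component (illustrative input)
    var_ge : G x E interaction variance confounded with experimental error
    n_env  : number of environments E (one replication per environment)
    """
    if n_env < 1:
        raise ValueError("n_env must be at least 1")
    denominator = var_g + var_ge / n_env
    if denominator <= 0:
        raise ValueError("variance of progeny means must be positive")
    return var_g / denominator


# Hypothetical variance components, chosen only to show the effect of E.
for n_env in (2, 4, 19):
    print(n_env, round(progeny_mean_heritability(1.0, 10.0, n_env), 3))
```

With the same variance components, heritability rises as E grows because the G × E and error contribution to the progeny mean shrinks with 1/E.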
QTL analyses:
QTL mapping and estimation of their effects were performed with the software PLABQTL. Two significance thresholds were used:
- Cofactors were chosen with an "F-to-enter" and an "F-to-delete" value of 12.4, and testing for the presence of a putative QTL in an interval by the likelihood-ratio test was performed using a LOD threshold of 3.21. The experimentwise type I error was determined to be Pe < 0.02, using 1000 permutation runs (DOERGE and CHURCHILL 1996).
- F-to-enter and F-to-delete values were set to 3.5 and the LOD threshold to 2.5. This combination corresponds to an experimentwise type I error of Pe < 0.35.

Estimates of QTL positions were obtained at the position where the LOD score assumed its maximum in the region under consideration.
The proportion of the phenotypic variance explained by QTL was determined by the estimator $\hat{R}^2_{adj}$. The proportion p of the genotypic variance explained by QTL was estimated as

$$\hat{p} = \frac{\hat{R}^2_{adj}}{\hat{h}^2} \qquad (2)$$

Both parameters, $\hat{R}^2_{adj}$ and $\hat{h}^2$, are estimated with an experimental error, and estimates of p can exceed 100% or become negative. We did not restrict estimates of p to the parameter space [0, 100], because additional bias is introduced if estimates are constrained to lie within theoretical boundaries.
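The ratio behind p can be sketched as follows, assuming p is expressed in percent as the adjusted R² divided by the heritability estimate. The function name and the example numbers are hypothetical; the point is only that the estimate is deliberately left unclipped, so it can fall outside [0, 100].

```python
def proportion_genotypic_variance(r2_adj: float, h2: float) -> float:
    """Percentage of genotypic variance explained: 100 * R2adj / h2.

    The result is not restricted to [0, 100]; both inputs carry
    sampling error, so values outside that range are possible.
    """
    if h2 == 0:
        raise ZeroDivisionError("heritability estimate is zero")
    return 100.0 * r2_adj / h2


# Hypothetical estimates: a well-behaved case and two out-of-range cases.
print(proportion_genotypic_variance(0.30, 0.60))   # inside [0, 100]
print(proportion_genotypic_variance(0.20, 0.10))   # exceeds 100
print(proportion_genotypic_variance(-0.02, 0.50))  # negative
```

Leaving the estimate unclipped keeps its average closer to the true value across repeated samples, at the cost of individual values that look implausible.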
Subdivision and analysis of experimental data:
From the experimental reference population PED (N = 976, E = 19; grain yield and grain moisture) or PED (N = 976, E = 16; plant height) an array of (a) genotypic subpopulations of size N (N = 976, 488, 244, 122) and (b) environmental subpopulations of size E (E = 19 or 16, 4, 2) was sampled without replacement. After randomization of the genotypes and environments in PED (N, E), this procedure was repeated 2–60 times to result in a total of 120 or 128 different data sets (DS) per PED (N, E) except for PED (976, 19/16), where only 1 DS exists (Table 1).
Within each PED (N, E), estimation of quantitative genetic parameters such as variance components and heritabilities as well as QTL analyses were performed for each DS individually at two levels of significance as described earlier. Fivefold cross-validation (fivefold CV) was applied to each DS, yielding estimation sets (ES) and test sets (TS).
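A minimal sketch of the genotypic split behind fivefold CV, assuming the usual scheme in which each data set (DS) is divided into five disjoint subsets, four of which form the estimation set (ES) and the fifth the test set (TS). The function name, the seed, and the use of integer genotype indices are illustrative assumptions, not details from the study.

```python
import random


def fivefold_splits(n_genotypes: int, seed: int = 0, k: int = 5):
    """Yield (estimation_set, test_set) index lists for k-fold CV.

    Genotypes are shuffled once, dealt into k disjoint folds, and each
    fold serves as the test set exactly once.
    """
    indices = list(range(n_genotypes))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for test_fold in range(k):
        test_set = folds[test_fold]
        estimation_set = [
            g for j, fold in enumerate(folds) if j != test_fold for g in fold
        ]
        yield estimation_set, test_set


# Example with a data set of N = 488 genotypes.
for es, ts in fivefold_splits(488, seed=1):
    print(len(es), len(ts))
```

QTL would be detected and their effects estimated in each ES only, and the TS would then be used to check how much variance those QTL explain in independent genotypes.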
The following parameters were estimated:
- The heritability of each trait for each DS ($\hat{h}^2_{DS}$) and averaged over all DS ($\bar{h}^2_{DS}$) for a given PED (N, E).
- The number of QTL ($\hat{m}_{DS}$) and the proportion of the genotypic variance explained by QTL ($\hat{p}_{DS}$) in each DS as well as their arithmetic mean over all DS ($\bar{m}_{DS}$ and $\bar{p}_{DS}$) for a given PED (N, E).
- The number of QTL ($\hat{m}_{ES}$) and the proportion of the genotypic variance explained by QTL in each ES ($\hat{p}_{ES}$), their arithmetic mean over all ES for a given DS ($\bar{m}_{ES}$ and $\bar{p}_{ES}$), and their grand arithmetic mean over all DS ($\bar{\bar{m}}_{ES}$ and $\bar{\bar{p}}_{ES}$) for a given PED (N, E).
- The proportion of the genotypic variance explained by QTL in each TS ($\hat{p}_{TS}$), the arithmetic mean over all TS for a given DS ($\bar{p}_{TS}$), the grand arithmetic mean over all DS ($\bar{\bar{p}}_{TS}$) for a given PED (N, E), and the median ($\tilde{p}_{TS}$) and 12.5% [$\tilde{p}_{TS}$ (12.5%)] and 87.5% [$\tilde{p}_{TS}$ (87.5%)] quantiles of the proportion of the genotypic variance for all DS of a given PED (N, E).
- The magnitude of the bias in estimates of the genotypic variance explained by QTL due to genotypic sampling, calculated as the difference between estimates of p obtained from ES and the corresponding TS ($\hat{p}_{ES} - \hat{p}_{TS}$), averaged over all ES and TS for a given DS ($\bar{p}_{ES} - \bar{p}_{TS}$), the grand arithmetic mean over all DS ($\bar{\bar{p}}_{ES} - \bar{\bar{p}}_{TS}$) for a given PED (N, E), and the median and 12.5 and 87.5% quantiles of the bias for all DS of a given PED (N, E).
For DS with $\hat{h}^2_{DS}$ < 0.002 the parameters $\hat{p}_{ES}$, $\hat{p}_{TS}$, and the bias were set to zero due to the problem of dividing by a number close to zero.
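The bias summary described above can be sketched as below: one ES average and one TS average per DS, bias taken as their difference, and the bias set to zero when the heritability estimate of a DS falls below 0.002. The function name, dictionary keys, and input numbers are hypothetical; the quantile interpolation method is our choice, since the text does not state one.

```python
import statistics


def summarize_bias(p_es, p_ts, h2, h2_floor=0.002):
    """Summarize bias (ES minus TS, in %) over the data sets of one PED (N, E).

    p_es, p_ts : per-DS averages of the variance explained in ES and TS
    h2         : per-DS heritability estimates
    h2_floor   : below this value the bias is set to zero (rule from the text)
    """
    biases = [
        0.0 if h < h2_floor else es - ts
        for es, ts, h in zip(p_es, p_ts, h2)
    ]
    # n=8 cuts the data into octiles: first cut = 12.5%, last cut = 87.5%.
    octiles = statistics.quantiles(biases, n=8, method="inclusive")
    return {
        "mean": statistics.fmean(biases),
        "median": statistics.median(biases),
        "q12.5": octiles[0],
        "q87.5": octiles[-1],
    }


# Hypothetical values for nine data sets.
p_es = [10.0 + i for i in range(9)]
p_ts = [10.0] * 9
print(summarize_bias(p_es, p_ts, h2=[0.5] * 9))
```

Reporting the median and the 12.5 and 87.5% quantiles next to the mean matters because a few DS with very small heritability estimates can dominate the mean.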
Simulation data set:
In the QTL analysis of the experimental data PED (976, 16) with LOD 3.21, 21 QTL for plant height were detected, explaining an estimated 56.1% of the genotypic variance. On the basis of the linkage map with 172 RFLP markers and assuming the estimated positions and effects of these 21 QTL to be the true QTL parameters, a simulated reference population PSD (N = 4880, E = 16) consisting of 4880 F4 individuals was generated using the software PLABSIM.
Subdivision and analysis of simulated data:
The simulated reference population PSD (4880, 16) was partitioned into 10 or 40 genotypic subpopulations of size N = 488 or 122, respectively. Environmental subpopulations of size E = 16, 4, and 2 were also generated. After randomization of the genotypes and environments in PSD (N, E), this partitioning was conducted 1–16 times to result in a total of 160 DS per PSD (N, E) except for PSD (122, 2) with 320 possible DS (Table 1).
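The genotypic partitioning can be sketched as a shuffle followed by slicing into disjoint subpopulations. The function name and seed handling are illustrative assumptions; only the population and subpopulation sizes come from the text.

```python
import random


def partition_population(genotype_ids, subpop_size: int, seed: int):
    """Randomize genotypes and cut them into disjoint subpopulations.

    Genotypes left over after the last full subpopulation are dropped.
    """
    ids = list(genotype_ids)
    random.Random(seed).shuffle(ids)
    n_full = len(ids) // subpop_size
    return [ids[i * subpop_size:(i + 1) * subpop_size] for i in range(n_full)]


# 4880 simulated individuals give 10 subpopulations of 488 or 40 of 122.
print(len(partition_population(range(4880), 488, seed=1)))
print(len(partition_population(range(4880), 122, seed=1)))

# Repeating the randomization with new seeds multiplies the number of
# data sets, e.g. 16 randomizations x 10 subpopulations = 160 data sets.
data_sets = [
    subpop
    for seed in range(16)
    for subpop in partition_population(range(4880), 488, seed=seed)
]
print(len(data_sets))
```

Subpopulations from one randomization are disjoint, while those from different randomizations overlap, so the resulting data sets are not all independent of one another.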
In general, estimation of quantitative genetic parameters such as variance components and heritabilities as well as QTL mapping and QTL parameter estimation ($\hat{m}_{DS}$, $\hat{m}_{ES}$, $\hat{p}_{DS}$, $\hat{p}_{ES}$, and $\hat{p}_{TS}$) were conducted as described for the experimental subpopulations, but with simulated data only one threshold for declaring significant QTL (LOD = 2.5 and F-to-enter = 3.5) was used. Two CV schemes were applied: standard CV and a second CV analysis accounting for genotypic and environmental sampling simultaneously (CV/GE).
For each DS, QTL positions and effects estimated with composite interval mapping were used to predict the genotypic value of each of the 4880 F4 individuals from the simulated reference population on the basis of its marker genotype ($\hat{g}_{DS}$). Pearson's correlation coefficient r between the predicted and the simulated genotypic values was calculated, and the proportion of the genotypic variance attributable to simulated QTL was estimated as $\hat{p}_{SD:DS} = r^2$. Subsequently, the arithmetic mean over all DS ($\bar{p}_{SD:DS}$) was calculated.
Analogously, for each of the two CV schemes QTL positions and effects estimated in ES were used to predict the genotypic value of each of the 4880 F4 individuals from the simulated reference population on the basis of its marker genotype ($\hat{g}_{ES}$). Likewise, the proportion of the genotypic variance attributable to simulated QTL was estimated as $\hat{p}_{SD:ES} = r^2$. Subsequently, the arithmetic mean over all ES for a given DS and the grand arithmetic mean over all DS ($\bar{\bar{p}}_{SD:ES}$) for a given PSD (N, E) were calculated.
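A small sketch of how the proportion of genotypic variance attributable to simulated QTL could be computed, assuming it is taken as the squared Pearson correlation between predicted and simulated genotypic values, expressed in percent. The squared-correlation form is our reading of the passage, and the function names and numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_attributable(predicted, simulated):
    """Percent of genotypic variance recovered by marker-based predictions."""
    r = pearson_r(predicted, simulated)
    return 100.0 * r * r


# Hypothetical genotypic values for four individuals.
simulated = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 3.0, 2.0, 4.0]
print(variance_attributable(predicted, simulated))
```

Because the simulated genotypic values are known, this quantity is free of the selection bias that inflates the variance explained in the data set used for QTL detection.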
In CV with experimental data, the magnitude of the bias of QTL estimates is determined from the difference $\hat{p}_{ES} - \hat{p}_{TS}$. However, in fivefold CV, estimation of QTL is based on 20% fewer individuals than in DS and, therefore, power of QTL detection is reduced, affecting the estimation of the number of QTL, p, and the bias. For the simulated data, the QTL genotypes are (partly) known and an estimate of the truly accountable proportion of the genotypic variance ($\hat{p}_{SD:DS}$ and $\hat{p}_{SD:ES}$) can be obtained. Consequently, an estimate of the bias of QTL effects due to model selection in DS can be calculated from the difference $\hat{p}_{DS} - \hat{p}_{SD:DS}$. The unbiasedness of $\hat{p}_{TS}$ can be derived from the congruency between $\hat{p}_{TS}$ and $\hat{p}_{SD:ES}$. To be useful for assessing the prospects of MAS, $\hat{p}_{TS}$ should be of similar size as $\hat{p}_{SD:DS}$ despite the 20% fewer individuals in ES as compared to DS.
Consistency of QTL estimates across subpopulations:
QTL consistency across subpopulations was assessed. All QTL detected for plant height in the experimental reference population PED (976, 16) with LOD 3.21 were assumed as reference QTL. Around their position on the genome, 20-cM intervals (10 cM downstream and upstream) were constructed. Subsequently, QTL mapping results of all DS within a given PSD (N, E) and PED (N, E) were scanned with LOD 3.21 and the number of QTL positioned within one of the 21 intervals and of the same sign as the reference QTL (matching) was counted. The number of newly occurring QTL (not matching) was also assessed. The same analysis for determining matching and nonmatching QTL was performed for all DS of a given PED (N, E) on the basis of LOD 2.5 with an extended set of 30 QTL detected in the reference population PED (976, 16) with a LOD threshold of 2.5.
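The matching rule can be sketched as follows: a detected QTL counts as matching when it lies on the same chromosome within 10 cM of a reference QTL and has an effect of the same sign; all other detected QTL count as not matching. The tuple layout `(chromosome, position_cM, effect)` and the example values are hypothetical.

```python
def count_matches(reference_qtl, detected_qtl, half_width_cm=10.0):
    """Return (matching, not_matching) counts for detected QTL.

    Each QTL is a tuple (chromosome, position_cM, effect). A detected QTL
    matches if some reference QTL is on the same chromosome, within
    half_width_cm, and has an effect of the same sign.
    """
    matching = 0
    not_matching = 0
    for chrom, pos, effect in detected_qtl:
        hit = any(
            chrom == ref_chrom
            and abs(pos - ref_pos) <= half_width_cm
            and (effect > 0) == (ref_effect > 0)
            for ref_chrom, ref_pos, ref_effect in reference_qtl
        )
        if hit:
            matching += 1
        else:
            not_matching += 1
    return matching, not_matching


# Hypothetical reference and detected QTL.
reference = [(1, 50.0, 2.0), (3, 120.0, -1.5)]
detected = [(1, 57.0, 1.1), (3, 118.0, 0.4), (5, 10.0, 0.9)]
print(count_matches(reference, detected))
```

In this example the second detected QTL falls inside a reference interval but has the opposite sign, so it is counted as not matching.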
| RESULTS |
|---|
Analysis of the experimental population PED (976, 19/16):
Molecular data:
Three chromosomal regions on chromosomes 2, 5, and 8 showed allele and genotype frequencies deviating highly significantly from Mendelian expectations (P < 0.0001). The 172 RFLP marker loci spanned a map distance of 1818 cM with an average interval length of 11.2 cM. One hundred percent of the genome was located within a 20-cM distance to the nearest marker.
Trait means, variances, and heritabilities:
Climatic conditions were favorable for maize production in all environments. Phenotypic correlations between environments based on performance of the testcrosses of the 976 F5 progenies varied between 0.03 and 0.24 for grain yield, between 0.09 and 0.65 for grain moisture, and between 0.22 and 0.44 for plant height. For all three traits, the nonsignificant orthogonal contrast between the average testcross performance of the two parental lines and the testcross mean of the F5 lines indicated the absence of epistasis (Table 2), supporting an additive model for QTL analyses. The range in testcross performance of F5 lines considerably transgressed the testcross means of the parents, and $\hat{\sigma}^2_g$ was significantly greater than zero (P < 0.01). Heritability on a progeny-mean basis was high ($\hat{h}^2 \geq 0.89$) for grain moisture and plant height but medium for grain yield ($\hat{h}^2 = 0.64$). Heritability on a plot basis was 0.38 and 0.33 for grain moisture and plant height, respectively, and 0.085 for grain yield.
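The plot-basis and progeny-mean heritabilities can be cross-checked against each other. The sketch below assumes that plot-basis heritability is $h^2_{plot} = \sigma^2_g / (\sigma^2_g + \sigma^2_{ge})$, with one plot per environment and experimental error confounded in $\sigma^2_{ge}$; this definition is our assumption rather than a formula quoted from the text.

```latex
\hat{h}^2
  = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_{ge}/E}
  = \frac{E\,h^2_{plot}}{1 + (E-1)\,h^2_{plot}}

% grain yield,    E = 19: 19(0.085) / (1 + 18(0.085)) = 1.615 / 2.530 \approx 0.64
% grain moisture, E = 19: 19(0.38)  / (1 + 18(0.38))  = 7.22  / 7.84  \approx 0.92
% plant height,   E = 16: 16(0.33)  / (1 + 15(0.33))  = 5.28  / 5.95  \approx 0.89
```

Under that assumption the plot-basis values reproduce the reported progeny-mean heritabilities, which illustrates how strongly averaging over 16 to 19 environments raises the heritability of a trait such as grain yield.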
QTL analyses:
Results from QTL analyses in the experimental population PED (976, 19/16) are presented in Table 3 for both significance thresholds. Increasing the LOD and F-to-enter threshold (decreasing the type I error rate) decreased the power of QTL detection, as reflected by the number of detected QTL ($\hat{m}_{DS}$) and the proportion of genotypic variance explained by them in the DS ($\hat{p}_{DS}$) and averaged over ES ($\bar{p}_{ES}$) for all traits. Averages over TS ($\bar{p}_{TS}$) were between 3.1% (grain yield) and 7.7% (plant height) higher for LOD 2.5 than for LOD 3.21. Fivefold standard CV revealed no major difference in absolute bias ($\bar{p}_{ES} - \bar{p}_{TS}$) between the two thresholds of QTL detection for grain moisture and plant height; however, a slightly increased bias was observed for grain yield with LOD 2.5 as compared to LOD 3.21.
Analysis of experimental subpopulations:
Heritability estimates on an entry-mean basis were calculated for each DS within a given PED (N, E). The average estimates ($\bar{h}^2_{DS}$) ranged from 17.3% [PED (976, 2)] to 63.8% [PED (976, 19)] for grain yield, from 54.0% [PED (488, 2)] to 92.0% [PED (976, 19)] for grain moisture, and from 48.4% [PED (244, 2)] to 88.8% [PED (976, 16)] for plant height. For all traits, $\hat{h}^2_{DS}$ showed a wide range across DS, especially for small N and E (data not shown). For grain yield, DS with $\hat{h}^2_{DS}$ < 0.002 were observed for PED (244, 4), PED (122, 4), and all PED (N, 2).
Power of QTL detection was affected by the trait under study, the significance threshold, the sample size, and the number of environments (Table 4). The average number of QTL in ES ($\bar{\bar{m}}_{ES}$) was highest for grain moisture in PED (976, 19) with LOD 2.5 (28.3) and lowest for PED (122, 2) and grain yield with LOD 3.21 (0.1). With increasing N and E, $\bar{\bar{m}}_{ES}$ increased for all traits and both significance levels, except for grain yield with LOD 2.5, where more QTL were detected for PED (122, E) than for PED (244, E) at E = 2 and 4. In small samples only few QTL were detected with LOD 3.21; on average $\bar{\bar{m}}_{ES} \leq 1.0$ with N = 122 for all traits and numbers of test environments. With LOD 3.21, >10 QTL were detected in only a few DS, even for large samples. For N = 122 most DS yielded ES with no detected QTL ($\hat{m}_{ES} = 0$).
The average proportion of the genotypic variance explained by the detected QTL in TS ($\bar{\bar{p}}_{TS}$) generally increased with increasing N for all traits and both significance thresholds (Figure 1) and was always greater for LOD 2.5 than for LOD 3.21. The number of test environments had only a small effect on both $\bar{\bar{p}}_{TS}$ and the bias ($\bar{\bar{p}}_{ES} - \bar{\bar{p}}_{TS}$). A dramatic increase in bias could be observed for small N, LOD 2.5, and grain yield. As a consequence, $\bar{\bar{p}}_{ES}$ was greatest for small N and grain yield with an estimated average bias close to 100% for PED (122, 2). For grain moisture and plant height and LOD 2.5, $\bar{\bar{p}}_{ES}$ was almost constant for all PED (N, E). For LOD 3.21, average bias was similar for most PED (N, E) in all traits. Deviating from expectations, in some cases the average bias even increased from smaller to larger subpopulations. This can be attributed to the fact that (i) for small subpopulations, more ES with 0 detected QTL occurred and (ii) for E = 2, samples with no significant genotypic variance ($\hat{h}^2_{DS}$ < 0.002) were observed, in which case $\hat{p}_{ES}$ and $\hat{p}_{TS}$ were zero, resulting in zero bias.
The mean ($\bar{\bar{p}}_{TS}$), median ($\tilde{p}_{TS}$), and 12.5 and 87.5% quantiles [$\tilde{p}_{TS}$ (12.5%), $\tilde{p}_{TS}$ (87.5%)] of the proportion of the genotypic variance explained in TS are shown for each combination of PED (N, E) and LOD 2.5 in Figure 2. Results with LOD 3.21 were similar and are therefore not shown. For grain moisture and plant height, variation of $\bar{p}_{TS}$ among DS was increased for small N. For grain yield and E = 4 and E = 2, however, N had no clear effect on the range of $\bar{p}_{TS}$. Increasing the number of test environments generally decreased the range of $\bar{p}_{TS}$. The mean ($\bar{\bar{p}}_{TS}$) and the median ($\tilde{p}_{TS}$) proportions of genotypic variance explained by QTL were in good agreement except for grain yield and E = 2 and 4.
A similar picture was observed when analyzing the variation in the bias ($\bar{p}_{ES} - \bar{p}_{TS}$) among DS for a given PED (N, E) (Figure 3). For small N and E, the variation of bias was large for grain yield and LOD 2.5. Estimates of the bias of over 100% occurred in some DS with a largely overestimated proportion of genotypic variance explained in ES divided by a small heritability estimate. The range of the 12.5 and 87.5% quantiles was considerably reduced for LOD 3.21 as compared to LOD 2.5. As shown for $\bar{p}_{TS}$, the mean bias and the median bias across DS differed considerably for grain yield and E = 2. For LOD 2.5, the median bias decreased with increasing N and E for all traits and combinations of PED (N, E) according to expectations. However, with LOD 3.21 the median and the mean bias increased from N = 122 to N = 244 for all three traits due to a high number of ES with 0 detected QTL.
Analysis of simulated subpopulations:
Results for plant height in each of the six PSD (N, E) are given in Table 5. Heritability estimates agreed well with those underlying the simulated values. In none of the 160 DS samples (320 for N = 122, E = 2) could all 21 QTL be detected. Power of QTL detection and the proportion of estimated QTL positions lying within a 20-cM interval of the simulated QTL (matching) increased with increasing N and E in DS analyzed with composite interval mapping and in ES analyzed with standard CV and CV/GE. With N = 122 and E = 2, on average only 2.5 simulated QTL (4.7 × 0.53) were detected with composite interval mapping. The proportion of genotypic variance explained by QTL estimated in DS ($\bar{p}_{DS}$) was generally greater than the proportion of the genotypic variance attributable to simulated QTL ($\bar{p}_{SD:DS}$), revealing considerable bias of QTL estimation (7.6–37.4%), especially for the small sample size. The number of matching QTL was smaller for the two CV schemes than for composite interval mapping due to the 20% smaller population size and/or reduced number of environments in ES as compared to DS. As a consequence, estimates of the bias increased from composite interval mapping ($\bar{p}_{DS} - \bar{p}_{SD:DS}$) to standard CV ($\bar{\bar{p}}_{ES} - \bar{\bar{p}}_{TS}$) and were greatest for CV/GE and PSD (122, 2). For all combinations of N and E a better agreement of $\bar{\bar{p}}_{TS}$ with $\bar{\bar{p}}_{SD:ES}$ and $\bar{p}_{SD:DS}$ was found for standard CV than for CV/GE. For standard CV, a close agreement between $\bar{\bar{p}}_{TS}$ and $\bar{\bar{p}}_{SD:ES}$ and only a slight decrease in magnitude from $\bar{\bar{p}}_{TS}$ to $\bar{p}_{SD:DS}$ (<5%) for all combinations of N and E indicated that $\bar{\bar{p}}_{TS}$ can be used as an unbiased estimate of the genotypic variance explained by QTL with finite population sizes. For CV/GE and N = 122, $\bar{\bar{p}}_{TS}$ deviated markedly from $\bar{\bar{p}}_{SD:ES}$ and $\bar{p}_{SD:DS}$, indicating that more research is needed to examine the small sample properties of $\bar{\bar{p}}_{TS}$ from CV/GE.
Consistency of QTL estimates across experimental subpopulations:
The 21 intervals of size 20 cM constructed around the QTL detected for plant height in the DS of PED (976, 16) with LOD 3.21 did not overlap. On average, the number of detected QTL matching the reference QTL was increased considerably for DS with large N [PED (976, E)] as compared to the smaller subpopulations [e.g., PED (122, E)] (Table 6). The number of unmatched QTL was increased to a much smaller extent from smaller to larger populations, indicating a higher ratio of true:false QTL for the larger subpopulations. Doubling the number of environments did not improve the ratio of true:false QTL as much as doubling the number of testcross progenies in the subpopulation. Similar results were obtained with 30 intervals of size 20 cM constructed around QTL detected in DS of PED (976, 16) with a LOD threshold of 2.5.
| DISCUSSION |
|---|
Influence of sample size and number of test environments:
To our knowledge, the phenotypic and molecular data on testcross progenies of 976 F5 lines evaluated in 19 environments constitute by far the largest QTL mapping experiment ever published in plants. The dimensions of the experiment had been designed to meet assumptions from a simulation study […] 60% and consequently explain […] 60% of the genotypic variance with QTL. With the high-density genetic map used in this study (average interval length 11.2 cM), the power of QTL detection was expected to be even higher than in that simulation study. […] $\hat{p}_{DS}$, indicating that QTL mapping results need to be interpreted with caution and strategies are needed for their validation.
In simulated and experimental data, the effect of sample size on QTL parameter estimation was large. As expected, the number of detected QTL generally increased with increasing sample size. The comparison of subpopulations with the same plot capacities for phenotypic evaluation revealed that increasing the number of progenies generally increased the power of QTL detection ($\bar{\bar{m}}_{ES}$) and the proportion of the genotypic variance explained by QTL ($\bar{\bar{p}}_{TS}$) and reduced the bias more efficiently than did increasing the number of test environments. For grain yield and LOD 2.5, however, the number of detected QTL was higher when doubling plot capacities from PED (122, 2) to PED (122, 4) as compared to PED (244, 2), probably due to the fact that the estimated average heritability for grain yield and E = 2 ($\bar{h}^2_{DS}$ < 0.18) was too low for detecting significant QTL for small N. For grain moisture, the number of test environments had only little effect on $\bar{\bar{p}}_{TS}$ for all population sizes, presumably because few QTL showed significant interactions with environments. As pointed out by […]
When increasing the population size from N = 488 to N = 976, the increase in the proportion of genotypic variance explained by QTL ($\bar{\bar{p}}_{TS}$) per additionally tested genotype was always smaller than when increasing N from 244 to 488. This diminishing return per additional test unit was expected due to the nonlinear relationship of sample size and power of QTL detection. […] (max = 0.49 $\sigma_P$). The median genetic effect was small (between 0.1 $\sigma_P$ and 0.2 $\sigma_P$) and the distributions were skewed toward smaller values (L-shaped) for all traits. These findings corroborate the hypothesis that polygenic traits are regulated by a large number of genes with small effects that follow approximately a geometric distribution. Hence, it seems questionable whether simulation studies on QTL mapping and MAS assuming 5–10 QTL with equal effects of up to or >1.0 $\sigma_P$ reflect the true inheritance of polygenic quantitative traits such as grain yield.
Depending on the genetic architecture of the trait and its environmental stability, the extra input of resources for explaining a small additional proportion of the genotypic variance by markers can be vast. MAS for QTL with effects of 0.1 $\sigma_P$ and, even more so, their cloning seem an idle undertaking. Therefore, trait-specific strategies for MAS have to be developed. MAS seems promising only if alleles with large effects are segregating for the trait of interest. […] $\sigma_P$. In plant breeding experiments, QTL with effects of this size have been reported. However, results from QTL studies indicating the presence of major genes with large effects have to be interpreted with caution due to the problem of model selection. In a large number of published QTL studies with small sample size (100 < N < 200) a considerable proportion of the genotypic variance could be explained by few QTL. […] $\bar{\bar{p}}_{ES}$ was almost constant for grain moisture and plant height for all combinations of PED (N, E) and LOD 2.5, but for small N most of $\bar{\bar{p}}_{ES}$ must be attributed to bias and not to the effects of real QTL. These findings were also corroborated by results from simulated data on plant height. When performing a linear regression of the proportion of genotypic variance explained in ES on the number of detected QTL, the correlation (r) between $\hat{p}_{ES}$ and $\hat{m}_{ES}$ was relatively high, amounting to r = 0.74 for PSD (488, 4) and r = 0.80 for PSD (122, 4). When the dependent variable was $\hat{p}_{TS}$, however, the correlation dropped to r = 0.57 for N = 488 and even more for N = 122 (r = 0.37), indicating that especially for small N quite a few QTL detected in ES were false positives and did not contribute to $\hat{p}_{TS}$.
Influence of significance threshold:
[…] $\bar{\bar{p}}_{TS}$ for all three traits and most combinations of PED (N, E). The effect was also reflected in the higher percentage of QTL matching those of the reference population for LOD 2.5 as compared to LOD 3.21 (Table 6). This corroborates results described by […]
The choice of significance threshold depends on the goals of the breeder and the cost of marker analyses. For construction of an ideal genotype a more conservative threshold should be chosen to minimize the risk of false positives. As shown here, for complex quantitative traits the number of putative QTL is very large, but due to the L-shaped distribution of detected QTL effects each additional marker linked to a putative QTL would produce diminishing returns but equal costs. Hence, even if a large population was available from which to select optimal genotypes, the optimal number of putative QTL to be used for genotype construction would be much smaller than the total number of QTL detected in a mapping experiment of reasonable sample size, especially when considering more than one trait simultaneously.
When constructing a selection index for combined marker-assisted and phenotypic selection, the type I error rate determines the stop criterion for including additional markers in the model. Depending on marker costs and the magnitude of the detected QTL effects, i.e., the genetic architecture of the trait, an optimum type I error rate should exist. If marker costs are neglected, the experimentwise type I error of Pe < 0.35 used in this study for grain yield, grain moisture, and plant height yielded a higher efficiency of MAS than did the more conservative threshold of Pe < 0.02. In this study, we cannot draw conclusions about the optimum type I error rate to be used for the construction of a selection index, because only two different significance thresholds have been used. However, we believe that the choice of type I error rate warrants further research.
Variation of cross-validation-derived estimates of pTS and bias:
As pointed out earlier, estimates of $\hat{h}^2_{DS}$, $\hat{p}_{TS}$, and the bias varied tremendously among DS. Most authors use the coefficient of determination from regression, $R^2$ or $R^2_{adj}$ […]. Because $\hat{R}^2_{adj}$ and $\hat{h}^2_{DS}$ are both subject to sampling errors, estimates of $\hat{p}_{TS}$ beyond theoretical boundaries [0, 100] can occur. To warrant accurate estimation of $\bar{\bar{p}}_{TS}$, values of $\hat{p}_{TS}$ exceeding theoretical boundaries were accepted. […] $\hat{p}_{TS}$ within theoretical boundaries [0, 100] in all DS when the population size was low (N = 122). If individual DS with extreme estimated values of $\bar{p}_{TS}$ occur, the mean over all DS ($\bar{\bar{p}}_{TS}$) is affected and the distribution of $\bar{p}_{TS}$ is skewed toward larger values. This circumstance is reflected in the difference between the mean ($\bar{\bar{p}}_{TS}$) and the median ($\tilde{p}_{TS}$) proportion of genotypic variance explained by QTL for grain yield and small N and E (Figure 2). On the basis of these findings we conclude that cross-validation yields best results when a minimum sample size (N > 200) and a minimum number of test environments (E > 4) are available for analysis. Hence, QTL experiments need to be designed with special consideration of the population size and the number of test environments depending on the genetic architecture and the heritability of the trait.
The sampling error of the heritability estimates also affected estimates of the magnitude of the bias. Since the same estimate of h2DS is used for calculation of pES and pTS, the difference between the genotypic variance explained in ES as compared to TS was also inflated for DS with low h2DS estimates, but the effect was not as pronounced as for estimates of pTS. As expected, the population size had an effect not only on the average bias but also on the range of the bias for different DS (Fig 3). This needs to be taken into account when evaluating the prospects of MAS. When MAS for a specific trait is tested, the success of MAS is predicted on the basis of results from a QTL mapping study, generally performed with one or a few segregating populations, and compared to the actual selection gain achieved in another independent population from the same or a different cross subjected to MAS. The difference between the predicted and the realized selection gain corresponds to the difference between pES and pTS in this study and therefore must be attributed to the bias of estimating the genotypic variance explained in the QTL mapping experiment. As can be seen in Fig 3, there are quite a few DS with a bias of 100%, where QTL explaining a large proportion of the genotypic variance could be identified in ES but no gain from selection would be realized in the TS sample. This was especially pronounced for grain yield. In other DS, however, a considerable proportion of the variance explained by markers in ES could also be found in TS and would result in gain from selection if the TS had been used for MAS. With LOD 3.21 there were quite a few DS with zero bias for all traits, indicating that the entire genotypic variance explained in the mapping experiment would be useful in selection. This has to be interpreted with caution, however, because it could be the result of ES with 0 QTL detected. With LOD 2.5, the minimum bias for grain moisture and plant height was 10% for most combinations of N and E. This means that, depending on the sample used for mapping, the maximum amount of genotypic variance explained by markers usable for selection would be only 10% less than the genotypic variance explained in the mapping experiment. These findings explain the conflicting results from published experiments on MAS. Those MAS experiments that were successful might have estimated QTL effects with little or no bias in the mapping experiment. Those that were not successful might have had biased QTL estimates or many false positive QTL due to sampling. Increasing N and E helps to reduce the average bias and its range and, thus, provides a more realistic assessment of the prospects of MAS. Moreover, it also increases pTS and consequently the efficiency of MAS as compared to phenotypic selection.
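The gap between variance explained in ES and in TS can be reproduced in a toy setting. The following is a minimal sketch and not the analysis used in this study: it assumes independent markers coded -1/+1, selects markers by a simple correlation threshold instead of a LOD score, and reports the proportion of phenotypic (not genotypic) variance explained. All function names, marker numbers, effect sizes, and the threshold are hypothetical.

```python
import random
import statistics


def corr(x, y):
    """Pearson correlation; 0.0 if either vector has no variation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def simulate_data_set(rng, n=200, n_markers=50, n_qtl=10, effect=0.25):
    """Independent markers coded -1/+1; the first n_qtl markers are QTL."""
    geno = [[rng.choice((-1, 1)) for _ in range(n_markers)] for _ in range(n)]
    pheno = [effect * sum(row[:n_qtl]) + rng.gauss(0.0, 1.0) for row in geno]
    return geno, pheno


def explained_in_es_and_ts(geno, pheno, es, ts, r_threshold=0.2):
    """Select markers in ES, then evaluate the same predictor in ES and TS."""
    y_es = [pheno[i] for i in es]
    y_ts = [pheno[i] for i in ts]
    sd_y = statistics.pstdev(y_es)
    weights = {}
    for j in range(len(geno[0])):
        x_es = [geno[i][j] for i in es]
        r = corr(x_es, y_es)
        if abs(r) > r_threshold:  # marker declared "significant" in ES
            weights[j] = r * sd_y / statistics.pstdev(x_es)  # regression slope

    def score(i):
        return sum(w * geno[i][j] for j, w in weights.items())

    r_es = corr([score(i) for i in es], y_es)
    r_ts = corr([score(i) for i in ts], y_ts)
    # A negative correlation in TS means nothing was validated.
    return r_es ** 2, (r_ts ** 2 if r_ts > 0 else 0.0)


def cross_validate(n_data_sets=10, k=5, seed=7):
    """Mean proportion of phenotypic variance explained in ES and in TS."""
    rng = random.Random(seed)
    es_values, ts_values = [], []
    for _ in range(n_data_sets):
        geno, pheno = simulate_data_set(rng)
        idx = list(range(len(pheno)))
        rng.shuffle(idx)
        for fold in range(k):
            ts = idx[fold::k]  # one fifth of the individuals as test set
            ts_set = set(ts)
            es = [i for i in idx if i not in ts_set]
            r2_es, r2_ts = explained_in_es_and_ts(geno, pheno, es, ts)
            es_values.append(r2_es)
            ts_values.append(r2_ts)
    return statistics.fmean(es_values), statistics.fmean(ts_values)


mean_es, mean_ts = cross_validate()
print(f"ES: {mean_es:.3f}  TS: {mean_ts:.3f}  "
      f"difference: {mean_es - mean_ts:.3f}")
```

Because markers are selected and their effects estimated in the same ES, the ES value is inflated by selection. The TS value is computed on individuals that played no part in selection and is therefore the more realistic guide to what selection could achieve.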
Choice of resampling method:
The necessity for correction of bias in estimates of QTL effects has been shown in this study and has been pointed out by several researchers performing simulation studies on QTL mapping and the relative efficiency of MAS as compared to phenotypic selection. [...] pTS and pSD:ES indicated that standard CV yielded almost unbiased estimates of the true genotypic variance explained by QTL (p) even for moderate sample sizes. Standard fivefold CV showed slightly lower power for QTL detection in ES as compared to DS because 20% fewer individuals were used for QTL estimation, but on average the loss of power in QTL estimation from DS to ES was reflected in only a slight underestimation of pSD:DS by pTS. In CV/GE the underestimation of the true genotypic variance explained by QTL was more pronounced, especially for small N and E. This can be attributed to QTL × environment interactions. The estimated QTL × environment interaction effects sum to zero and, therefore, are not stochastically independent. With few test environments, the correlation between effects cannot be ignored.
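The size of this dependence can be made explicit under a simplifying assumption that is ours and not part of the analysis above: suppose the interaction effects of one QTL in E environments arise from independent deviations $u_1, \dots, u_E$ with common variance $\sigma^2$, centered so that they sum to zero. Here E denotes the number of environments, as elsewhere in the text.

$$
e_i = u_i - \bar{u}, \qquad \sum_{i=1}^{E} e_i = 0
$$

$$
\mathrm{Var}(e_i) = \sigma^2\left(1 - \frac{1}{E}\right), \qquad
\mathrm{Cov}(e_i, e_j) = -\frac{\sigma^2}{E} \quad (i \neq j), \qquad
\mathrm{Corr}(e_i, e_j) = -\frac{1}{E - 1}
$$

Under this assumption the correlation is $-1$ for E = 2 and $-1/3$ for E = 4, but only $-1/18 \approx -0.06$ for E = 19, so the dependence becomes negligible only as the number of environments grows.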
A comparison of our results from CV with other resampling methods yielded similar findings.
Recommendations for QTL mapping experiments:
Our results from cross-validation of experimental data agreed well with the results from simulation experiments on MAS.
The proportion of genotypic variance that can be explained by markers is trait specific. On the basis of our results, some general recommendations can be given for QTL mapping studies to maximize the proportion of genotypic variance explained:
- With limited resources, adding more genotypes is more efficient than replicating the same genotypes.
- Depending on the trait of interest, a minimum number of environments is necessary (E > 4 with unreplicated trials) to allow for reliable estimation of h2.
- Optimization of the population size to be used in QTL mapping experiments has to be trait specific depending on the genetic architecture and plot heritability of the trait as well as the total resources available. Increasing population size from 488 to 976 was beneficial and had a relatively large effect on the amount of genotypic variance explained, although the impact of increasing the population size was more dramatic for smaller N.
- The choice of significance threshold depends on the population size and also on the trait of interest. If the aim of a study is to identify the few large QTL regulating a limited proportion of the genetic variance, a more conservative threshold is recommended because the frequency of QTL detection is correlated with the size of QTL effects and the reliability of finding the large QTL is improved.
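The dependence of heritability on the number of environments can be made concrete with the standard entry-mean formula, h2 = σ²g / (σ²g + σ²ge/E + σ²e/(E·R)), where R is the number of replications per environment. The sketch below is illustrative only: the function name `entry_mean_heritability` and the variance components are hypothetical and are not estimates from this study.

```python
def entry_mean_heritability(var_g, var_ge, var_e, n_env, n_rep=1):
    """Heritability of genotype means over n_env environments and n_rep reps.

    var_g  : genotypic variance
    var_ge : genotype x environment interaction variance
    var_e  : plot error variance
    """
    if n_env < 1 or n_rep < 1:
        raise ValueError("n_env and n_rep must be at least 1")
    return var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))


# Hypothetical variance components for a trait with low plot heritability,
# evaluated in unreplicated trials (n_rep = 1).
for n_env in (1, 2, 4, 8, 19):
    h2 = entry_mean_heritability(1.0, 1.0, 4.0, n_env)
    print(f"E = {n_env:2d}: h2 = {h2:.2f}")
```

With these assumed components, h2 rises steeply over the first few environments and the gain per additional environment shrinks thereafter.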
For traits regulated by a few QTL with large (>0.2 σP) effects, for which phenotypic selection is expensive or hampered by rare occurrence in the field, MAS can be used efficiently. The finesse of the breeder will lie in finding the optimum allocation of resources for detecting QTL and in obtaining a realistic assessment of the genotypic variance explained by them for combining MAS with phenotypic selection.
Manuscript received June 9, 2003; Accepted for publication October 31, 2003.
| LITERATURE CITED |
|---|
ALLISON, D. B., J. R. FERNANDEZ, H. MOONSEONG, Z. SHANKUAN, and C. ETZEL et al., 2002 Bias in estimates of quantitative-trait-locus effect in genome scans: demonstration of the phenomenon and a method-of-moments procedure for reducing bias. Am. J. Hum. Genet. 70:575-585.
BEAVIS, W. D., 1994 The power and deceit of QTL experiments: lessons from comparative QTL studies. 49th Annual Corn and Sorghum Industry Research Conference. ASTA, Washington, DC, pp. 250-266.
BEAVIS, W. D., 1998 QTL analyses: power, precision and accuracy, pp. 145-162 in Molecular Dissection of Complex Traits, edited by A. H. PATERSON. CRC Press, Boca Raton, FL.
BENNEWITZ, J., N. REINSCH, and E. KALM, 2002 Improved confidence intervals in quantitative trait loci mapping by permutation bootstrapping. Genetics 160:1673-1686.
BOST, B., D. DE VIENNE, F. HOSPITAL, L. MOREAU, and C. DILLMANN, 2001 Genetic and nongenetic bases for the L-shaped distribution of quantitative trait loci effects. Genetics 157:1773-1787.
BOUCHEZ, A., F. HOSPITAL, M. CAUSSE, A. GALLAIS, and A. CHARCOSSET, 2002 Marker-assisted introgression of favorable alleles at quantitative trait loci between maize elite lines. Genetics 162:1945-1959.
BREIMAN, L., and P. SPECTOR, 1992 Submodel selection and evaluation in regression. The X-random case. Int. Stat. Rev. 60:291-319.
DOERGE, R. W., and G. A. CHURCHILL, 1996 Permutation tests for multiple loci affecting a quantitative character. Genetics 142:285-294.
DRAPER, N. R., and H. SMITH, 1981 Applied Regression Analysis, Ed. 2. Wiley & Sons, New York.
EFRON, B., 1983 Estimating the error rate of a prediction rule: improvement on cross-validation. J. Am. Stat. Assoc. 78(382):316-330.
EFRON, B., and R. J. TIBSHIRANI, 1993 An Introduction to the Bootstrap. Chapman & Hall, New York.
FALCONER, D. S., and T. F. C. MACKAY, 1996 Introduction to Quantitative Genetics. Longman, London.
FISHER, R. A., 1918 The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edinb. 52:399-433.
FRIDMAN, E., T. PLEBAN, and D. ZAMIR, 2000 A recombination hotspot delimits a wild-species quantitative trait locus for tomato sugar content to 484 bp within an invertase gene. Proc. Natl. Acad. Sci. USA 97(9):4718-4723.
FRISCH, M., M. BOHN, and A. E. MELCHINGER, 2000 Plabsim: software for simulation of marker-assisted backcrossing. J. Hered. 91:86-87.
GÖRING, H. H. H., J. D. TERWILLIGER, and J. BLANGERO, 2001 Large upward bias in estimation of locus-specific effects from genomewide scans. Am. J. Hum. Genet. 69:1357-1369.
HALEY, C. S., and S. A. KNOTT, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324.
HJORTH, J. S. U., 1994 Computer Intensive Statistical Methods. Validation Model Selection and Bootstrap. Chapman & Hall, London.
HOLLOWAY, J. L., and S. J. KNAPP, 1993 G-MENDEL 3.0 Software for the Analysis of Genetic Markers and Maps. Oregon State University, Corvallis, OR.
HOLM, S., 1979 A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6:65-70.





