Linkage disequilibrium can be used for identifying associations between traits of interest and genetic markers. This study used mapped diversity array technology (DArT) markers to find associations with resistance to stem rust, leaf rust, yellow rust, and powdery mildew, plus grain yield in five historical wheat international multienvironment trials from the International Maize and Wheat Improvement Center (CIMMYT). Two linear mixed models were used to assess marker–trait associations incorporating information on population structure and covariance between relatives. An integrated map containing 813 DArT markers and 831 other markers was constructed. Several linkage disequilibrium clusters bearing multiple host plant resistance genes were found. Most of the associated markers were found in genomic regions where previous reports had found genes or quantitative trait loci (QTL) influencing the same traits, providing an independent validation of this approach. In addition, many new chromosome regions for disease resistance and grain yield were identified in the wheat genome. Phenotyping across up to 60 environments and years allowed modeling of genotype × environment interaction, thereby making possible the identification of markers contributing to both additive and additive × additive interaction effects of traits.
A useful new tool for crop genetic improvement is the identification of polymorphic markers associated with phenotypic variation for important traits by means of linkage disequilibrium (LD) between loci (Thornsberry et al. 2001; Flint-Garcia et al. 2003). A major advantage of this approach over conventional linkage mapping is that it does not require the time-consuming and expensive generation of specific genetic populations. LD is determined by the physical distance of the loci across chromosomes and has proven useful for dissecting complex traits because it offers fine-scale mapping due to the inclusion of historical recombination (Lynch and Walsh 1998). However, false positive correlation between markers and traits can arise in the absence of physical proximity due to population structure caused by admixture, mating system, and genetic drift or by artificial or natural selection during evolution, domestication, or plant improvement (Jannink and Walsh 2002). False associations can also be caused by alleles occurring at very low frequencies in the initial population (Breseghello and Sorrells 2006a,b). These factors create LD between loci that are not physically linked and cause a high rate of false positives when relating polymorphic markers to phenotypic trait variation. Thus, separating LD due to physical linkage from LD due to population structure is a critical prerequisite in association analyses.
Population structure can be quantified using Bayesian analysis, which has been effective for assigning individuals to subpopulations (Q matrix) using unlinked markers (Pritchard et al. 2000). Other multivariate statistical analyses such as classification (clustering) and ordination (scaling) can also be used to account for population structure (Kraakman et al. 2004). Population structure in modern breeding populations can be caused by complex pedigrees derived from crosses of parents with different levels of relatedness. Relationships between individuals can be detected by means of (1) marker-based estimation of the probability of identity by descent between individuals, (2) coefficient of parentage (COP) that measures the covariance between related individuals in a population (Parisseaux and Bernardo 2004), and (3) both types of analysis simultaneously (Arbelbide and Bernardo 2006; Malosetti et al. 2007). Yu et al. (2006) incorporated the outcome of population structure (Q matrix), with the estimation of relatedness between individuals obtained through the marker-based kinship matrix (K), into a unified linear mixed-model approach. Yu et al. (2006) showed that this approach effectively decreases type I error rates (false positives) and increases the power of the marker–trait association tests.
Successful application of association analysis requires comprehensive phenotypic data for modeling genotype × environment interaction (GE). However, an advantage of association analysis is that it can be based on historical phenotypic data from breeding trials, which in the case of the International Maize and Wheat Improvement Center (CIMMYT)'s wheat breeding programs is in plentiful supply. Linear mixed-model methodology applied to plant phenotypic data allows accurate prediction of genotypic performance using covariance structures that consider genetic associations between relatives included in the experiment. Using linear mixed-model methodology, a genetic covariance matrix can be estimated, and best linear unbiased predictors (BLUPs) can be obtained. BLUPs for simultaneously modeling GE and incorporating information on additive genetic variation have been previously suggested by Crossa et al. (2006) and Burgueño et al. (2007).
The CIMMYT wheat breeding program concentrates on producing stable high-yielding and widely adapted advanced breeding lines (Braun et al. 1996; Rajaram et al. 1996). CIMMYT bread wheat (Triticum aestivum L.) germplasm is known to contain various photoperiod insensitivity genes, dwarfing genes, and rust resistance genes (Trethowan et al. 2007). Grain yield and resistance to diseases, particularly to the rusts, are primary selection criteria, and the generation of closely linked molecular markers would speed up the breeding progress. CIMMYT's wheat breeding program annually distributes improved germplasm to a network of wheat research and breeding cooperators across the world who evaluate this material in many different environments and agronomic conditions. Phenotypic data from these trials have been cataloged, analyzed, and made available to the network. The data are used at CIMMYT to identify parents for future crosses and to drive the strategic incorporation of new genetic variability into advanced lines that are consequently able to cope with the dynamics of abiotic and biotic stresses.
In this study, we investigate the association of 242 diversity array technology (DArT) markers with resistance to stem (black) rust (caused by Puccinia graminis f.sp. tritici), leaf (brown) rust (caused by P. triticina), yellow (stripe) rust (caused by P. striiformis), powdery mildew (caused by Blumeria graminis f.sp. tritici), and grain yield in five historical CIMMYT elite spring wheat yield trials (ESWYT) conducted from 1979 to 2004 with data collected from up to 60 international environments. Two linear mixed models were used to assess marker–trait associations incorporating information on population structure and additive genetic covariance between relatives.
MATERIALS AND METHODS
Association populations and phenotypic data:
A total of 170 wheat lines derived from five CIMMYT elite spring wheat yield trials (ESWYT 1, ESWYT 6, ESWYT 10, ESWYT 20, and ESWYT 24) from 1979, 1984, 1988, 1999, and 2004, respectively, were included in this study. The data in each ESWYT were usually balanced, although for some traits data were recorded only from one replicate in some locations. There were no lines in common among the ESWYTs selected for this study, except for Genaro T81, which was included in ESWYTs 1 and 6. Depending on the trait and the specific ESWYT, the number of sites used for the various statistical analyses ranged from 6 to 60. Five traits were considered in this study: response to stem rust (SR) (percentage of infection), leaf rust (LR) (percentage of infection), yellow rust (YR) (percentage of infection), and powdery mildew (PM) (scored on a scale of 1–9) and grain yield (GY in tons per hectare). Association analyses were first performed for each of the five ESWYTs individually, and then the five ESWYTs were grouped into two sets related by year. ESWYTs 1, 6, and 10, which included 76 lines, formed ESWYT set 1, and ESWYTs 20 and 24, with 94 lines, formed ESWYT set 2. Unadjusted means of the five traits for each of the 170 wheat lines, their selection identifiers (SID), cross identifier (CID), line names, and ESWYT number are given in the additional information in supplemental Table S1 at http://www.genetics.org/supplemental/.
Depending on the trait and set, different sites, countries, and site–country combinations were used for the analyses. All five traits were recorded in all five ESWYTs in at least six sites, except SR, which was recorded only in sites of ESWYTs 1, 6, and 24.
DArT assay and map construction:
DArT markers were generated by Triticarte Pty. Ltd. (Canberra, Australia; http://www.triticarte.com.au), a whole-genome profiling service laboratory, as described by Akbari et al. (2006). The locus designations used by Triticarte Pty. Ltd. were adopted in this article, and DArT markers were named using the prefix “wPt,” followed by a unique numerical identifier. An integrated map of the DArT markers was developed using previous reported maps from diverse segregating populations, including: (1) Cranbrook × Halberd, 339 DArT (Akbari et al. 2006); (2) Arina × NK93604, 189 DArT (Semagn et al. 2006); (3) Avocet × Saar, 112 DArT (M. Lillemo, unpublished data); and (4) selected linkage groups of markers from nine different populations published by Triticarte Pty. Ltd. (http://www.triticarte.com.au). The integrated map was constructed with the software package CMTV (Sawkins et al. 2004); it included a total of 1644 markers, 813 of which were DArT markers and 831 of which were markers based on simple sequence repeats (SSRs), amplified fragment length polymorphisms (AFLPs), or restriction fragment length polymorphisms (RFLPs). Of a total of 813 DArT markers, 318 were polymorphic in the lines included in this study, and 242 of these were located on the final integrated map. A total of 76 polymorphic DArT markers could not be integrated into the final map (these are hereafter referred to as unmapped markers).
Possible population substructuring was investigated using the program STRUCTURE (Pritchard et al. 2000), on the basis of a set of 45 unlinked mapped DArT markers distributed evenly across the genome. Models with a putative number of subpopulations (K) from 1 to 20 without admixture and with noncorrelated allele frequencies were considered for each ESWYT. For the two ESWYT sets, K was set between 1 and 30. Lines within a subpopulation were assumed to be in Hardy–Weinberg equilibrium, as well as in linkage equilibrium (LE). In addition, the neighbor-joining clustering algorithm was run on the 170 lines (within ESWYTs and within combined ESWYT sets), using the Jaccard similarity coefficient as a proximity matrix with all 318 polymorphic DArT markers and the software DARWIN (Perrier et al. 2003). This step was undertaken to confirm the results of the population structure obtained from STRUCTURE.
Estimating LD between markers measures whether markers segregate independently or not. The program TASSEL (http://www.maizegenetics.net) was used to estimate the LD parameter r2 among loci, and the comparisonwise significance was computed with 1000 permutations. The r2 parameter was estimated for unlinked loci and for loci on the same chromosome. The latter was plotted for each ESWYT against genetic distance measured in centimorgans. A critical value for r2, as evidence of linkage, was derived using the 95% percentile of unlinked loci according to Breseghello and Sorrells (2006a). If, within a chromosome region, all pairs of adjacent loci were in LD, this region was referred to as an LD block (LDb).
The COP matrix:
The COP between individuals i and i′ is the probability that an allele from a randomly selected locus in individual i is identical by descent with an allele randomly selected from the same locus in individual i′ (Cockerham 1971). The genetic covariance between relatives due to their additive genetic effects is equal to two times the coefficient of parentage (Kempthorne 1969) (COP = fii′) times the additive genetic variance; i.e., , where A is the additive relationship matrix and is the additive genetic variance. In the case of self-pollinating species with successive generations of inbreeding, COP between sister lines decreases with increasing numbers of generations following the split from the common ancestor. The Browse application of the International Crop Information System (ICIS), as described in http://cropwiki.irri.org/icis/index.php/TDM_GMS_Browse (McLaren et al. 2005), was used for deriving the relationship matrix; this application accounts for selection as well as inbreeding and improves the accuracy of breeding value estimation. The different ESWYTs had different numbers of sister line groups (ranging from 2 to 5). Matrix fii′ was computed for the 76 lines included in ESWYT set 1 and for the 96 lines of ESWYT set 2.
Linear mixed models used for association analyses:
The linear mixed model (LMM) was used to control the broad level of population structure inferred from Pritchard's STRUCTURE using the Q matrix and the level of relatedness between wheat lines using the relationship A matrix. Two kinds of linear mixed models (LMM1 and LMM2) were used to fit data from g lines, s sites, and r replicates (at each site) for modeling association of phenotypic traits with m markers. LMM1 is similar to some of the models proposed by Smith et al. (2002) for modeling GE and by Crossa et al. (2006) and Burgueño et al. (2007) for modeling GE with matrix A. In LMM2, markers are included as covariates for modeling the fixed effects of marker × environment interaction in a manner similar to the partition proposed in the factorial regression model (van Eeuwijk et al. 1996).
Linear mixed model 1 with covariance between relatives for modeling line and line × environment interaction:
Linear mixed model 1 (LMM1), which uses the relationship of the lines measured by the g × g A matrix for modeling GE, is(1)where is the design matrix relating Y to the fixed effects of sites (b), and and are the design matrices relating Y to the random effects of replicates within sites (r) and lines within sites (g), respectively. Vectors r, g, and e are assumed to be normally distributed, with zero mean vectors and variance–covariance (VCV) matrices R, G, and E, respectively. The VCV matrix G, which combines the main effect of lines and GE, can be represented as , where ⊗ is the Kronecker product operator, and the jth diagonal element of the matrix is the additive genetic variance within the jth site, and the jjth element is the additive genetic covariance between sites j and j′; thus, is the correlation of additive genetic effects between sites j and j′. The VCV matrix G was modeled using the factor analytical model (Smith et al. 2002; Crossa et al. 2006; Burgueño et al. 2007).
For each DArT marker, BLUPs of the lines obtained were used for contrasts of the predictable function of no difference between BLUPs with the mth DArT marker = 1 and BLUPs of lines with the mth DArT marker = 0. This was done using the VCV matrix of the BLUPs of lines obtained from LMM1. The contrasts between BLUPs were calculated by weighting each line with its probability of membership of each subpopulation obtained from matrix Q. The contrasts were performed for each marker at each subpopulation, as well as for each marker over all subpopulations.
A variant of LMM1 was proposed by Burgueño et al. (2007) for partitioning the total genetic effect (g) of a line into additive (a) and additive × additive (i) effects, as in Oakey et al. (2006). The covariance between individuals due to additive × additive effects in matrix notation is (where # is the elementwise multiplication operator). This model (with the partition of g into a and i) was used only for GY in each individual ESWYT and in ESWYT set 2. Thus, results from these trials were presented in terms of GY, GYA, and GYAA, which refer to the total genetic effect, the additive effects, and the additive × additive effects, respectively. Highly unbalanced data for other traits in ESWYT set 1 prevented fitting LMM1 with g = a + i.
Linear mixed model 2 with covariance between relatives for modeling the polygenic effects of lines and including marker × environment interaction as fixed effects:
The best linear unbiased estimates (BLUEs) obtained from the individual-site analysis and their weights were used in the second-stage across-site analysis to calculate the BLUPs of the lines. The weights for the ith genotype in the jth location, wij, were computed as suggested by Cullis et al. (1996), , where rij is the number of replicates for the ith genotype in the jth environment, is the estimated error variance for the jth environment, and is the pool error variance. The second-stage across-site analysis uses linear mixed model 2 (LMM2) with random sites (s), random polygenic effects of lines (g), and fixed covariables from matrix Q of population structure and partitions the GE into marker × environment interactions as fixed covariable effects(2)where is the design matrix of the fixed effects of population structure (covariate) (q); and are the design matrices of the random effect of sites (s) and lines (g), and is the design matrix for the fixed effect of the marker × site interaction (me) for each of the m markers. The design matrix contains the allelic values of the markers, which for DArT markers are 0 and 1 (or −1 and 1). Vector g is assumed to be normally distributed with zero mean and VCV matrix ; vectors of random effects of sites (b) and residual error (e) are assumed to have a simple VCV.
Predictable functions of the contrast of the BLUPs of lines:
For the case of association analysis, for a given marker m, the null hypothesis of no difference between BLUPs of the lines with the mth DArT marker = 1 and BLUPs of the lines with the mth DArT marker = 0 was tested using a generalized t-statistic. The Wald z-statistics were also used for a simple contrast or Wald F-statistics for multiple contrasts. Asymptotically, the Wald statistics follow a chi-squared distribution.
Controlling the rate of false positives and the false discovery rate:
Critical P-values for assessing the significance of the null hypothesis were calculated using the Bonferroni genomewise error rate protection against rejecting a true null hypothesis (lower type I error rate). The testwise significance level was divided by the number of tests (markers). The control of the false discovery rate (FDR) for the test of each marker (Benjamini and Hochberg 1995) was used as an independent alternative method.
Variance component estimates are given in Table 1 for five traits in each individual ESWYT and in ESWYT sets 1 and 2. Site variability was important, but genetic as well as GE components were also of considerable size. The additive and additive × additive components and their interaction with sites for GY in ESWYTs 20 and 24, as well as in ESWYT set 2, were greater than those in ESWYTs 1, 6, and 10, probably due to the larger number of sites included in ESWYT set 2.
LD analyses were performed for each ESWYT individually, for ESWYT sets 1 and 2, and for both sets combined. The LD patterns were distinct for each individual ESWYT; however, for ESWYT set 1, ESWYT set 2, and the combined sets, the LD patterns were very similar. As LD breakdown depends on the number and relatedness of the lines under study, this result was not unexpected. The percentage of physically linked loci pairs located in significant LD blocks was 26.2% in ESWYT set 1 (r2 > 0.124, P < 0.01), 25.3% in ESWYT set 2 (r2 > 0.182, P < 0.01), and 27.6% in both sets combined (r2 > 0.115, P < 0.01). Pairwise r2 estimates were on average 0.169, 0.196, and 0.169 for ESWYT set 1, ESWYT set 2, and both sets combined, respectively. The percentages of unlinked pairs of loci that were in significant LD were 4.9% in ESWYT set 1, 5.0% in ESWYT set 2, and 5.0% in both sets combined. The estimated averages of r2 for unlinked loci were 0.03 in ESWYT set 1, 0.05 in ESWYT set 2, and 0.03 in both sets combined.
As depicted in Figure 1a for the LD of ESWYT set 1 and in Figure 1b for that of ESWYT set 2, LD decreases with increasing genetic map distance between marker loci. In all three cases, the r2 declines within ∼40 cM below critical values (indicating no linkage disequilibrium). This occurs at a larger distance than in previously reported studies of wheat (Breseghello and Sorrells 2006a), which focused on SSR markers across two chromosomes only. Since LD blocks in the seven chromosomal groups for each ESWYT set and in both sets combined were very similar, the LD blocks from both sets combined were adopted in the integrated map. A total of 43 LD blocks, with an average length of 9.93 cM, were detected across all chromosome groups. The longest LD blocks, with approximate lengths of 64.96 and 87.41 cM, respectively, were observed on chromosomes 1B and 4B.
The integrated maps with 1644 markers (813 DArT's and 831 SSRs, AFLPs, or RFLPs) are displayed in supplemental Figures S1–S7 for chromosome groups 1–7, respectively, at http://www.genetics.org/supplemental/. Of the 813 DArT markers, only 318 were polymorphic in the 170 lines. Of these, 242 were located on the final integrated map, with 79, 149, and 20 DArT markers in genomes A, B, and D, respectively. No marker could be located on chromosome 6D due to low numbers of polymorphic DArT's per se and to insufficient anchor markers among the segregating populations utilized for map development. Polymorphic markers on the remaining chromosomes ranged from 2 DArT's on chromosome 5D to 20 DArT's on chromosome 7B. The power of testing marker–trait associations is reduced using markers with low frequencies; however, this was not a problem in this study, as the average frequency of DArT polymorphisms was 0.57. A frequency <5% for the absent score (“zero” allele) was observed for only one marker (wpt9124). For graphical simplicity we present the reduced maps in Figure 2, a–g, where a subset of the 831 SSR, AFLP, and RFLP markers and all 242 DArT markers corresponding to each chromosome group 1–7, respectively, are included.
Population structure was determined for each ESWYT individually. Model-based clustering analyses using the program STRUCTURE and the neighbor-joining clustering algorithm showed a pattern of substructure for ESWYT 20 only, which had seven subpopulations on the basis of the log-likelihood profile. Subsets of sister lines included in ESWYT 20 were assigned to the same subpopulation, except for the first subpopulation where three sister lines from different subsets were clustered. The STRUCTURE and neighbor-joining clustering algorithm analyses for the rest of the ESWYTs taken individually showed a pattern typical of unstructured populations. Taken as a set, the 76 lines contained in ESWYT set 1 were grouped into 16 subpopulations, while the 94 lines in ESWYT set 2 were grouped into 15 subpopulations (data not shown).
Association analyses of phenotypic traits:
Significant markers for each individual ESWYT and trait analyses are presented for LMM1 (supplemental Table S2 at http://www.genetics.org/supplemental/) and LMM2 (supplemental Table S3 at http://www.genetics.org/supplemental/) (the effect of each marker is given only for LMM1). The success in fitting the combined data for each ESWYT set depended on the trait, the number of common sites, countries, and site–country combinations and the LMM considered. For SR, results for ESWYT set 1 included ESWYTs 1 and 6, whereas results from set 2 included only ESWYT 24. For ESWYT set 1, LMM1 was fitted to GY but the model with the partition of g into a and i could not be fitted. For ESWYT set 1, PM was not included because no common sites or site–country combinations were found.
Several DArT markers on different chromosomes were significant for various traits simultaneously, and many unmapped DArT markers were significant as well. Only the results of association analysis for ESWYT sets 1 and 2 and for both LMMs are shown. To facilitate description and interpretation of the results, reference to numbered LDb's containing markers in LD is frequently made. The LDb's are highlighted in Figure 2, a–g.
Significant DArT markers for traits:
The mapped and unmapped DArT markers significantly associated with LR, SR, PM, YR, and GY (including GYA and GYAA) for ESWYT sets 1 and 2 and for LMM1 and LMM2 are summarized in Tables 2–6⇓⇓⇓⇓. In addition, we present the most recently published information on the chromosomal locations of genes and quantitative trait loci (QTL) for LR, SR, PM, and YR resistance and for GY, including genes for photoperiod insensitivity (Ppd), vernalization (Vrn), earliness per se (Eps), and dwarfism (Rht). Tables 2–6⇓⇓⇓⇓ summarize the information depicted in Figure 2, a–g. Significant DArT markers for YR and GY are widely spread across all chromosomes, with a total of 122 and 213 significantly associated DArT markers, respectively, vs. 87, 63, and 61 significant DArT markers for LR, SR, and PM, respectively. Chromosome groups 4 and 5 had the lowest number of significant DArT markers across all traits (47 and 35, respectively), whereas chromosome groups 1, 2, 3, 6, and 7 had a total of 132, 64, 99, 85, and 83 significant DArT markers, respectively. A comparison among the A, B, and D genomes is of less interest due to the drastically different numbers of DArT markers that were polymorphic within each genome.
Association analysis for response to LR:
More than 50 LR resistance genes have been cataloged in bread wheat; some of them have been reported or are known to be present in CIMMYT germplasm, including Lr1, -3, -3bg, -10, -13, -14a, -16, -17a, -19, -21, -23, -24, -26, -34, -42, and -46 and the complementary genes Lr27 and Lr31 (Singh and Mcintosh 1984; Singh and Gupta 1991; Singh and Rajaram 1991; Sayre et al. 1998; Singh et al. 2005) (Table 2). Other known LR genes not known to be present in any of the lines included in this study are listed in Table 2. An objective of the CIMMYT wheat breeding program is to incorporate slow rusting resistance to LR and YR into breeding lines to assure long-term control of these diseases (Rajaram et al. 1988; Singh et al. 2005). Slow rusting resistance is conferred by Lr34, Lr46, and other genes with minor additive effects (Singh et al. 2005). Several studies have identified QTL associated with slow rusting resistance on other chromosomes (such as 7BL) (Table 2).
In this study, many of the DArT markers significantly associated with LR were in the same chromosome regions as previously reported LR resistance genes or previously reported QTL (Table 2). There were also regions accounting for LR resistance variation in which there were no previously known genes or QTL, such as 1AL, 1DL (Figure 2a), 5AS, 5DS (Figure 2e), 6AS, and 6AL (Figure 2f) (Table 2). Chromosome 1B had the largest number of significant markers for LR; some of them may be associated with the reported QTL and genes Lr26 and Lr46. The 1B.1R translocation includes Lr26, YR resistance gene Yr9, SR resistance gene Sr31, and PM resistance gene Pm8; it has also shown an effect on GY (and its components) in several spring wheat cultivars (i.e., Seri M82, Genaro T81, Ures T81, and Veery-related lines) included in this study (Villareal et al. 1991; Singh et al. 1998). Gene Lr46 [pleiotropic or closely linked to Yr29 (William et al. 2003) and associated with PM (Lillemo et al. 2007)] is present in cultivar Pavon 76 (included in this study) and associated with SSR markers gwm259, wmc44, wmc367, and barc80 on 1BL (William et al. 2006). SSR gwm259 is not far from DArT markers wPt0944, wPt2526, and wPt4129 (Figure 2a), which are significant for LR.
Other significant associations include the one between DArT wPt0100 and LR, which appears to relate to Lr16 (Table 2) within the LDb2 of 2BS. This Lr gene has been reported to be linked to wmc661 (McCartney et al. 2005a) and is 1.1 cM from wPt0100 (Figure 2b). Four DArT markers were significantly associated with LR in 7BL (Figure 2g, Table 2) and could be detecting Lr14a present in cultivar Esmeralda M86. In 7DS, the DArT wPt3328 is most likely detecting the slow rusting resistance gene Lr34 that is also linked or pleiotropic to Yr18. Gene Lr34 was present in 60% of CIMMYT bread wheat germplasm in the 1990s (Rajaram et al. 1997) and in several lines of this study (supplemental Table S1 at http://www.genetics.org/supplemental/). Gene Lr19 (linked to Sr25), which is included in a Lophopyrum (Thinopyrum) translocation segment in 7DL was probably not detected due to lack of mapped DArT markers in that arm.
Association analysis for response to SR:
The use of lines with the “Sr2 complex” (Sr2 gene together with other unknown minor genes) in the 1950s provided the foundation for durable resistance to SR (Ortiz et al. 2007a). Of almost 50 known SR resistance genes, Sr2 is the only one conferring slow rusting resistance and is present in Pavon F76 and Veery-related lines (supplemental Table S1 at http://www.genetics.org/supplemental/). Few reports are available on the detection of QTL associated with SR resistance (Table 3).
In this study, several DArT markers were found to be significantly associated with SR resistance (Table 3, Figure 2, a–g). Many map to locations reported from previous linkage mapping studies. In several regions (4BS, 5BL, and 6BS), DArT markers seem to be detecting undesignated SR resistance genes or QTL (Table 3). Several DArT markers in 1BS are most likely associated with the presence of Sr31 of the 1B.1R translocation, which is present in the Veery-related lines included in this study. In 2BS, wPt0100 was found to be associated with SR, GY, LR, and YR in ESWYT set 2, which agrees with the location of Sr23 and is tightly linked to Lr16. Several significant associations for SR were found on chromosome 3BS, where Sr12 and Sr2 are located. Sr2, linked to Lr27, is associated with SSR markers gwm389 and gwm533 (Spielmeyer et al. 2003) and distally located on the short arm where no DArT markers were mapped. It is, however, possible that the DArT markers wPt0365 and wPt0995 are associated with this gene (Table 3, Figure 2c). On chromosome 4AL, three DArT markers in LDb1 and one in LDb2 were associated with SR and could be associated with the presence of Sr7 (Figure 2d).
At the end of 6BL, three DArT markers were associated with SR in ESWYT set 1; they are most likely linked to Sr11, a gene that is in turn linked to Lr3 (McIntosh et al. 1995) that is present in a large number of CIMMYT cultivars (Roelfs and McVey 1979), but to which virulence has evolved in most countries (this is probably why it was not found in ESWYT set 2) (McIntosh et al. 1995). Marker wPt0600 in 7BL could be associated with Sr17, which is found in a wide range of cultivars, including CIMMYT wheat lines, and which is linked to Lr14a and Pm5. Marker wPt3328 in 7DS, corresponding to the position of Lr34/Yr18 (Kerber and Aung 1999), was not significant for SR in this study, despite the fact that Lr34 has previously been associated with enhanced SR resistance.
Association analysis for response to yellow rust (YR):
According to Suenaga et al. (2003), there is significant diversity for genes that have minor to intermediate additive effects on YR resistance. Adult plant resistance to YR and LR in CIMMYT wheat is based on several genes with minor effects, including Yr18 (pleiotropic or closely linked to Lr34), Yr29 (pleiotropic or closely linked to Lr46), and Yr30, which have been deployed to provide a more durable solution against YR (Singh et al. 2005). In addition, of the known seedling YR resistance genes, those reported in CIMMYT wheat lines are, for example, Yr9 (in the 1B.1R translocation), Yr3, Yr1, Yr2, Yr27, Yr7, and Yr6 (McIntosh et al. 1995). In this study, association with YR was found in all chromosomes where DArT markers have been mapped, except at 5D (Table 4). Several significant DArT markers were found in regions where no YR resistance genes or QTL had been reported previously: i.e., 1AS, 1AL, 1DS, 1DL, 2AL, 3AS, 3AL, 4AL, 4BS, 5BS, 6AS, 6AL, 6BL, 7AS, and 7AL (Table 4).
Some significant DArT markers proved to be associated with known YR resistance genes, such as Yr24 at 1B, which is allelic to Yr26 and closely linked to barc187 (Li et al. 2006) (Figure 2a). Boukhatem et al. (2002) identified two major QTL for durable YR resistance on chromosome 2B, of which one was present in Opata 85 and mapped near the SSR marker gwm410. This marker is also present in LDb3 with wPt7757, which is significantly associated with YR (Figure 2b, Table 4). On 2DL, a QTL associated with a reduction in YR infection type was reported to be linked to marker gwm349 (Mallard et al. 2005), which in our study is ∼3 cM from wPt4413 (Figure 2b, Table 4). In addition, Suenaga et al. (2003) detected a QTL associated with YR resistance close to marker gwm349 on 2DL (Figure 2b) within LDb6 near DArT marker wPt4413. Yr30, at 3BS, has been reported to be linked to Sr2 and Lr27, and to SSR marker gwm533 (Spielmeyer et al. 2003), and may be marked by wPt0995 and other DArT markers at LDb3 (Figure 2c) (it is present in Opata 85 and Pavon F76, included in this study). Singh et al. (2000) and Boukhatem et al. (2002) identified a region on chromosome 3DS associated with YR in Opata 85, which could have been identified by wPt1336 and wP19401.
A QTL for adult plant YR resistance in 4BL was linked to SSRs wmc48c and gwm513 (Suenaga et al. 2003). In our study, marker wPt6149 was significantly associated with YR and located between these two markers (Figure 2d). A QTL for adult plant YR resistance in 4D was associated with marker psp3103 (Suenaga et al. 2003), which is 8.4 cM from wPt4572 that is significant for YR (Figure 2d). The same authors identified a QTL for YR resistance in 5BL between SSRs gwm335 and gwm777, where several DArT markers were significantly associated with YR (Table 4, Figure 2e) (see supplemental Figure S15 at http://www.genetics.org/supplemental/ for the location of gwm777 on the map including all markers). The only reported gene on chromosome 6A is Yr38, which was recently transferred into wheat from Aegilops sharonensis (Marais et al. 2006). In contrast, this study found several linked and unlinked DArT markers significantly associated with YR on chromosome 6A in both ESWYT sets. These results indicate the presence of unknown YR resistance genes or alleles on this chromosome. On 7B, Suenaga et al. (2003) identified a QTL linked to SSRs gwm333 and gwm46, which in our study are located at LBb4 near wPt1149, wPt3833, and wPt6372, all of which were significantly associated, among others, to YR (Table 4, Figure 2g). At 7D, the adult plant resistance gene Yr18 is most likely being detected by DArT wPt3328 or wPt1100 (Figure 2g).
Association analysis for response to PM:
CIMMYT bread wheat germplasm is generally susceptible to PM due to the lack of natural epidemics of B. graminis f.sp. tritici in Mexico. Hence, there has not been direct selection for resistance against this pathogen in the breeding program. There is, however, considerable genetic variation for PM resistance in CIMMYT germplasm, as indicated by the many significant DArT markers associated with the trait in this study. Most of the significant chromosomal areas for PM resistance were associated with known Pm genes or QTL (Table 5). For example, the significant markers on chromosome arms 1BS, 1DL, 2BS, 3DS, 6AS, and 7BL correspond to the locations of Pm8, Pm24, Pm26, Pm13, Pm31, and Pm5, respectively, and the groups of significant markers for PM resistance on 1AL, 3BS, 4AL, 6BS, and 7DS correspond to reported QTL (Table 5). Recently, Nematollahi et al. (2007) detected PM resistance allele Pm5d at 7BL near SSRs gwm577 and wmc 581. These two SSRs are located in the same LDb6 of 7BL on our map (Figure 2g), where wPt6869, significant for PM, was found (and near wPt4300, significant for PM). There were also many significant DArT markers located in areas where no Pm gene or QTL had been reported previously, e.g., wPt2872 on 1AS, wPt4427-wPt7092 on 1DL, wPt4412 on 3BL, wPt9256 on 6BL, and wPt1085-wPt6869 on 7BL.
It is interesting to note that all the chromosomal areas that showed a significant effect on PM in this study also harbored genetic factors for resistance to at least one of the three rust diseases. This indicates that the PM resistance at these loci might have been indirectly selected for by selection for LR or YR. For example, the race-specific resistance gene Pm8 is co-inherited with Sr31, Lr29, and Yr9 in the 1B.1R translocation, and the Pm5 locus on 7BL is linked to Sr17 and Lr14. Such co-inheritance of resistance to PM and the rust diseases is also evident for race nonspecific adult plant resistance. For example, the Lr34/Yr18 locus on 7DS has been shown to confer partial resistance to PM (Spielmeyer et al. 2005; Lillemo et al. 2007), as does the Lr46/Yr29 locus on 1BL (Lillemo et al. 2007). These genes have been designated Pm38 and Pm39, respectively (R. A. McIntosh, personal communication). The DArT marker wPt3328 appears to be associated with Pm38, whereas this analysis failed to detect any effect of Pm39 on 1BL.
Association analysis for GY:
The genetics of yield and yield components in bread wheat are complex in nature and controlled by a large number of major and minor QTL (Kumar et al. 2007). Dispersion of genes affecting GY across the wheat genome appears to be a characteristic feature (Laperche et al. 2007) that was also observed in this study (Table 6). In this study, 213 DArT markers (or 88% of the mapped markers) were significantly associated with GY. About 58% of the significant DArT markers for GY mapped to the B genome, 31% to the A genome, and 11% to the D genome. A large proportion of significant DArT markers for GY were located on chromosomes 1B, 3B, and 6B, and no DArT markers on chromosome 4D were significantly associated with GY. Many of the DArT markers found to be significantly associated with GY map to regions previously reported for this trait (see discussion below). This study suggests that chromosomes 1AS, 1BS, 1BL, 1DS, 2AL, 3AL, 3BS, 5AS, 5DS, 6AS, 6AL, 6BS, 6BL, and 7DS (Table 6, Figure 2, a–g) contain regions affecting GY that have not been reported previously. Some of these regions may provide increases in GY due not only to additive effects but also to additive × additive interaction effects.
The results of this study, highlighting chromosomal regions affecting GY, are supported by results of previous studies, including the following:
One major allele for increased GY was identified on chromosome 2B near gwm410. This yield effect may be linked to a QTL for stay green found on 2B, which has been shown to improve yield under both drought stress and optimal conditions (Verma et al. 2004). In the present study, in ESWYT set 2, GY was significantly associated with markers wpt5672, wPt4125, and wPt7757 in LDb3 (where gwm410 is located) (Figure 2b).
Kumar et al. (2007) found consistent QTL on 2DS linked to SSR gwm261 for five yield traits. In our study, marker gwm261 was found in LDb6 of 2D at ∼2.20 cM from wPt4144, which was associated with yield in ESWYT set 1 (Figure 2b).
Dilbirligi et al. (2006) found that marker barc12 on chromosome arm 3AS was associated with GY. In our study, barc12 was positioned at 3AL but located near wPt4725 and wPt9262, which were significant for GY, YR, and LR. Also in the present study, five DArT markers in LDb1 on chromosome arm 3AS (Figure 2c) were associated with GY.
On chromosome 7, Kumar et al. (2007) identified two regions associated with GY at 7AS (interval gwm130–gwm1171) and at 7AL (interval gwm332–gwm698). The region at 7AS and marker gwm130 coincides with the location of wPt6034 and wPt8789 (LDb1), which were significant for GY, and the region in 7AL within LDb2 contained gwm698 and also wPt6495, which was significant for GY in ESWYT set 1 (Figure 2g).
A QTL for GY was mapped to 4AS near marker wmc48 (McCartney et al. 2005b). Our complete map (supplemental Figure S4 at http://www.genetics.org/supplemental/) shows wmc48 to be located 13.4 cM from wPt2788 (Figure 2d) and significantly associated with GY.
Clusters of QTL for GY and yield components mapped to 1D, 2A, 6B, and 7D in Li et al. (2007). DArT marker wPt3738 on 1D in our map is significantly associated with GY; however, this region in our map is closer to SSR gwm642 than was reported by Li et al. (2007). Li's cluster of QTL at 2A may coincide with wPt7901, wPt6687, and wPt9793 at 2AS (Figure 2c), and Li's QTL at 6B is near gwm132, which in Figure 2c is ∼20 cM from several significant DArT markers for GY. Finally, the QTL at 7D seems to be distally associated with wPt1269 at 7DS in our map; nevertheless, this may be due to the relatively small number of DArT markers mapping to this chromosome.
Lines carrying the 1B.1R translocation show increased GY (Carver and Rayburn 1994), which may explain the significant associations found between GY and DArT markers mapping to the long arm of chromosome 1B (Figure 2a). The 7DL.7Ag translocation segment, associated with GY increase under irrigated conditions and GY decrease in unfavorable environments (Braun et al. 1996; Singh et al. 1998), was not detected in this study. CIMMYT wheat lines are selected for consistently high performance under many diverse growing conditions and so may not have positive selection either for or against this translocation.
The short-stature wheats containing dwarfing Rht genes have been associated with high GY (Trethowan et al. 2007). Our study shows significant associations with GY for known chromosomal regions bearing the dwarfing alleles Rht-B1 and Rht8 genes (Ortiz et al. 2007b). (Table 6). While all CIMMYT semidwarf bread wheats are expected to carry Rht-B1b or Rht-D1b, it was shown that a height-promoting allele at the Rht8 locus is also frequently present in CIMMYT wheats, and it is speculated that this allele might confer an adaptive advantage by partially counteracting the dramatic effects of Rht-B1b and Rht-D1b (Worland et al. 2001).
The importance for GY of earliness per se (Eps), photoperiod insensitivity (Ppd), and vernalization (Vrn) genes was indicated by our study. The anthesis rate in wheat also depends on the variation in Eps, Ppd, and Vrn genes (Van Beem et al. 2005). Vrn genes are common in CIMMYT bread wheat germplasm, whereas Eps genes occur only at low frequencies. Chromosomal regions bearing these genes were significantly associated with QTL for GY, as revealed by DArT markers in both ESWYT sets included in this study (Table 6).
Association between all traits:
Markers for many of the traits studied here were found to be colocalized in the same genomic regions, as well as often significantly associated. Multiple regression of GY on the disease traits shows that only variability for LR and PM significantly contributes to GY variability, with a coefficient of determination of 22%. Significant (P ≤ 0.05) rank correlations (rS) were found for the following pairs of traits across the ESWYTs: GY and YR (rS = −0.23), SR and YR (rS = 0.45), LR and YR (rS = 0.40), LR and PM (rS = 0.37), and YR and PM (rS = 0.34). These results indicate that the YR-resistant wheat lines included in this study were often resistant to SR, LR, and PM as well. In addition, the lines that showed the highest susceptibility to YR also displayed lower GY due to a significant negative correlation. In ESWYTs 1 and 6, significant associations were found between SR and YR (rS = 0.60 and 0.55, respectively). In ESWYT 6, associations were also found between SR and PM (rS = 0.40) and YR and PM (rS = 0.46). In ESWYT 10, a significant rank correlation was found between SR and YR (rS = 0.46). In ESWYT 20, significant associations were found between GY and YR (rS = −0.40), GY and PM (rS = −0.43), SR and LR (rS = 0.34), and LR and YR (rS = −0.47). In ESWYT 24, significant associations were found between LR and PM (r = 0.35) and LR and YR (rS = 0.35) (data not shown).
Several DArT markers were associated with more than one trait investigated in our study. Several resistance genes (R genes) affecting different diseases have been reported to be clustered in specific regions of the genome, such as Lr20-Sr15-Pm1 (Neu and Keller 2002). These clusters of R genes are important breeding targets, because the transfer of such a region results in resistance to several pathogens simultaneously (Neu and Keller 2002). Pleiotropy or linkage between adult plant resistance genes for different pathogens has also been reported. For example, Lr34 is completely linked or pleiotropic to Yr18, which is also linked to a gene conditioning tolerance to barley yellow dwarf virus (Bydv1) (Singh 1993) and to PM resistance (Spielmeyer et al. 2005; Lillemo et al. 2007). In this study, associations between disease resistance and GY were probably caused by the reduction in yield loss due to susceptibility, since only variability for LR and PM significantly contributed to GY variability, explaining 22% of GY variation. This result indicates that ESWYT materials maintained a relatively high level of overall disease resistance, with abiotic factors presumably accounting for most of the variation.
Implicit marker effects in statistical models:
The fixed effects of markers are implicitly considered when comparing the BLUPs of the lines obtained from LMM1 by using the variance–covariance matrix of the BLUPs. Modeling the marker effects in this manner has two advantages over the model that explicitly includes markers as fixed effects: (1) information contained in relationship matrix A is fully exploited, as its effect is reflected in the BLUPs of the lines (and in their variance–covariance matrix) and, thus, in the contrast for each marker in each subpopulation and across subpopulations; and (2) LMM1 can be fitted to avoid nonidentification and overparameterization. This last issue is more of a problem when some lines lack data for some markers. Another point worth mentioning is that since markers represent attributes of lines, an appropriate model would include lines nested within markers (Malosetti et al. 2007) and within subpopulations. When markers are explicitly included in LMM1 and LMM2 as fixed (or random) effects, it is not clear how the effect of relationship matrix A (or kinship matrix K, in case markers are being used) would be reflected in the contrasts of the BLUPs of the lines in each subpopulation. Furthermore, aliasing problems due to ambiguities in the model are often encountered when including (1) markers as fixed (or random) effects, (2) the use of relationship matrix A for modeling GE, and (3) information from the Q matrix (as fixed or random covariables). Marker effects and Q matrix values are related; therefore, it would be advisable to calculate submatrices A for each marker and each subpopulation in matrix Q.
A statistical model that explicitly includes subpopulations, markers, and lines must consider (1) the main effect of markers and the interaction of marker × environment, (2) the main effect of subpopulations and its interaction with environments, and (3) the main effects of lines nested within subpopulations and markers and their interaction with environments. Interaction effects can be estimated separately or combined with principal effects (i.e., markers could be estimated separately or combined with marker × environment interaction). Nevertheless, this implies that for each marker type only the COP (or kinship K) of the lines included at each level of the marker must be considered, and lines with no markers must not be included in the model. Without clear separation and definition of these terms, ambiguity in the linear mixed model will lead to aliasing in the estimation process. Model LMM1 draws on an independent source of information for measuring the covariance among relatives (from matrix A) and uses information on population structure (from matrix Q) only as weights in the contrast of the BLUPs of the lines for each marker. When information from Q is included in the model as fixed covariates, then marker effects are estimated (or predicted) at a certain value of the covariates; however, it is not clear what that value is.
The models used in this study include the relationship matrix and the population structure of each ESWYT and of the two ESWYT sets. Yu et al. (2006) found that incorporating the outcome of population structure contained in the Q matrix increases the power to detect true marker–trait association. In addition, Piepho (2005) shows that genetic correlations (for QTL and QTL × environment interaction) in multienvironment trials lead to a substantial increase of the type I error rate when testing for QTL effects. On the basis of these findings, we can conclude that, by using LMM1 and LMM2, the risk of finding false marker–trait associations was reduced. The application of appropriate statistical models for assessing GE is useful in plant breeding (Sorrells 2007) and should be adopted in association analysis studies. Statistical models other than LMM1 and LMM2 should be further studied.
Results from this study show DArT markers significantly associated with the measured traits in chromosomal regions where genes or QTL have been previously reported, as well as significantly associated DArT markers in regions where neither genes nor QTL have previously been reported for these traits. The role of these regions will need to be further investigated. The consensus map used in this study has proven to be sufficiently precise to compare the locations of DArT markers from this study with those of other markers (e.g., SSRs) previously reported. Several of the known cataloged genes, such as Lr47, were recently transferred from alien or related species and thus are not expected to be present in the material included in this study. However, it cannot be excluded that noncataloged corresponding homologs of these alien loci could be present in the wheat lines investigated in this study, and that they could have some effect in reducing disease severity. For example, the markers on 7AS could be tagging a homolog of the Lr47 locus rather than a completely new LR resistance gene. It is also likely that many of these gene-rich regions will carry previously unidentified minor slow rusting or adult plant resistance genes.
The historical phenotypic data used in this study were collected from many different environments worldwide, which led to GE effects being observed for all traits. Partitioning the total genetic effect into additive and additive × additive effects used in association analysis could be an added advantage for identifying significant markers associated with phenotypic traits with these effects, which will facilitate further genetic enhancement of the crop.
Concerning the disease traits, the variation of the pathogen races occurring at different locations is likely to reduce the identification of race-specific resistance. Most of the known cataloged genes are race specific and effective only in some geographic areas. In this study, these pathogens may not have been present at high frequencies in the years of evaluation at many locations. For example, Lr3a, which occurs in several CIMMYT wheat lines, could not be identified in this study (at least in ESWYT set 2) because virulence to this gene is common worldwide. Virulence to Lr1 is also common in most wheat-growing areas, and the gene would have been difficult to detect even if more markers had mapped to the chromosome containing Lr1. Loss of effectiveness due to the presence of virulent races is probably the reason why we could not detect the chromosomal regions of some genes known to be present in CIMMYT wheat materials. Only a few genes, such as Lr34/Yr18, Lr46/Yr29, and Yr30/Sr2, are nonrace specific in nature and should have small-to-intermediate effects across different environments. Analyses did, in fact, identify chromosomal regions carrying the above genes. Finally, differences and similarities between ESWYT sets 1 and 2 track genetic changes in the CIMMYT wheat breeding program over the past 24 years, and further analyses will shed greater light on the genetic determinants of GY and disease resistance in an evolving context.
The authors thank the numerous cooperators in national agricultural research institutes who carried out the elite spring wheat yield trials and provided the phenotypic data analyzed in this article plus the International Maize and Wheat Improvement Center's international nursery and seed distribution units who prepared and distributed the seed and computerized the data. We thank Alma McNab for the English editing.
Communicating editor: J. B. Walsh
- Received July 18, 2007.
- Accepted September 9, 2007.
- Copyright © 2007 by the Genetics Society of America