Abstract
Height is one of the most heritable and easily measured traits in maize (Zea mays L.). Given a pedigree or estimates of the genomic identity-by-state among related plants, height is also accurately predictable. But, mapping alleles explaining natural variation in maize height remains a formidable challenge. To address this challenge, we measured the plant height, ear height, flowering time, and node counts of plants grown in >64,500 plots across 13 environments. These plots contained >7300 inbreds representing most publically available maize inbreds in the United States and families of the maize Nested Association Mapping (NAM) panel. Joint-linkage mapping of quantitative trait loci (QTL), fine mapping in near isogenic lines (NILs), genome-wide association studies (GWAS), and genomic best linear unbiased prediction (GBLUP) were performed. The heritability of maize height was estimated to be >90%. Mapping NAM family-nested QTL revealed the largest explained 2.1 ± 0.9% of height variation. The effects of two tropical alleles at this QTL were independently validated by fine mapping in NIL families. Several significant associations found by GWAS colocalized with established height loci, including brassinosteroid-deficient dwarf1, dwarf plant1, and semi-dwarf2. GBLUP explained >80% of height variation in the panels and outperformed bootstrap aggregation of family-nested QTL models in evaluations of prediction accuracy. These results revealed maize height was under strong genetic control and had a highly polygenic genetic architecture. They also showed that multiple models of genetic architecture differing in polygenicity and effect sizes can plausibly explain a population’s variation in maize height, but they may vary in predictive efficacy.
HEIGHT adaptations are essential to plant fitness and agricultural performance. They are intrinsic to the evolutionary history, standing diversity, and genetic architecture of a population, and impact the velocity of its evolution and response to breeders’ selection pressures. The height of plants evolving in competitive environments is in part a product of selection imposed on fitness by the effects of light interception, carbon and nutrient capture, weed competition, and seed dispersal (Lin et al. 1995). In domesticated crops, breeding efforts facilitating agricultural industrialization indirectly select height adaptations, maximizing yield under monoculture. Height adaptations buffering yield against drought and other environmental factors are also desirably selected. This is illustrated in the yield gains and height reductions of wheat (Triticum spp.) and rice (Oryza sativa) in the Green Revolution (Khush 2001). During selection for industrial agriculture, height adaptations increase harvest uniformity, favorably partition carbon and nutrients between grain and nongrain biomass, and enhance fertilizer, pesticide, and water use efficiency (Khush 2001). In grasses like maize, wheat, and rice, apical growth is terminated at reproductive maturity (Lin et al. 1995). This may establish genetic correlations among height and flowering and constrain evolvability.
Due to its high heritability and the ease of its measurement, plant height has been studied since Mendel’s foundational hybridization experiments (Mendel 1866). Plant height loci have been cloned and resolved by molecular tagging of large-effect alleles often induced by mutagenesis (Salas Fernandez et al. 2009; Andorf et al. 2010). Over 40 maize genes at which mutations have large effects on plant height have been identified (Multani et al. 2003). These are involved in hormone synthesis, transport, and signaling (Wang and Li 2008). Well-characterized maize height genes include: brachytic2, influencing polar auxin transport (Multani et al. 2003); dwarf3, mediating gibberellin synthesis (Winkler and Helentjaris 1995); dwarf8 and dwarf9, regulating DELLA proteins of gibberellin signal transduction pathways (Lawit et al. 2010); and nana plant1, impacting brassinosteroid synthesis (Hartwig et al. 2011).
The large effects of these loci suggest they may not commonly segregate in natural populations due to rapid loss or fixation. Loss of large-effect alleles is especially likely as a population approaches optimal fitness or agricultural desirability, as predicted by the Fisher–Orr model (Orr 2005; Brown et al. 2011). Nevertheless, height mutations may influence other traits, causing antagonistic pleiotropy and resulting in the maintenance of such alleles. Polymorphism at these loci may also be maintained in some genomic regions if selection is limited by linkage disequilibrium as explained by Hill–Robertson interference (Hill and Robertson 1966) and the Fisher–Muller model (Fisher 1930; Muller 1932). In maize, breeders maintain inbred heterotic groups with established general combining abilities but select individuals based upon specific hybrid performance. This practice may allow segregation of large recessive effects. Admixture of exotic germplasm and locally adapted subpopulations may also allow persistent segregation of large-effect alleles due to evolutionary capacitance (Masel 2005), wherein environments and epistatic alleles buffer large effects in adapted populations but cease to do so in a new environment or when new alleles become common in an admixed population. Therefore, the proportion of variation explained by large effects in admixed diversity panels is unknown. Details of the joint distribution of genetic effect sizes, frequencies, linkage equilibria, recombination rates, environmental and genetic conditionalities (such as genotype-by-environment interaction and epistasis), and pleiotropy will enhance our understanding of phenotypic landscapes or response surfaces implicit to crop evolution (Rice 2004; Messina et al. 2011) and help to optimize selection practices for crop improvement.
Maize is a model species with exceptional morphological and molecular diversity. Millions of single nucleotide polymorphisms (SNPs) may segregate in a modestly sized population (Tian et al. 2011) and the annual cross-pollination of maize rapidly decays their linkage disequilibrium (LD) (Remington et al. 2001). Breeders have selected and recombined this diversity for >7000 years and commercial hybrid breeding has ensued for a century (Hamblin et al. 2007; Piperno et al. 2009; Wallace et al. 2013). The past decade has seen exceptional genotyping advances. Molecular markers now regularly serve as proxies for heritable trait variation and aid dissection of its segregation. Over 27 million variants have been genotyped in 27 inbreds (Gore et al. 2009; Chia et al. 2012). These were parents of the maize Nested Association Mapping (NAM) families (Chia et al. 2012) and the intermated B73 × Mo17 family (IBM) (Lee et al. 2002). Parental variants were imputed based on 1106 markers scored on 4892 recombinant inbred lines (RILs). Also, 2815 inbreds from the US Department of Agriculture–Agricultural Research Service (USDA–ARS) North Central Regional Plant Introduction Station (NCRPIS) were genotyped-by-sequencing to identify 681,257 SNPs (Romay et al. 2013). These are the most densely genotyped public maize populations (Chia et al. 2012).
This study’s objectives were to explore the genetic architecture of maize height and related traits, validate several allele estimates of the inferred architecture, and contrast multiple models estimating genetic architecture in their ability to explain the natural height variation of observed lines and predict the height of unobserved lines given their genotypes. To achieve these objectives, maize plant height (PHT), ear height (EHT), flowering time (days to anthesis, DTA), and node count (NPH) were measured in the NAM and IBM families and joint-linkage mapping of quantitative trait loci (QTL) was performed. A genome-wide association study (GWAS) was also executed across the NAM and IBM families as well as within the NCRPIS diversity panel. To validate allele estimates of a QTL on chromosome 9, PHT was mapped in two families of near isogenic lines (NILs) with introgressions of NAM parents (CML277 and CML333) in a B73 background. Variants segregating in the NAM and IBM families as well as the NCRPIS panel were used to construct genomic identity-by-state (IBS) relationship matrices for genomic best linear unbiased prediction (GBLUP) (VanRaden 2008). GBLUP was performed and compared to QTL models explaining PHT, EHT, DTA, and NPH variation across, within, and between RIL families. GBLUP was also performed within the NCRPIS diversity panel.
Materials and Methods
Plant materials and environments
The NAM families were developed by the Maize Diversity Project as previously described (McMullen et al. 2009; Hung et al. 2012). The IBM family (Lee et al. 2002) was also evaluated. RILs were measured in 10 environments, under conventional fertilization, weed, and pest control. In 2006 and 2007, RILs were scored in Aurora, New York; Clayton, North Carolina; Columbia, Missouri; and Urbana, Illinois. In 2008, RILs were scored in Aurora, NY. In 2009, they were scored in Columbia, MO. In total, 4892 NAM and IBM RILs were scored for height in >7 environments. This dataset was used in linkage mapping, GWAS, and GBLUP.
In New York, RILs were planted at Muskgrave Research Station (Honeoye silt-loam soil). In 2006, 2007, and 2008, single row plots of 12 plants were grown for each RIL. In 2006, RILs were stratified by randomized family as previously described (Hung et al. 2012). In each family, incomplete blocks of 20 random RILs, B73, and the family’s alternate parent were grown in an α-lattice. Plot order in each block was randomized. In 2007 and 2008, a similar approach was taken, but plots were stratified by maturity group. In North Carolina during 2006 and 2007, RILs were planted at the Central Crops Research Station (Cecil sandy-loam soil) in single row plots of 8 plants after thinning. The RILs were grown in a similar design to New York the same years. In 2006, a second replicate of families corresponding to CML247, CML277, Ki3, M162W, Mo17, and Tzi8 was grown. In Missouri, RILs were planted at Bradford Research Center (Mexico silt-loam soil) in 2006 and 2007. A single row plot of 9 plants was grown for each RIL. Due to poor germination and drought, families corresponding to CML247, CML322, IL14H, M162W, Mo18W, MS71, NC350, NC358, and P39 were not scored in 2006, and in 2007, the IBM family was not grown (Hung et al. 2012). In Missouri during 2009, RILs were planted in single-row plots of 15 plants at Rollins Bottoms Research Station (Haymond silt-loam soil). A similar field design to New York and North Carolina was used. In Illinois during 2006 and 2007, RILs were grown at the Crop Sciences Research and Education Center (Muscatine silt-loam soil) in single row plots thinned to 15 plants. Plots were arranged in a design similar to other environments planted the same years.
After joint-linkage mapping, two NILs with a B73 background were used to validate the family-nested QTL explaining the most variation in NAM. The NILs had introgressions for two alleles on the long arm of chromosome 9 and were obtained from the Syngenta Corporation. Introgressions of CML277 and CML333 were estimated to be ∼90 Mb (Chr 9: 25,654,718–114,922,534 bp, Maize RefGenV1) and 125 Mb (Chr 9: 11,972,467–136,597,740 bp, RefGenV1) based on 1106 molecular markers genotyped on the NILs (see genotyping). These NILs were backcrossed to B73 and their progeny were selfed to produce two segregating NIL families of ∼3500 plants. After genotyping, 88 recombinants derived from the CML277 introgressed NIL and 93 recombinants derived from the CML333 introgressed NIL were recovered. These were selfed and fixed for recombinants. Bulks of the lines were planted in single-row plots and scored for PHT, EHT, and DTA in Aurora, NY; Columbia, MO; and Clayton, NC in 2011. Plots consisted of 12, 12, and 8 plants per row after thinning in New York, Missouri, and North Carolina, respectively. Single-row plots of 12 and 8 plants per row were scored again in Columbia, MO and Clayton, NC in 2012. In all environments, plots were grown in incomplete blocks of 15 fixed recombinants, B73, and the NIL family’s alternate parent, CML277 or CML333.
In addition to RILs and NILs, the NCRPIS diversity panel from the USDA–ARS North Central Regional Plant Introduction Station, was evaluated in 2010. This resource contains 2815 inbreds from worldwide maize gene pools (Romay et al. 2013). Inbreds were planted in single-row plots and evaluated for PHT, EHT, and DTA in three environments: Aurora, NY; Clayton, NC; and Columbia, MO. In New York and North Carolina, inbreds were planted in plots of 12 plants at Muskgrave Research Station, and Central Crops Research Station, respectively. In Missouri, inbreds were planted at South Farm in plots of 15 plants. In all fields, the NCRPIS panel was stratified by nine maturity groups. Lines were randomly assigned to two incomplete blocks of 19 inbreds, checks of B73, IL14H, KI11, P39, SA24, or TX303 were organized in an α-lattice.
Phenotyping maize height and related traits
In all environments and panels, PHT was measured as the distance in centimeters from the soil line of the plant to the base of the flag leaf at reproductive maturity; this measure excluded any variation in tassel length from the flag leaf to the top of the plant. Similarly, EHT was scored as the distance from the soil to the primary ear node, at the same developmental stage. NPH was scored as the number of nodes between the top brace root node and the flag leaf, excluding any variation in brace root nodes and any subterranean nodes. In all three traits, multiple plants (three to eight) were measured in each plot and mean plot values were recorded. As described (Buckler et al. 2009), DTA was scored as the days from planting to median anthesis in a plot.
Genotyping RIL and NIL families and the NCRPIS diversity panel
Six molecular marker sets were used for joint-linkage QTL mapping, GWAS, and GBLUP across NAM and IBM families, positional mapping across two NIL families, and linear mixed model GWAS and GBLUP across the NCRPIS inbred diversity panel. In the first marker set, 1106 markers were genotyped on an Illumina Golden Gate assay across the NAM and IBM families to facilitate joint-linkage mapping (McMullen et al. 2009). In this marker set, missing genotype calls were imputed as a weighted average of flanking markers. Relative weights were estimated from the genetic distance between a missing marker and adjacent markers as previously described (Tian et al. 2011). These 1106 markers were also genotyped on an Illumina Golden Gate assay across the NILs provided by Syngenta to assess introgression sizes in the B73 background.
A second marker set of ∼1.6 million of the 3.3 million SNPs detailed in the maize HapMapV1 (Gore et al. 2009) and scored in the 27 founder lines of the NAM and IBM families were also employed in the analyses. Within this marker set, missing genotypes were imputed across founder lines using the haplotype clustering algorithm fastPHASEv1.3 as previously described (Tian et al. 2011). Next, SNPs were projected onto the 4692 RIL progeny of the NAM and IBM families by first estimating their physical distance to the nearest flanking markers of the first marker set of 1106 markers based on maize RefGenV1. These physical distances were then used to calculate a weighted average of genotype scores from the flanking markers of the first marker set as an estimate of the projected SNP’s genotypes across all RILs (Tian et al. 2011). SNP genotype scores were subsequently used in GBLUP of the RIL families.
A third marker set of ∼26 million of the 55 million SNPs and RDVs reported in the maize HapMapV2 (Chia et al. 2012) was imputed across NAM and IBM family parents using regions of identity by descent. These were projected on RIL families based on their parental lineage and the B73 genome as previously reported (Tian et al. 2011; Chia et al. 2012). As in maize HapMap V1, SNPs projected on the RILs were assigned a value equal to the weighted average of flanking markers from the first marker set of 1,106 markers genotyped across all RILs. Weights were estimated by a SNP or read-depth variant′s (RDV) physical distance from flanking genotyped markers based on maize RefGenV1 (Tian et al. 2011; Chia et al. 2012). This marker set was then employed in joint-linkage-assisted GWAS of the RIL families.
A fourth marker set consisting of four SNPs was genotyped using KBioscience’s KASPar SNP genotyping system (http://www.kbioscience.co.uk/). The assayed SNPs were located at 73,592,864 bp, 99,948,772 bp, 102,469,299 bp, and 109,910,100 bp on the long arm of chromosome 9 in maize RefGenV1. These SNPs were scored across NILs with CML277 and CML333 introgressions in chromosome 9 obtained from Syngenta. No imputations or projections of SNPs were performed. The SNPs were used to screen 3488 and 3897 lines for recombinant haplotypes derived from the CML277 and CML333 NIL families, respectively.
A fifth set of 1813 SNPs located across a 126-Mb region on maize chromosome 9 was scored using genotyping by sequencing (GBS) (Elshire et al. 2011). These SNPs were genotyped across 86 and 93 fixed recombinants of the CML277 and CML333 NIL families, respectively. Missing SNPs were imputed using a nearest neighbors algorithm in TASSEL v.3.0 (Bradbury et al. 2007) and estimated with haplotypes of adjacent SNPs in a 1024-bp window (Romay et al. 2013). Genotype calls across these SNPs were used in mapping both NIL families.
A final set of 681,257 SNPs scored on the NCRPIS diversity panel was genotyped using GBS as previously described (Elshire et al. 2011; Romay et al. 2013). Missing SNPs were also imputed by a nearest neighbors algorithm in TASSEL v.3.0 (Bradbury et al. 2007) and estimated with haplotypes constructed from SNPs in a window of 1024 bp (Romay et al. 2013). The SNPs were then used in linear mixed model GWAS of the NCRPIS diversity panel and genomic prediction by GBLUP (Endelman 2011). All marker sets genotyped in the NAM and IBM families or the NCRPIS diversity panel, and used within the analyses of this study, are publicly available at www.panzea.org.
Partitioning trait variance, calculating heritability, and estimating line values
To partition trait variation into components of genetic and environmental variance, we performed linear mixed modeling using ASReml v3.0 (Gilmour et al. 1995). To reduce undue influence of outliers, plot measures over three standard deviations from each environmental mean were replaced by the mean plus or minus three standard deviations, whichever value best approximated the original measure. Linear mixed model selection was then performed for each trait with custom Java code calling ASReml v3.0 (Gilmour et al. 1995). First, trait variation within each of the environments was fit in separate models. Terms retained in these models were chosen by backward selection based upon likelihood ratio testing (P < 0.05). These terms were then nested within environment and a multienvironment model was constructed.
For the RILs of the NAM and IBM families, linear mixed models separately fit for each environment included a fixed effect for the grand population mean and multiple random effects. Random effects entering the full model included family and RIL nested within family genotypic effects, as well as environmental effects denoting the blocks, rows, and columns of each field design. A separate variance component was fit for RIL nested within each of the NAM and IBM families and an additional family term was constructed for replicated parental checks. All random effects in the model, including the genotypic effects of family and RIL nested in family, were modeled with independent G covariance structures. But, a correlated R structure was fit among residuals based on a two-dimensional separable first-order autoregressive spatial structure for rows and columns within each environment. The importance of all environmental effects and the residual correlation structures within each environment’s model were tested by backward selection and retained if they met a likelihood ratio significance of P < 0.05.
Next, for each trait, a single linear mixed model across environments was fit for the RIL families, including and nesting the significant components of each individual environment model by environment. Multiple across environment terms were also added, including environment, family by environment, and RIL nested within family-by-environment interaction terms, as well as a heterogeneous spatially correlated R structure across environments. The significant autoregressive spatial structures for rows and columns within each environment were also included in the model. Line values for each RIL were predicted from the multienvironment model for use in mapping and prediction (Table S1). Genetic effects for family and RIL nested within family were refit as fixed effects within the same multienvironment model to infer best linear unbiased estimates of line values and their unbiased distributions. Using these line values, significance of skew about the mid-parent values for each RIL family was estimated by a modified two-sided D’Agostino–Pearson test (D’Agostino 1970). This test was modified such that estimates of the population’s mean were replaced by the population’s mid-parent value and significance was determined by the Bonferroni-corrected significance threshold of P < 1.9E-3. This enabled the assessment of skew with respect to the mid-parent and was performed as a test for the potential of nonadditive genetic variation with the assumption that the population was random mating and not under selection since the initial parental cross.
Recombinant NILs fixed for an introgressed region of CML277 or CML333 on the long arm of chromosome 9 were fit in a linear mixed model across the five field environments surveyed for PHT and EHT and across the three field environments measured for DTA. Fixed effects were fit for the grand mean as well as NIL family and line value was nested within the NIL family. Random effect terms denoting environment and block nested within environment were also included in the model. Best linear unbiased estimated line values for each NIL were then calculated from the multienvironment model (Table S3).
Within the NCRPIS diversity panel, a single linear mixed model was fit across all environments including a single fixed effect for the grand mean and random effects for genotypic and environmental factors using ASReml v3.0 (Gilmour et al. 1995) for each trait. Random effects entering the full model included an inbred genotypic effect and the environmental effects of field, row, column, and blocks in each environment. Analogous to the RIL families, all random effects including genetic effects were modeled with independent G structures. A two-dimensional separable first-order autoregressive spatial structure for rows and columns in each environment and a heterogeneous correlated covariance structure across environments were modeled in the R structure of the residuals. From the multienvironment models, best linear unbiased predicted and estimated line values for each inbred of the NCRPIS diversity panel were calculated for mapping, prediction, and correlation (Table S2).
After constructing the multienvironment models, we estimated heritability on a line-values basis for every trait in each of the NAM and IBM families and across the NAM and NCRPIS diversity panel, as previously described (Holland et al. 2003; Hung et al. 2012). The variance component explaining variation between families and the arithmetic mean of variance components explaining variation between lines in each family were summed to infer the genetic variance of each trait. Genetic variances in each family were estimated as the variance component explaining variation between lines of the family. This was also true across the NCRPIS diversity panel, where genetic variance was estimated as the variance component explaining variation between lines. To attain heritability estimates, the values were divided by the total variance between and within line values after accounting for variance attributable to established environmental sources. For estimation of contributions to a single line value, family-by-environment and line nested in family-by-environment variance components were divided by the harmonic mean of the number of environments a family or line was measured as previously described (Holland et al. 2003). Contributions of residuals were estimated by dividing the arithmetic mean for residuals across the heterogeneously modeled residuals for each environment by the harmonic mean number of plots a line was measured (Holland et al. 2003). Heritability estimators calculated within a NAM family as well as across the NCRPIS diversity panel did not involve family or family-by-environment variance components, but were calculated similarly.
Bootstrapped QTL mapping across and within RIL families
Joint linkage mapping of family-nested QTL explaining variation in line values for each of the traits across all environments was performed using the SAS system version 9.2 (SAS 2004\x{2013}2008). PROC GLMSELECT was implemented to select a family-nested QTL model from the first marker set of 1106 SNPs nested in each of the 26 NAM and IBM families. Based on null permutation testing, the threshold for model inclusion and exit of each family-nested QTL in all traits was set to a false positive rate of 0.001 during stepwise model selection. A term denoting RIL family was also forced into every model. In addition to joint-linkage mapping of RIL family-nested QTL using the full dataset (Buckler et al. 2009; Tian et al. 2011), a family-stratified bootstrap sampling scheme was also executed. In this approach, 80% of the RILs within each of the NAM and IBM families were randomly sampled with replacement (20% of RILs were present across all samples) and the stepwise model selection routine was performed for 100 sampling iterations to permit calculation of resample model inclusion probabilities (RMIPs) or the probability a family-nested QTL is included within the final model (Valdar et al. 2009). To estimate a genome-wide significance threshold for RMIPs, null permutation testing was performed for PHT and EHT. At the previously defined thresholds for model inclusion and exit, no family-nested QTL were found to exceed RMIPs of 0.10 upon null permutation of trait values under the employed bootstrap sampling scheme. Linkage mapping of nonnested QTL explaining PHT variation in the line values of each family was performed in an analogous manner to joint-linkage mapping to permit calculation of QTL RMIPs. SAS system code for the sampling procedure is available on request.
Fine mapping two alleles of a height QTL on chromosome 9 in NIL families
To begin validation of the family-nested QTL results, we measured and further resolved two NILs possessing introgressions of the tropical lines CML277 and CML333 on the long arm of chromosome 9 in a B73 (temperate stiff stalk) genetic background. In a preliminary analysis, Welch’s t-tests from the base package of R v2.12.0 (R Development Core Team 2011) were employed to compare the NILs obtained from Syngenta to the inbred B73. These analyses concurred with the significant PHT increasing (relative to B73) QTL effects identified in the CML277 × B73 and CML333 × B73 NAM families within the NIL’s introgression regions. After backcrossing the NILs to B73, selfing the resulting progeny, screening 3488 and 3897 F2 plants for recombinants within the introgression regions, and selfing two generations with selection for F3 plants homozygous for recombinant chromosomes, bulks of the homozygous recombinant lines were evaluated for PHT, DTA, and EHT. Genotyping by sequencing (Elshire et al. 2011) and imputation were performed across NIL families of 86 and 93 recombinants with introgressions for CML277 and CML333 as previously described. Using Welch’s t-test in R v2.12.0, the NIL line values for each of the traits were used in 1813 sequential single marker regressions across the introgression region to further resolve the original NIL effect and support joint-linkage mapping results within the CML277 and CML333 NIL families.
GWAS across RIL families and within the NCRPIS diversity panel
To further resolve the genetic architecture of PHT and related traits, we conducted a joint-linkage-assisted GWAS with 4892 RILs across 26 million SNPs imputed from maize HapMapv2 (Chia et al. 2012). After removing all family-nested QTL from a single chromosome, the family nested QTL model was fit to line values for each trait and residual variance attributed to the removed family-nested QTL as well as genetic variance not previously accounted for by the QTL model was determined (Tian et al. 2011). This procedure was repeated for all 10 chromosomes of the maize genome. Using these estimates of residual genetic effects and custom Java code we performed a stepwise regression procedure as previously described (Tian et al. 2011) with a genome-wide false positive threshold of 5e-4 as determined by 2000 null permutations of PHT. The procedure was repeated for the residuals of each chromosome for 100 stratified sampling iterations whereby 80% of the RILs within each NAM and IBM family were randomly sampled with replacement. The RMIP of a SNP was estimated as the proportion of models containing the SNP (one model for each stratified sampling iteration) out of the 100 models that were constructed. This measure was used to assess the robustness of the SNP's association with the trait. A genome-wide significance threshold for RMIP values (0.05) was determined by null permutation testing as previously described (Brown et al. 2011; Kump et al. 2011; Tian et al. 2011). Proximity of GWAS associations to QTL identified during joint-linkage QTL mapping and candidate genes were inferred based on the RefGenV1 physical map and functional annotations from maize genome sequence release 4a.53.
Using the genome association and prediction integrated tool (GAPIT) (Lipka et al. 2011) in R v2.12.0 (R Development Core Team 2011) we performed sequential single marker mixed-model GWAS for PHT and related traits across the NCRPIS diversity panel. This approach included a relatedness matrix scaled to be analogous to the numerator relationship matrix as described by the first method of VanRaden (Yu et al. 2006; VanRaden 2008) as well as fixed covariates for six eigenvectors with the largest eigenvalues of the relatedness matrix (these eigenvectors explained 31.6% of the variance in additive genetic relationship matrix) to capture additional population structure and allow independent scaling of the matrix. It permitted identification of significant associations with PHT and EHT. Associations with Benjamini–Hochberg false discovery rate (FDR) <5% (Benjamini and Hochberg 1995) were considered significant. The proximity of these GWAS associations to candidate genes was inferred based on the RefGenV2 physical map and the functional annotations from maize genome sequence release 5a.
Cross-validated genomic prediction across RIL families and within the NCRPIS panel
Using the package rrBLUP in R v2.12.0 (Elshire et al. 2011; Endelman 2011), we conducted genomic prediction by GBLUP. Identity-by-state (IBS) matrices detailing genomic relationships were constructed as described by the first method detailed in VanRaden (2008) for the NAM and IBM families from 1.6 million SNPs of the maize HapMapV1 (Gore et al. 2009). Estimates of genomic relationship in the NCRPIS diversity panel were also calculated using the 681,257 SNPs genotyped in the panel (Bradbury et al. 2007). After constructing the genomic relationship matrices, all line values in an experiment (NAM RILs or the NCRPIS diversity panel, each analyzed separately) were fit by restricted maximum likelihood in a GBLUP framework to assess variance explained by the method. Next, RILs and inbreds in each panel were randomly allocated to five disjoint subsets for cross-validation (Figure S7). Line values from combinations of one to four subsets were used in model calibration to predict line values of remaining subsets. Prediction accuracies were averaged across the folds. This process was repeated 10 times selecting five random subsets each time to estimate prediction accuracy with respect to the number of lines included in model calibration.
For comparison to GBLUP of RILs, prediction of NAM RIL PHT line values by bootstrap aggregating or bagging (Breiman 1996) family-nested QTL models was performed using PROC GLMSelect in the SAS system v9.2 (SAS 2004–2008). However, unlike the approach to map family-nested QTL, only RILs in a calibration subset were bootstrap sampled and used to construct family-nested QTL models to predict the RILs not included in the calibration subset. Selection of calibration subsets was performed in an identical sampling scheme to sampling families during GBLUP cross-validation (Figure S7). But, from every subset of RILs selected to calibrate a prediction model, 10 bootstrap samples were taken with replacement in a family-stratified manner, these samples were not disjoint. From these bootstrap samples of the calibration subset (and equal in size to the calibration subset), 10 family-nested QTL models were constructed. A family term was included in every model and family-nested markers were included until reaching the null-permutation threshold (as in the bootstrapping approach employed in the full dataset). Models from bootstrap samples of the calibration subset were each used to predict RILs excluded from the calibration subset and an average of the predicted values (bagged estimator) was calculated for each RIL. While prediction accuracy is traditionally measured as the correlation (r) between predicted and observed line values given this statistics connection with selection accuracy, we measured the coefficient of determination (r2) obtained by regressing line values against predicted line values obtained by GBLUP or by bagging family-nested QTL models during cross-validation. This measure has utility and reveals the variation explained among the predicted values. But, given that this measure was employed, it must be noted that in all instances correlations between predicted and observed line values were positive.
Cross-validated genomic prediction within and between RIL families
Within each RIL family, GBLUP and bagging of QTL models was performed in a manner identical to that performed across families. The variance explained by GBLUP and predicted from varying calibration sizes was determined. Similarly, the variance explained and predicted by bagging QTL models was also estimated in a manner analogous to that performed across families. However, no family term was included within the QTL models and markers were not nested within family, as only a single family was employed in each model selection process.
Between RIL families, bagging QTL models assuming allele effects were identical between families revealed no significant prediction accuracy. This was expected given the extensive allele series observed during family-nested QTL mapping. Consequently, between-family prediction accuracy was only extensively evaluated for GBLUP. A GBLUP model calibrated by every RIL in a single family was used to predict RILs in each of the other families. This procedure was repeated for all 26 RIL families. This provided 650 estimates of between-family prediction accuracy. To assess the correlation of between-family prediction accuracies and the relatedness of their families, a modified Mantel test (Mantel 1967) was invoked between the relatedness matrix (VanRaden 2008) of the non-B73 parent of each family and the matrix of r2 estimates of between-family prediction accuracy. In the modified Mantel test, diagonal elements of both matrices were excluded from analysis as only elements of between-family prediction accuracy (off-diagonal elements) and not within-family prediction accuracy were of interest. Inclusion of these values would have upwardly biased the test’s correlation and its significance.
Results
Variation in maize height and related traits
Heritable variation was observed for PHT, EHT, DTA, and NPH in the NAM and IBM families evaluated across 10 environments. PHT, EHT, and DTA measures collected in the NCRPIS diversity panel across three environments were also highly heritable. Best linear unbiased estimated line values for PHT across environments possessed an overall standard error of difference of 5.2 and 6.4 cm in RIL families and the NCRPIS diversity panel, respectively. In RIL families, 95% of line values for PHT fell between 138 and 199 cm, with 95% of EHT line values spanning 61 to 113 cm. The shortest parent, P39 (116 cm), was shorter than nearly all RILs, whereas the tallest, TX303 (198 cm), approximated the tallest. Despite a similar range of parents and progeny across families, transgressive segregation was present in all families and significantly asymmetric about mid-parent values in 9 families for PHT, suggesting a role for epistasis (Figure 1, supporting information Table S1). Similar levels of transgressive segregation and asymmetry about mid-parent values were found for EHT and the other traits in many families (Figure S1).
Distribution of plant height (PHT) line values within and between RIL families. Asymmetric transgressive segregation about mid-parent value was observed for PHT line values in many of the NAM and IBM families. Similar trends were noted for ear height (EHT), days to pollen shed (DTA), and node counts (NPH) (Figure S1).
The only traits with significantly correlated levels of transgressive segregation (as measured by correlation of the ratio of progeny and parent variances) across families were PHT and EHT, ρ = 0.63, P < 0.01. No trends in the asymmetry of progeny distributions (as measured by skew about the mid-parent value) were found between traits across families. In the NCRPIS diversity panel, 95% of PHT line values ranged from 110 to 194 cm, while EHT line values fell between 34 and 119 cm. Despite similar ranges, the greater allelic diversity of the NCRPIS diversity panel relative to the RILs was mirrored by its larger variance for PHT and EHT.
Estimates of heritability calculated on a line mean basis display the proportion of trait variation attributable to differences between maize lines after accounting for variation explained by known environmental factors. These estimates were high for all traits (Table 1) and similar to past surveys of the NAM families in fewer environments (Hung et al. 2012). The most heritable trait across the NAM families was DTA (H2line = 0.94 ± 0.01), followed by PHT (H2line = 0.93 ± 0.01), EHT (H2line = 0.92 ± 0.01), and NPH (H2line = 0.89 ± 0.02). Traits maintained rank in the NCRPIS diversity panel: DTA (H2line = 0.92 ± 02), PHT (H2line = 0.87 ± 0.03), and EHT (H2line = 0.86 ± 0.02). Variation of heritability estimates within individual families was small; yet, a significant correlation between estimates was found for PHT and EHT across families (ρ = 0.86, P < 0.01).
Unlike recent efforts to detail the genetic architecture of flowering time photoperiod response in the NAM families (Hung et al. 2012), more uniform photoperiods were surveyed in the long day temperate environments of this study. Yet, the fraction of variation attributed to environmental factors was still significantly greater for DTA than the other traits for the RILs (Figure 2A) and the NCRPIS diversity panel (Figure 2B). Adjusting for temperature differences by converting DTA to growing degree days to anthesis did not substantially reduce estimates of between-environment variation. Genotype-by-environment interaction was greater for PHT and EHT than DTA or NPH in the RIL families and the NCRPIS diversity panel (P < 0.05).
Partitioning variation in PHT, EHT, DTA, and NPH. PHT, EHT, DTA, and NPH variation was attributed to genetic and environmental factors across the NAM families (A) and within the NCRPIS diversity panel (B).
Of the traits surveyed in RIL families, the largest correlation of line values was estimated between PHT and EHT (r = 0.77, P < 1e-6) (Figure 3A). Correlations remained strong within most NAM families (Figure S2), the IBM family (Figure 3B), and NCRPIS diversity panel (r = 0.59, P < 1e-6). However, correlation between PHT and DTA were greater in the NCRPIS diversity panel (r = 0.78, P < 1e-6, Figure 3C) than the RIL families (r = 0.34, P < 1e-6, Figure 3A).
Correlations of estimated PHT, EHT, DTA, and NPH line values. Pairwise trait correlations across all NAM families (A) and within the IBM family (B). Trait correlations within each of the NAM families were similar (Figure S2). The NCRPIS diversity panel differed, displaying higher correlations between PHT and DTA than PHT and EHT (C).
QTL mapping of maize height and related traits
To identify QTL, bootstrapped joint-linkage mapping was performed across RIL families (Table 2, Table S3). Eighty-nine family-nested markers with RMIPs >0.10 (marker present in >10 of 100 models) were associated with PHT line values (Figure S3). Ninety-two family-nested markers were associated with EHT and a total of 91 and 83 family-nested markers associated with DTA and NPH, respectively. All had RMIPs >0.10. Many associated markers represented clusters of linked loci and likely correspond to the same QTL.
Before accounting for family-nested QTL, differences between families explained 30 ± 4% of PHT variation in line values in the NAM and IBM families. Similarly, 46 ± 4% of EHT variation in line values was explained by family effects. By comparison, family mean differences were more important for DTA (62 ± 3%) and NPH (64 ± 4%). Beyond variation explained by families, family-nested QTL explained 46 ± 5% of PHT variation in line values. The locus explaining the most variation was found on chromosome 9 and explained 2.1 ± 0.9% of PHT variation after accounting for variation attributed to families and other estimated family-nested QTL. This estimate rose to 2.6 ± 0.5% of PHT variation among the 66 models that possessed no linked family-nested QTL within 10 cM of the locus. In total, the joint-linkage QTL models explained 76 ± 4% of PHT variation and 78 ± 3% of EHT variation, whereas they accounted for 85 ± 3% of DTA variation and 82 ± 3% of NPH variation.
Variation in QTL effects among RIL families suggested allele series for all traits. All family-nested PHT QTL had significant effects in at least four families (t-test, within family, P < 0.05) and positive and negative effects relative to the shared parent, B73. Although most family-nested QTL were not near loci with established effects on PHT, a few did colocalize with previously identified PHT loci (Table 2, Table S4, Figure S3). The most notable was a QTL with RMIP of 0.77 within the 15.5-kb locus of brassinosteroid-deficient dwarf 1 (brd1) (Pettem 1956; Makarevitch et al. 2012) on chromosome 1 at 248,503,298 bp (RefGenV1). In 18 of the 26 RIL families, this QTL was estimated to reduce PHT relative to the shared parent, B73. In addition, crinkly leaves 1 (Beavis et al. 1991) on chromosome 3 and crinkly leaves 4 (Stinard and Robertson 1987) on chromosome 10 were also near family-nested QTL with RMIPs of 0.12 and 0.19, respectively. Maize plants with the crinkly leaves 1 or 4 mutant allele are short and possess crinkly knot-bearing leaves (Stinard and Robertson 1987; Beavis et al. 1991). As previously reported, vegetative to generative transition 1 and zfl2 flowering time loci were also near family-nested QTL for DTA (Buckler et al. 2009; Romay et al. 2013) and NPH. However, these DTA loci did not overlap with family-nested QTL for PHT or EHT.
The effects of family-nested PHT QTL on EHT, DTA, and NPH were estimated to assess the importance of pleiotropy and/or linkage (Table 2, Table S4, Figure 4). Allelic effects on EHT had the strongest correlation with PHT effects, r = 0.77, P < 1e-6, whereas the weakest occurred between PHT and DTA, r = 0.16, P < 1e-6. These estimates paralleled correlations of line values across the RILs (Figure 3A). Similar trends were also observed for the effects of family-nested EHT, DTA, and NPH QTL on PHT and for QTL effects mapped in individual families (Figure S4).
Correlations of estimated PHT, EHT, DTA, and NPH family-nested QTL effects. Allele effects for family-nested QTL from 100 models were estimated for PHT, EHT, DTA, and NPH. Correlations among estimated allele effects for each trait were then calculated. Family-nested QTL for EHT, DTA, and NPH were also estimated for all traits (Figure S4).
Fine mapping a maize height QTL in two NIL families
The effects of two tropical alleles (from CML277 and CML333) at the PHT QTL explaining the most variation (2.1%) in joint-linkage mapping were independently validated in two NIL families. The QTL interval was a 15-Mb region on chromosome 9 centered at 98.5 Mb (RefGenV1). It contained five joint-linkage markers each with a RMIP >0.10 for PHT. One of the five markers was present in every model upon bootstrapped joint-linkage mapping of PHT and EHT; but two markers never entered the same model. This gave the region a combined RMIP of 1.00 and implied segregation of a common factor. In mapping NPH, two markers in the region had a combined RMIP of 0.60 but no marker with a RMIP >0.10 was found for DTA.
The CML277 and CML333 alleles of the QTL significantly increased PHT by ∼6 or 4 cm, respectively, relative to the B73 reference allele in family-nested joint-linkage models and also in single family models where the effects were estimated (Table S4 and Table S5). Two NILs with a B73 genetic background and introgressions from CML277 (90-Mb segment) or CML333 (125-Mb segment) at this QTL region were selected to independently validate and resolve the initial joint-linkage mapping results. Relative to B73, the introgressions increased PHT by ∼5 cm in both NILs (P < 5e-4). After backcrossing NILs to B73, selfing the resultant progeny, identifying recombinant haplotypes, and testing homozygous recombinant progeny lines, effects were resolved to a region of ∼10 Mb on chromosome 9 in both NIL families (CML277: 102,469,299–109,910,100, CML333: 99,948,772–109,910,100, RefGenV1). Effect estimates in this interval were maintained at 7 cm (P < 5e-4) for the CML277 allele and ∼4 cm (P < 5e-4) for the CML333 allele (Figure 5, Table S7). The intervals contained >100 genes, including several transcription factors, but no obvious PHT candidate genes with involvement in auxin, brassinosteroid, or gibberellin synthesis, transport, or signaling pathways were identified.
NIL families support of CML277 and CML333 alleles at a RIL family-nested QTL. Two families of recombinant lines with introgressions of CML277 (A) and CML333 (B) on chromosome 9 in a B73 background were queried for association with PHT to validate a NAM family-nested QTL (RefGenV1 Chr 9: 98,502,843) with significant allele effects when mapped independently in B73 × CML277 and B73 × CML333 RIL families. Within NIL families, t-tests for PHT associations using genotyping by sequencing (GBS) marker variants and Kaspar assays across a region of 10 Mb supported allele effect estimates. The smaller effect of the CML333 allele relative to CML277 also concurred within the families. Three associations (RefGenV1 Chr 9: 97,520,280; 100,367,415; and 100,371,640) identified to increase PHT relative to B73 by GWAS and segregating between B73 and CML277 or CML333 were also in the region.
GWAS of maize height and related traits
To resolve additional associations, we performed a joint-linkage-assisted genome-wide association study (GWAS) across RILs of the NAM and IBM families (Table 3, Table S6, Figure S3). In total, 277 associations with RMIP >0.05 were found for PHT. Effect sizes of the mid 95% of these associations spanned −2.4 to 1.9 cm. Many of these associations cosegregated with allele effects of their nearest family-nested QTL, yet several affected height in a direction opposing the allele effect of nearby QTL. Many associations not linked to QTL were also found. At the interval on chromosome 9 mapped in the NIL families, seven associations with RMIPs >0.05 were found. Three were polymorphic between B73 and CML277 or CML333 (Chr 9: 97,520,280; 100,367,415; and 100,371,640; RefGenV1). All increased PHT relative to the B73 allele, but none were within 100 kb of genes known to influence PHT (Figure 5, Table S7).
Nonetheless, a few significant GWAS variants in other genomic regions were near genes known to influence PHT (Table 3). The most notable colocalized with the family-nested QTL in the brd1 locus. These associations consisted of two C/T transitions (Chr 1: 248,503,977 and Chr 1: 248,505,581 bp; RefGenV1). The variants possessed effects on PHT of −1.39 cm and −1.24 cm relative to the B73 allele, RMIPs of 0.42 and 0.12, and segregated in at least 4 and 9 of the 26 families, respectively. Other significant variants near genes of interest included a C/T transition (Chr 1: 74,769,251 bp; RefGenV1) with a RMIP of 0.63. This association was 17 kb from a homolog (GRMZM2G068701) of the Arabidopsis thaliana gene encoding AUXIN UPREGULATED3 (AT4G37390) (Staswick et al. 2005). Another association (Chr 2: 1,907,158 bp; RefGenV1) with a RMIP of 0.98 was 35 kb from a homolog (GRMZM2G068202) of a rice gene (Os05g35690) encoding a Gibberellin-regulated GAST protein. Finally, an association on Chr 3 (10,179,485 bp; RefGenV1) had a RMIP of 0.22. Although no QTL were nearby, the A/G transversion was 50 kb from dwarf plant1 (d1) (Emerson 1922) (Chr 3: 10,229,006–10,228,832; RefGenV1).
We also conducted GWAS in the NCRPIS panel, identifying 213 PHT associations with a FDR of <5% (Table 4, Table S8), of which 174 had minor allele frequencies (MAFs) >5%. The mid 95% of effect sizes fell between −4.6 and 4.7 cm. No association explained over 1% of PHT variation. PHT associations identified separately in the RILs and the NCRPIS panel had limited overlap. But, a few PHT-associated variants of the NCRPIS panel did colocalize with candidate genes (Table 4, Table S8). A GWAS variant (Chr 3: 158,897,644 bp; RefGenV2) with a q-value of 0.01 was found in the semi-dwarf2 (sdw2) locus (Neuffer 1990, 1992) (Chr 3: 158,841,148–161,311,068; RefGenV2 and IBM2 2008 Neighbors map). This C/T transition had a MAF of 0.36 and explained 0.65% of PHT variation in the panel. Its effect on PHT was +2.9 cm and it was found in a calmodulin-binding transcription activator (GRMZM2G171600). Also, two PHT-associated variants (Chr 5: 175,615,577 bp and Chr 5: 175,615,580 bp; RefGenV2) with q-values of 0.01 were in complete LD 30 kb from a putative gibberellin receptor GID1L2 (GRMZM2G049675). These C/T transitions had a MAF of 0.13, an effect of +4.9 cm, and explained 0.65% of PHT variation.
Prediction of maize height and related traits
Prediction of RIL values by GBLUP and bootstrap aggregating (bagging) family-nested QTL models was significant in most traits and populations (Figure 6, Figure S6). Across RIL families, GBLUP based upon randomly chosen calibration subsets of 940 RILs (20% of line values, Figure S7) explained 63 ± 3% of PHT variation in the remaining RILs. This was greater than the 23 ± 10% of variation explained by bagging family-nested QTL models with equally sized calibration subsets (Figure 6A). As calibration subsets increased to 3760 RILs (80% of line values), differences in prediction accuracy remained and bagging family-nested QTL models still did not perform as well as GBLUP (Figure 6A).
Prediction of PHT across RIL families and within the NCRPIS diversity panel. A random sample of 20, 40, 60, and 80% of RILs in all NAM families calibrated GBLUP and family-nested QTL models to predict PHT variation of RILs not employed in calibration (A). All RILs were used to calibrate GBLUP and family-nested QTL models to explain their PHT variation (B). In the IBM family, random samples of RILs calibrated GBLUP and QTL models to predict PHT variation of the remaining RILs (C). All RILs in the IBM family were used to calibrate GBLUP and QTL models to explain PHT variation (D). Random samples of inbreds in the NCRPIS diversity panel calibrated GBLUP models to predict PHT variation of inbreds not employed in calibration (E). All inbreds in the NCRPIS diversity panel were used to calibrate a GBLUP model to explain their PHT variation (F). Similar levels of prediction accuracy were observed in GBLUP for EHT, DTA, and NPH (Figure S6).
Within RIL families, prediction accuracies by GBLUP and bagging QTL models were more similar to each other than across families. Using 38 RILs (20% of IBM family, Figure S7), 11 ± 10% of PHT variation in the remaining RILs was explained by GBLUP (Figure 6C). As calibration subsets increased to 152 RILs (80% of IBM family), GBLUP explained 23 ± 10% of PHT variation. Similar accuracies were attained bagging QTL models (Figure 6C). In addition to the IBM family, within-family PHT prediction accuracy was evaluated for each NAM family (Figure S8). In most families, results mirrored the IBM family and prediction accuracies of GBLUP and bagging QTL models rarely significantly differed. In instances where differences between the methods were observed, GBLUP always outperformed bagging of the QTL models.
Between RIL families, PHT prediction accuracies by bagging QTL models were low (<5% PHT variation predicted) for pairwise comparison of RIL families (one family was used to estimate allele effects that were then fit to genotypes of an alternate family to predict its height). However, significant PHT prediction accuracies were attained between RIL families by GBLUP. These ranged from explaining no variation in an alternate family to explaining 38% of its PHT variation (Figure 7). As expected, closely related families, such as those sharing both B73 and a sweet corn parent (B73 × IL14H and B73 × P39), had higher between-family PHT prediction accuracies than distantly related families sharing only B73. Correlation of between-family prediction accuracies and the relatedness of their families was formally tested by a modified Mantel test (r = 0.57, P < 0.01).
Prediction of PHT within and between RIL families. The PHT explanatory ability of GBLUP within family (main diagonal) and prediction accuracy of models calibrated from one family and used to predict another were assessed (off diagonal). The nonparanthetical number of each off-diagonal element details the prediction accuracy of that row’s family when used to predict that column’s family. The number in parentheses details the prediction accuracy of that column’s family when used to predict that row’s family. Families are denoted by their unshared parent and ordered based upon clustering of their between-family prediction profiles across the families.
Within the NCRPIS diversity panel, GBLUP did not perform as well as across the RIL families (Figure 6E). Using 459 inbreds (20% of the NCRPIS diversity panel, Figure S7), 37 ± 2% of PHT variation in the remaining inbreds was explained by GBLUP. Prediction accuracy did not greatly improve as calibration subset size increased. A GBLUP calibrated with 1834 inbreds (80% of the NCRPIS panel) explained 42 ± 5% of the PHT variation in the remaining inbreds.
Discussion
Despite the importance of plant height in experimental genetics and breeding (Mendel 1866), the molecular mechanics of natural PHT variation remain largely elusive. Like human height (Galton 1886; Visscher 2008; Yang et al. 2010), PHT is a highly heritable polygenic trait in natural populations. Variation in PHT is well explained by Fisher’s infinitesimal model of genetic architecture where infinitely many unlinked genes each contribute an infinitesimally small additive effect (Fisher 1919; Hill 2010). While decades of research in plant biochemistry and molecular biology reveal this is not a plausible model of genetic causality (e.g., interactions in hormonal pathways have proven importance) (Wang and Li 2008), decades of breeding show the infinitesimal model has utility in predicting response to selection in complex traits like PHT (Crossa et al. 2010). Of course, these findings are not mutually exclusive. A model’s predictive ability does not imply its capture of causality. This is particularly true in high-dimensional observational studies of natural variation with limited control of confounded (population structure and LD) and hidden (unmeasured epigenetics and microenvironments) variables. Yet, some models are useful, especially those with genetic variables estimated to robustly contribute more than expected by their infinitesimal counterpart. They allow us to begin to dissect a population’s true genetic architecture and may enhance prediction of its evolutionary change.
Contrasting mapping panels and methods
Genetic variation in PHT and related traits in the NAM and IBM families was partitioned into family-nested QTL by bootstrapped joint-linkage mapping (Table 2, Table S4, Figure S4). The size, common alleles, and well-defined structure of the families provided power to identify family-nested trait associations at a cost of the mapping resolution attained when not nesting associations (Yu et al. 2008). Joint-linkage mapping assumes colocalization of QTL for a trait across families but allows differing allele effect magnitude and direction within each family. If the genetic architecture of each family greatly differs, this assumption may bias family-nested QTL models to overestimate the degree of colocalization among effects within families. However, linkage mapping of PHT QTL conducted independently in each family yielded similar results for most family-nested PHT QTL identified by joint-linkage mapping (Table S5).
Alternatively, joint-linkage mapping may be statistically inefficient if QTL colocalize and have the same effect across families. A model fitting this architecture was assessed by GWAS across the NAM and IBM families. Comparing the results of joint-linkage mapping and GWAS revealed the strongest GWAS associations were often near the most robust family-nested QTL, but most were distributed across the genome and many GWAS associations near family-nested QTL did not cosegregate or maintain directionality of effects consistent with their nearest QTL. This may be due to differing power and resolution when mapping family-nested PHT QTL compared to GWAS. If a QTL is the cumulative effect of linked variants in repulsion phase within a family they may not be apparent in joint-linkage mapping. In GWAS ancestral recombination of parental haplotypes may prove sufficient to reveal the individual contributions of these variants across families. Conversely, if a family-nested QTL results from a single variant but its effect is epistatic and depends upon a family’s background allele frequencies, it may be identified in joint-linkage mapping, but may not associate in GWAS. These allele model scenarios and an intermediate scenario created by clustering multiple parental alleles by windows of locally inferred ancestral similarity have been compared in a recent study (Bardol et al. 2013). The study found models leveraging linkage and linkage disequilibrium information, like the joint-linkage-assisted GWAS performed herein, found more QTL than linkage mapping alone, but no single allele model performed best for all datasets and traits. This lends to the conclusion that it remains important to test many different allele model scenarios to benefit from their complementarities while mapping (Kump et al. 2011; Bardol et al. 2013).
Factors contributing to differences in associations found by the different allele model scenarios of joint-linkage mapping and GWAS across the RIL families are also likely responsible for differences in associations found by GWAS across the RIL families and the NCRPIS diversity panel. Furthermore, alternate methods used to account for population structure during GWAS in the NCRPIS panel and across the RIL families (see Materials and Methods) may also contribute to lack of widespread commonality among associations. Despite this limited congruency, effects of two alleles at the PHT QTL explaining the most PHT variation when mapped across RIL families were maintained after being isolated from other segregating loci and validated in NILs. This suggests the most robust allele associations may be less dependent upon background alleles and may retain their utility in molecular marker-assisted selection during plant breeding.
Polygenicity of maize height and frequencies of associated alleles
The normality and high heritability of PHT variation in all populations surveyed suggested a high degree of polygenicity. Joint-linkage mapping confirmed this hypothesis and showed models of family main effects and QTL effects nested in families explained less variation in PHT and EHT than in DTA and NPH, suggesting that PHT and EHT are more polygenic than DTA and NPH. The large number and small effects of associations found during GWAS of RIL families and the NCRPIS panel also supported a high degree of polygenicity for PHT (Table S8), but the additivity and manner that this variation is distributed among large and small effects remains a point of contention.
The Fisher–Orr model postulates that independent polygenic traits with equally scaled influence on fitness will have an exponential distribution of genetic effect sizes (Fisher 1930; Orr 2005; Brown et al. 2011), meaning a few alleles will have large effect but most will be small. It also posits that genetic effects fixed by selection will, at first, be large and then be reduced in a geometric sequence as a population nears optimal fitness. In this dynamic, newer rare alleles will bear larger effects that have yet to be purged; while older common alleles will bear smaller effects. This trend was found in the NCRPIS panel where rare alleles had larger effects on PHT than common alleles (Figure S9). The trend has also been found in previous maize flowering time studies (Bouchet et al. 2013).
However, it is important to note that assumptions of the Fisher–Orr model are violated in the NCRPIS panel. The most pervasive is a lack of independence of traits as well as loci. This covariance leaves potential for antagonistic pleiotropic effects among traits and repulsion phase linkages among loci. These may preserve higher frequencies of large effects. Population structure also complicates the model when subpopulations are jointly analyzed. Each subpopulation may evolve by a different evolutionary path lending selection to act upon unique genetic effects in each subpopulation due to epistasis, environment, and genetic-by-environment interactions creating shifting local fitness optima. Biological relevance of effect size and rare alleles are also confounded to an unknown extent by statistical artifact. Only rare alleles with large effects will pass significance thresholds. Similarly, false positive associations are more likely among rare alleles due to increased leverage of trait outliers and biased estimates of the PHT distribution of plants bearing the rare allele. Therefore, while the frequency spectrum of the Fisher–Orr model is approximated by PHT-associated variants it remains controversial if this is biologically relevant or a statistical artifact.
Pleiotropy of maize height and related traits
Variation in PHT was more strongly positively correlated with DTA than even EHT (Figure 3C) in the NCRPIS panel, but not the RIL families (Figures 3, A and B, Figure S2). In the NCRPIS panel, correlation between PHT and DTA may result from inclusion of late-flowering tall tropical lines. In contrast, recent recombination within the RIL families may have broken linkages of PHT and DTA loci in RIL families as evidenced by reduced correlation between PHT and DTA across RILs relative to their parents. This suggests that PHT and DTA may be controlled by distinct loci in LD in the NCRPIS panel and RIL parents and that recombination during RIL development exposed the independence and modularity of their genetic architectures within RIL families. To more directly assess the importance of pleiotropy among loci for PHT and other traits in the RIL families, correlations among common QTL effect estimates were calculated. While linkage still influences these estimates, correlations of QTL effects on traits displayed similar trends to those on line values (Figure 3A, Figure 4, Figure S2, Figure S4). Correlations between QTL effects for PHT and EHT were strongest, whereas those for PHT and DTA were weakest. These findings support the modularity of PHT and DTA genetic architectures and their independent evolvability.
Transgressive segregation and allelic series
Contrasts of trait values of RIL families and their parents revealed transgressive segregation for PHT in many families (Figure 1). Direct evidence for the basis of this transgressive segregation was found in the positive and negative effect alleles at QTL for PHT and other traits in every family. These effects suggest repulsion phase alleles were present in every inbred parent of the NAM and IBM families for every trait at available mapping resolution. The NAM panel was well designed to reveal allelic variation (or “allelic series”) at QTL (Buckler et al. 2009) due to segregation of different additive alleles within families. Apparent allelic series may also arise from family-specific epistatic interactions that map in an additive manner. Similarly, epistatic interactions may be responsible for asymmetry in transgressive segregation noted within several RIL families.
Associations with genes in canonical hormonal pathways of plant height
Examination of >120 loci and genes affecting PHT through canonical pathways (gibberellin, brassinosteroid, and auxin signaling, transport, and synthesis) and identified by molecular tagging, past cloning efforts (Multani et al. 2003), or orthology to PHT associated genes in other species identified few commonalities with associations found in the NAM and IBM families (Table 2, Table 3). Review of loci like dwarf8 and dwarf9 regulating DELLA proteins in gibberellin signaling (Peng et al. 1999) and loci containing the GA20-oxidase2 critical to gibberellin synthesis and responsible for height reductions of the Green Revolution (Monna et al. 2002) did not reveal colocalizing associations. However, a GWAS association was found near dwarf1 and its GA3-β-hydroxylase2 involved in gibberellin synthesis. This supports the effect of dwarf1 on natural PHT variation in a recent mapping effort taken in a separate maize panel (Teng et al. 2013), and its significance as a maize improvement gene in a recent genome scan for selection among the improved maize lines and the landraces of maize HapMapV2 (Hufford et al. 2012). No PHT associations were near the cytochrome P450 (Hong et al. 2002) in dwarf 3, involved in gibberellin synthesis or nana plant1 impacting brassinosteroid synthesis (Hartwig et al. 2011), but a family-nested QTL and two GWAS associations were found near a cytochrome P450 affecting brassinosteroid synthesis in brassinosteroid-deficient dwarf1 (Makarevitch et al. 2012; Stinard 2012). As for auxin, no associations were in brachytic2 impacting polar auxin transport (Multani et al. 2003); but, an association was near an ortholog of Arabidopsis’ AUXIN UPREGULATED3 (Staswick et al. 2005). Like the RIL families, few PHT associations colocalized with candidate genes in the NCRPIS panel with the exception of semi-dwarf2 (Neuffer 1990, 1992), a locus with established influence on PHT (Table 4, Table S8, Figure S5). Limited colocalization may be due to lethality of the extreme effects of established loci. Previously identified loci possess a much larger effect on PHT than estimated effects mapped in the RIL families or NCRPIS panel. For this reason, they may be functionally conserved in natural diversity, as large effect alleles are rapidly purged by selection.
Prediction of maize height and related traits
Pedigree methods and estimates of identity-by-descent have long assisted breeding programs to predict genetic merit before trait evaluation (Crossa et al. 2010). In doing so, these approaches used an infinitesimal model to interpolate a population’s phenotypic landscape and help breeders optimize allocation of evaluation resources to lines with the most predicted promise. Recently, prediction models inferring identity by state from genotyped variants, like GBLUP, models selecting variant subsets, like QTL models, and model-averaging approaches, like bagging (Breiman 1996), have accrued interest in breeding. These serve a similar purpose to pedigree-based prediction (Lorenz 2013), but leverage genotyping advances to complement pedigrees and refine comparisons of progeny assumed equally related in a pedigree model by further detailing Mendelian sampling of alleles. To assess prediction accuracies of our genetic models of PHT and other traits, we evaluated GBLUP performance in the NCRPIS panel. We also compared GBLUP and bagging of family-nested QTL models across, within, and between RIL families in cross-validation routines with calibration and predicted subsets of varied size and structure. These GBLUP analyses were performed and comparisons were made to QTL models to evaluate multiple models of genetic architecture differing in their polygenicity and effect sizes and to determine if they both plausibly explain a population’s variation in maize height among populations possessing observed and/or unobserved line values.
Within the NCRPIS panel, GBLUP revealed significant prediction accuracies for PHT, EHT, and DTA (Figure 6E), but these were lower than prediction accuracies in the NAM and IBM families. The increased diversity and reduced genotyping density in the NCRPIS panel relative to the HapMap v1 SNPs projected onto NAM and IBM families is likely the largest factor contributing to the reduced accuracy. The increased structure of the NCRPIS panel also suggests each subpopulation may possess a unique evolutionary history with selection acting on alternate genetic effects in each subpopulation due to epistatic and genetic-by-environment interactions. The additive GBLUP model may fail to capture such dependencies.
Family-nested QTL models explained substantial variation in PHT in calibration subsets, but had poor prediction accuracies. This suggests the family-nested QTL overfit the calibration subset (local region of the phenotypic landscape), reducing their generalizability when predicting the global phenotypic landscape or other localities. To reduce variance in predictions of a given line value among family-nested QTL models, bagging of the models was employed. While this reduced overfitting and slightly improved prediction, bagged family-nested QTL model estimates were still largely dependent on calibration subset size and did not perform as well as GBLUP.
Within RIL families, prediction accuracies of PHT were more similar between the two prediction methods (Table S9, Figure S8), but GBLUP outperformed bagged predictions derived from nonnested QTL mapping on many occasions. Their increased similarity in accuracies may be due to the reduced population sizes and overall reduction in prediction accuracies of both methods relative to across family prediction. Also, the increased relatedness of RILs within a family may reduce differences in prediction accuracy between models. Between RIL families, prediction was not successful by bagging QTL models and resulted in no significant prediction accuracies. This result was unsurprising given the allelic series observed at loci during family-nested QTL mapping. In contrast to between-family PHT prediction by QTL, prediction of PHT by GBLUP between families was often successful, albeit with limited prediction accuracies ranging from no accuracy to explaining 38% of variation in the predicted family (Figure 7).
In conclusion, the molecular mechanics of natural PHT variation remain difficult to disentangle and it seems unlikely that the rapid systematic elucidation of a causal model for the genetic architecture of this variation is possible. Yet, our results show multiple plausible models exist to explain natural variation in maize plant height. These vary from the highly polygenic models bearing small allelic effects, such as GBLUP, to joint-linkage QTL and GWAS models bearing larger effects and lower polygenicity. While the more polygenic GBLUP model provided an improved approach to predicting PHT and did not overfit the calibration population as much as bagging family-nested QTL models, we have also shown one of the most robust loci among the family-nested QTL models could be empirically validated in NILs and genes implicated in canonical PHT hormonal pathways like brassinosteroid-deficient dwarf1, dwarf plant1, and semi-dwarf2 colocalized with robust loci in our analyses. Two decades before Francis Galton’s studies of human height based upon the resemblance between relatives (Galton 1886; Visscher et al. 2010), Mendel used PHT to establish our understanding of the laws of inheritance (Mendel 1866). Even in this postgenomic era, PHT remains exemplary in advancing our understanding of genetic causality and evaluating approaches to explain the variation of a highly heritable, yet highly complex polygenic trait.
Acknowledgments
For growth and management of the maize fields, we are grateful to members of the Maize Diversity Project and their field assistants at the University of Missouri, North Carolina State University, University of Wisconsin, and Cornell University. For the development of a library of NILs employed in NIL validation of QTL mapping results we are grateful to Syngenta. For assistance with the construction of relationship matrices we extend thanks to Jean-Luc Jannink. We are also thankful for the substantial genotyping efforts of Cornell University and Cold Spring Harbor in preparation of SNPs across the NAM and IBM families as well as the NCRPIS diversity panel.
Footnotes
Communicating editor: A. Charcosset
- Received November 1, 2013.
- Accepted February 4, 2014.
- Copyright © 2014 by the Genetics Society of America
Available freely online through the author-supported open access option.