Haixiao Hu, Tobias A Schrag, Regina Peis, Sandra Unterseer, Wolfgang Schipprack, Shaojiang Chen, Jinsheng Lai, Jianbing Yan, Boddupalli M Prasanna, Sudha K Nair, Vijay Chaikam, Valeriu Rotarenco, Olga A Shatskaya, Alexandra Zavalishina, Stefan Scholten, Chris-Carolin Schön, Albrecht E Melchinger, The Genetic Basis of Haploid Induction in Maize Identified with a Novel Genome-Wide Association Method, Genetics, Volume 202, Issue 4, 1 April 2016, Pages 1267–1276, https://doi.org/10.1534/genetics.115.184234
Abstract
In vivo haploid induction (HI) triggered by pollination with special intraspecific genotypes, called inducers, is unique to Zea mays L. within the plant kingdom and has revolutionized maize breeding during the last decade. However, the molecular mechanisms underlying HI in maize are still unclear. To investigate the genetic basis of HI, we developed a new approach for genome-wide association studies (GWAS), termed the conditional haplotype extension (CHE) test, which allows detection of selective sweeps even under almost perfect confounding of population structure and trait expression. Here, we applied this test to identify genomic regions required for HI expression and dissected the combined support interval (50.34 Mb) of the QTL qhir1, detected in a previous study, into two closely linked genomic segments relevant for HI expression. The first, termed qhir11 (0.54 Mb), comprises an already fine-mapped region but was not diagnostic for differentiating inducers and noninducers. The second segment, termed qhir12 (3.97 Mb), had a haplotype allele common to all 53 inducer lines but not found in any of the 1482 noninducers. By comparing resequencing data of one inducer with 14 noninducers, we detected in the qhir12 region three candidate genes involved in DNA or amino acid binding but none in qhir11. We propose that the CHE test can be utilized in introgression breeding and different fields of genetics to detect selective sweeps in heterogeneous genetic backgrounds.
The double haploid (DH) technology based on in vivo haploid induction (HI) has become one of the most important tools in maize breeding during the past decade and is replacing the conventional method of line development by recurrent selfing (Melchinger et al. 2013). The success of this new technology became possible because dozens of maize inducer lines have been developed worldwide (reviewed in Supplemental Material, File S1) which, when used as pollinators, trigger the production of seeds with haploid embryos at an acceptable rate, i.e., >2% (Coe 1959). Double fertilization followed by elimination of the inducer chromosomes in the embryo at later developmental stages (Li et al. 2009; Xu et al. 2013) as well as parthenogenesis (Sarkar and Coe 1966; Beckert et al. 2008) have been proposed as mechanisms for HI in maize, but a proof of these hypotheses requires profound knowledge about the genetic and physiological factors underlying this phenomenon.
All previous QTL mapping studies for unraveling the genetic architecture of HI detected a major QTL on chromosome 1 (Röber 1999; Beckert et al. 2008; Prigge et al. 2012). The most comprehensive study with four biparental populations (Prigge et al. 2012) mapped this QTL, termed qhir1, to bin 1.04 and hypothesized that it is required for HI, but QTL positions and 1-LOD support intervals differed substantially among populations. In another study with population 1680 × UH400, Dong et al. (2013) fine mapped a 3.57-Mb region between markers umc1917 and bnlg1811, which targeted the QTL qhir1 and identified a 243-kb region with significant effect on HI. Both studies employed inbred UH400 as inducer parent, which limits the inference on HI to this specific inducer line. Moreover, in view of the uncertainties associated with the exact QTL position, concentrating the fine mapping on a very narrow region carries the risk of overlooking important adjacent segments. Therefore, our objectives were to (i) detect selective sweeps for HI in a worldwide collection of inducers (cases) and noninducers (controls) by a genome-wide association study (GWAS); (ii) identify a candidate region(s) underlying qhir1; (iii) validate the fine-mapping results reported by Dong et al. (2013) in a broader set of genetic material with an independent, complementary approach; and (iv) resequence the qhir1 region for identification of candidate genes involved in HI in maize. For application of GWAS, we developed a novel method that can deal with almost perfect confounding between genetic ancestry and trait expression.
Results and Discussion
We genotyped a worldwide collection of 53 maize inducer lines from 29 breeding programs (Table S1) for 56,110 SNPs on the Illumina MaizeSNP50 Bead Chip (50k SNP chip; Ganal et al. 2011). From various public and private databases, we gathered marker data obtained with this SNP chip for 1482 inbred lines (File S2) chosen to represent the global genetic diversity of maize from seven germplasm groups. To the best of our knowledge, these lines possess zero or very low HI rate, and therefore, are subsequently referred to as noninducers. To balance the number of lines within each noninducer group with the number of inducers, we created a core set of 363 lines using established methods (Liu and Muse 2005). The core set consisted of the 53 inducers and 310 noninducers (50 lines from each noninducer germplasm group with two groups having fewer than 50 lines, Table S2).
Principal component analysis (PCA) of the core set clearly separated the group of inducers from all seven germplasm groups of noninducers, and cluster analysis revealed close relatedness among subsets of the 53 inducers (Figure 1). The clear separation between inducers and noninducers was corroborated by plots of the first two principal components from separate PCAs of inducers against all lines from each germplasm group of the 1482 noninducers (Figure S1).
To identify genomic segments associated with HI, we performed a GWAS with various established methods for case-control association analysis (Purcell et al. 2007; Wellcome Trust Case Control Consortium 2007) and detection of selective sweeps (Voight et al. 2006; Tang et al. 2007; Chen et al. 2010; Fariello et al. 2013). The standard case-control association analysis (Purcell et al. 2007) detected no striking signals and showed a high genomic inflation factor (λ = 33.3, Figure S3, A and B). Likewise, several popular methods for identifying selective sweeps in humans and animals (Vitti et al. 2013) failed to detect clear signals (Figure S3, C–F). Neither the within-population test applied to the 53 inducers using the iHS score (Voight et al. 2006) nor the between-population test treating the 53 inducers and 310 noninducers as two populations and employing the Rsb score (Tang et al. 2007) yielded significant signals. In addition, we applied two population differentiation-based tests that implemented different algorithms. Using the hapFLK score (Fariello et al. 2013) based on the differences of haplotype frequencies between populations, we detected a few significant signals on chromosome 9. Likewise, the cross population composite likelihood ratio (XP-CLR) score (Chen et al. 2010) yielded high XP-CLR values on chromosomes 1 and 6. However, further analyses of haplotypes in these regions detected with either method revealed that the major haplotypes found in the inducer group were present only in a subset of them (Figure S4), indicating that these regions are not required for HI.
Although the various methods for GWAS differ in their rationale, their common assumption is that the individuals under investigation are largely unrelated to each other (Voight et al. 2006; Purcell et al. 2007; Tang et al. 2007; Chen et al. 2010; Fariello et al. 2013). However, in this study, we encountered a different data structure, in which the cases (inducers) are closely related with each other because they share a common ancestor (Stock6 or a later version of it maintained by the Maize Stock Center; Lawrence et al. 2005) not more than six breeding cycles distant, whereas the controls (noninducers) can be considered largely unrelated among themselves and with the cases (inducers). Thus, this resulted in almost perfect confounding of population structure with cases and controls (Figure 1; Figure S1), which represents an unsolved problem for all GWAS approaches mentioned above.
To solve this problem, we developed a novel approach, termed conditional haplotype extension (CHE) test, in which the cases are first scanned for detection of long haplotypes fixed in this set of genotypes. The rationale behind this step is that linkage drag results in long segments of DNA being transferred during trait introgression (Sabeti et al. 2002). In the second step, a formal statistical test based on the Clopper–Pearson confidence interval (Clopper and Pearson 1934) is applied for testing the hypothesis that transmission of the detected haplotypes through known pedigrees of the cases cannot be explained by chance alone (described in detail in File S3).
In the first step, the top 10 segments fixed in all 53 inducers (cases) exceeded 1 Mb in length (Figure 2, Table 1). In the second step, among 19 inducers (described in File S3) derived from matings between inducers and noninducers, only the longest segment on chromosome 1 and another shorter segment on chromosome 6 were significant at P < 0.01 (Figure 2, Table 1). The segment identified on chromosome 1 spanned 3.97 Mb on the physical map, overlapped with all support intervals of qhir1 from four QTL mapping populations (Figure 3, A–C; Prigge et al. 2012), and was denoted qhir12. Adjacent to this region was a shorter 0.54-Mb segment denoted qhir11 (Figure 2B), which harbored the 243-kb region fine mapped by Dong et al. (2013) and was fixed in all inducers and significant in the Clopper–Pearson test (Table 1); for these reasons, this segment was also considered in our subsequent analyses.
Table 1 Characterization of 11 genomic segments on the basis of SNP data from the 50k SNP chip
Chr. | Start marker | Start position (bp) | End marker | End position (bp) | CHE score (bp) | Number of SNPs | Frequency in NI (%) | CHE test | Segment name |
---|---|---|---|---|---|---|---|---|---|
1 | SYN4966 | 71,795,509 | PZA00714.1 | 75,768,235 | 3,972,726 | 90 | 0.0 | ** | qhir12 |
1 | PZE-101114336 | 130,455,842 | PZE-101114759 | 132,531,443 | 2,075,601 | 5 | 63.0 | NS | |
1 | PZE-101114797 | 132,849,879 | PZE-101115057 | 134,234,309 | 1,384,430 | 3 | 74.8 | NS | |
1 | PZE-101115217 | 135,276,739 | PZE-101115612 | 137,179,441 | 1,902,702 | 7 | 47.9 | NS | |
1 | PZE-101115912 | 138,641,589 | PZE-101116234 | 139,704,986 | 1,063,397 | 11 | 63.6 | NS | |
4 | PZE-104010475 | 7,618,125 | PZE-104010863 | 9,353,851 | 1,735,726 | 6 | 88.4 | NS | |
4 | PZE-104057913 | 110,071,345 | PZE-104058294 | 111,341,426 | 1,270,081 | 19 | 75.5 | NS | |
5 | PZE-105051178 | 44,623,312 | PZE-105051594 | 45,909,320 | 1,286,008 | 27 | 53.9 | NS | |
5 | PZE-105087655 | 114,100,330 | PZE-105087886 | 115,151,762 | 1,051,432 | 16 | 69.2 | NS | |
6 | SYNGENTA12397 | 28,127,747 | PZE-106010794 | 29,352,618 | 1,224,871 | 18 | 10.0 | * | |
1 | PZE-101081177 | 68,134,633 | SYN25793 | 68,670,617 | 535,984 | 16 | 2.7 | * | qhir11 |
The 10 genomic segments with the highest CHE scores were obtained from a genome-wide scan of 53 inducers with the CHE test. One additional segment (qhir11) harbors the 243 kb segment fine-mapped by Dong et al. (2013). NI, noninducers; **P < 0.001, *P < 0.01; NS, not significant.
The qhir12 segment was not detected by Dong et al. (2013), as it lies 985 kb outside (downstream of) the marker interval originally chosen for fine mapping, but their results from cross 1680 × UH400 provide strong evidence in support of a second region linked to their 243-kb fine-mapped segment, because the effect of the entire qhir1 region found in the F2 generation (see figure 2 in Dong et al. 2013) was more than twice the effect of the 243-kb segment segregating in F3 progeny of recombinant F2 individuals (see figure 3 in Dong et al. 2013). Thus, the 243-kb segment, which makes up about half of the qhir11 segment detected in our study, explained less than one quarter of the genetic variance of HI attributable to QTL qhir1.
On chromosome 6, a 1.22-Mb segment was fixed among all inducers and significant in the Clopper–Pearson test (Table 1). Consequently, this segment may also have an effect on HI, but the evidence was not as strong as for qhir12, because 10% of the controls also harbored this segment (Table 1). For this reason and due to the prominent role of QTL qhir1 in previous studies, we decided to focus subsequently on genomic segments detected on chromosome 1.
To determine whether both or only one of these regions harbor the gene(s) required for HI, we traced the transmission of both segments in the pedigree of all 53 inducers and reconstructed the respective recombination events (Lai et al. 2010) in a 50.34-Mb genomic region denoted as qhir1-combined support interval (CSI), which covered the 1-LOD support intervals of qhir1 from four QTL mapping populations (Prigge et al. 2012) and contained the qhir11 and qhir12 segments (Figure 3A). Based on the 1123 SNP markers of the 50k SNP chip found in this region, both qhir11 and qhir12 were regarded as identical by descent among the 53 maize inducers and derived from one of the various versions of Stock6 (Figure 3C). To corroborate this result with even higher marker density, we genotyped a representative subset of 17 inducers (indicated in Table S1) with a 600k SNP chip described by Unterseer et al. (2014), which included 15,602 SNP markers in the qhir1-CSI. While the segment qhir12 had a single haplotype across all inducers, two haplotypes were observed for qhir11 (Figure 3D, Figure S5). This indicates that the minor haplotype allele of qhir11 together with its neighbor segments present in Stock6.M741H and Stock6.ROM either did not originate from the original version of Stock6 (i.e., Stock6.M741F; Lawrence et al. 2005) or was altered due to genomic rearrangements caused by active (retro-)transposons. This haplotype allele, which has still high congruency with the major haplotype allele of qhir11 within the 243-kb fine-mapped fragment, was also found in two noninducers, Mo1W and Tx303 lacking the qhir12 haplotype allele common to all 53 inducers. Since HI rates of these two lines were in the range of spontaneously occurring haploids in maize (Table S3; Chase 1969), we conclude that the minor haplotype allele of qhir11 is not sufficient for HI in maize. 
However, this does not allow conclusions to be drawn on the effect of the major haplotype allele of qhir11 and its 243-kb segment identified by Dong et al. (2013) via use of inducer UH400. Thus, we propose to further investigate the effect of qhir11 and qhir12 on HI for example by comparing the HI of near-isogenic lines differing in one or both of these segments or by analyzing selfed progenies of recombinants that segregate for one segment while the other segment is fixed either for presence or absence of the HI-effective haplotype allele.
Since qhir11 and qhir12 were identified with a selective sweep approach, selection for characters other than HI could also explain our findings. During development of the 19 progeny inducers that were subjected to the Clopper–Pearson test, selection was primarily for high HI rate and good expression of the R1-nj embryo-color marker and of the B1 stalk-color marker. The R1-nj marker has been mapped to chromosome 10 and the B1 marker to chromosome 2. Thus, selection for these markers cannot explain fixation of qhir11 and qhir12 on chromosome 1. In addition, not all 53 inducers analyzed for selective sweeps carry these color markers. For example, inducers ACIR, Stock6.M741B (R1-r), Stock6.M741C (R1-r), and Stock6.M741F (R1-g) do not carry the R1-nj marker and inducer IN605a does not carry the B1 marker, but these inducers still harbor both the qhir11 and qhir12 segments. Altogether, these arguments provide strong evidence that fixation of qhir11 and qhir12 among the inducers was exclusively attributable to selection for HI.
To locate candidate genes for HI, we searched for mutated coding sequences in these two segments by comparing resequencing data of inducer CAU5 (depth of 11.22× coverage) with sequences of 14 noninducers important in global maize breeding (Table S4). CAU5 was chosen due to its close relationship with many other inducers, because both its parents (CAUHOI and UH400) have HI ability and served as parents or grandparents in development of new inducers. In the genic regions of qhir11 and qhir12, we found 49 amino acid changes (AACs), 20 insertions or deletions (InDels), and 3 structural variants comparing the inducer to the noninducer sequences (Table S5), which involved 44 of all genes in these two regions. For 14 of these genes (Table S6), annotations were available either from Interpro (Mitchell et al. 2014) or UniProt (UniProt Consortium 2014). Three of these genes in the qhir12 region, GRMZM2G137502 and GRMZM2G135834, each encoding a DNA binding protein, and GRMZM2G096682, encoding an amino acid binding protein, constitute intuitive candidates for triggering HI in maize. In agreement with both hypotheses for in vivo HI in maize (Sarkar and Coe 1966; Beckert et al. 2008; Li et al. 2009; Xu et al. 2013) and characters associated with HI (Prigge et al. 2012; Qiu et al. 2014), their mutant versions might be involved in chromosomal segregation distortion. Besides the structural candidates identified in the coding sequences of these genes, we cannot exclude that the causal mutation is located in a regulatory region as has been shown for other genes (e.g., Hanson et al. 1996; Clark et al. 2006; Salvi et al. 2007). In any case, reverse genetic studies such as RNA interference (Zuo et al. 2015) or targeted mutagenesis (Char et al. 2015; Svitashev et al. 2015) are needed to verify candidate genes. For qhir11, no intuitive candidates were found (Table S6).
Modern inducer lines have considerably higher HI rates than the Stock6 founders (Table S1) due to the effect of multiple QTL as indicated by QTL mapping results with various inducers such as Stock6 (Röber 1999), PK6 (Barret et al. 2008), and UH400 (Prigge et al. 2012). Different from these studies, we aimed at detecting the subset of QTL that is common to all inducers in maize, especially those QTL necessarily required for HI and not just for modifying its rate. By searching with our CHE approach for genomic regions fixed in a worldwide collection of inducers, we obtained evidence in support of the hypothesis of Prigge et al. (2012) that QTL qhir1 is required for HI.
The CHE test developed in this study closes a gap in GWAS when population structure is strongly confounded with the occurrence of cases and controls. This situation is often encountered in crop species if major genes for resistance and other agronomically important traits are transferred from a wild ancestor to elite germplasm by introgression breeding (see examples in Table S7). However, this problem also exists in genetic studies with humans and animals (Laird et al. 2005) if a novel allele is rapidly spread by matings of the original carrier to other individuals from various populations. Thus, the CHE test promises to expand the collection of GWAS methods to applications where ancestry and trait expression are highly confounded.
Materials and Methods
Germplasm
In this study, a genotype is referred to as an inducer (case) if it has an HI rate of at least 2% (Coe 1959). We collected a total of 53 maize inducers originating from 29 different breeding populations in China, France, Germany, India, Mexico, Moldova, Romania, Russia, and the United States (Table S1). All inducers were highly inbred and developed from different types of source populations by recurrent selfing for at least five generations accompanied by evaluation and selection for a high HI rate. Subsequently, these inducers were maintained by selfing or sib mating to warrant a high level of uniformity and homozygosity. Information about their pedigree and HI rate was obtained either from the literature or by personal communication with breeders from the institutions providing the materials. The pedigree of all inducers (Table S1) together with their noninducer parents (if known) was plotted (Figure S2) using the package pedigraph v2.4 (Garbe and Yang 2008). In addition, we included molecular data from 1482 inbred lines (File S2) selected for good marker quality from a total of 1963 inbred lines available from public breeding programs or databases. These lines are subsequently referred to as noninducers and are assumed to possess zero or very low HI rate. If some of these controls have been misclassified and possess HI ability, in contrast to our assumption, this has no effect on the first step of our CHE approach and would merely reduce the power of the test in the second step but would not result in false positives. However, presence of HI in germplasm not selected for this trait is very unlikely for the following reasons: (i) In vivo HI in maize is associated with endosperm abortion, embryo abortion, and segregation distortion (Prigge et al. 2012; Xu et al. 2013). Maintenance breeding of inducers requires continuous selection for HI to counteract the strong negative effects on fitness of this character (Melchinger et al. 2016). Since all control inbreds have been bred for good agronomic performance, and were not selected for HI, it is extremely unlikely that HI is present. (ii) Among the seven noninducers tested for HI (Table S3), none showed HI rates significantly different from zero.
Based on breeders’ knowledge or pedigree information, these lines were assigned to seven germplasm groups: European Dent (EUD, N = 399), European Flint (EUF, N = 408), Stiff Stalk (SS, N = 123), Non-Stiff Stalk (NSS, N = 193), Tropical and Subtropical (TST, N = 299), Domestic China (DCN, N = 33), and Miscellaneous (MIS, N = 27) lines comprising Teosinte (N = 10), Popcorn (N = 9), and Sweet Corn (N = 8) genotypes.
Genotyping
After DNA extraction, the 53 inducers were genotyped with the Illumina MaizeSNP50 BeadChip (Ganal et al. 2011), referred to as 50k SNP chip. Genotypic data collected with the same SNP chip for the 1482 noninducers were obtained for 834 lines from our own database, for 335 lines from Yang et al. (2011) and for the remaining 313 lines from Cook et al. (2012) and Ganal et al. (2011). Quality control of the SNP data encompassed two steps for screening of markers and genotypes. Markers were selected if (i) their call frequency exceeded 0.80 across all inducers and 0.90 across all noninducers and (ii) heterozygosity was <10% across all inducers and <5% across all noninducers. Noninducer genotypes were included if (i) their call rate exceeded 95% and (ii) their heterozygosity across all markers was <5%. A total of 40,572 SNPs and 1482 noninducers met these criteria and were used for further analyses together with the 53 inducers. The 1.05% missing marker data in all 1535 lines were subsequently imputed with software Beagle 3.3.2 (Browning and Browning 2007).
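The marker and genotype screening thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the genotype coding (0 and 2 for the two homozygotes, 1 for a heterozygote, `None` for a missing call), the function names, and the choice to compute heterozygosity among non-missing calls only are all assumptions made for the example.

```python
def call_rate(genotypes):
    """Fraction of non-missing calls; None marks a missing call (assumed coding)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def heterozygosity(genotypes):
    """Fraction of heterozygous calls (coded 1) among non-missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    return sum(g == 1 for g in called) / len(called)


def marker_passes(inducer_calls, noninducer_calls):
    """Marker-level screen: call frequency > 0.80 (inducers) and > 0.90
    (noninducers); heterozygosity < 10% (inducers) and < 5% (noninducers)."""
    return (
        call_rate(inducer_calls) > 0.80
        and call_rate(noninducer_calls) > 0.90
        and heterozygosity(inducer_calls) < 0.10
        and heterozygosity(noninducer_calls) < 0.05
    )


def noninducer_line_passes(calls_across_markers):
    """Genotype-level screen: call rate > 95% and heterozygosity < 5%."""
    return (
        call_rate(calls_across_markers) > 0.95
        and heterozygosity(calls_across_markers) < 0.05
    )
```

Each function takes a plain list of calls, so the marker-level screen is applied column by column and the genotype-level screen row by row of a line-by-marker matrix.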
In addition to genotyping with the 50k SNP chip, 17 inducers (indicated in Table S1) were chosen for genotyping with the Affymetrix Axiom Maize Genotyping Array (Unterseer et al. 2014), referred to as 600k SNP chip. These 17 inducers were chosen to represent most of the genetic diversity among all 53 inducers according to pedigree information. Additionally, two noninducer inbred lines, Mo1W and Tx303, were also genotyped with this 600k SNP chip.
Genetic structure analyses
Genetic structure analyses of inducers and noninducers were based on a subset of 29,553 markers obtained after excluding 11,019 Syngenta markers from the entire set of 40,572 SNPs. This was taken as a precautionary measure to minimize a possible ascertainment bias, because the Syngenta markers were specifically selected for polymorphism between B73 and Mo17 (Ganal et al. 2011). First, we determined with software PowerMarker v3.25 (Liu and Muse 2005) a subset of 50 lines capturing maximum diversity for each of the five germplasm groups (EUD, EUF, SS, NSS, and TST) with N > 50. Together with the 53 inducers, and the 33 DCN and 27 MIS lines, this yielded a core set of 363 lines (Table S2). Second, a PCA was performed with this core set as well as with inducers against all lines from each germplasm group of the 1482 noninducers. A three-dimensional plot for PCA of the core set and two-dimensional plots for the other PCAs were obtained by using R package rgl (Adler et al. 2014) and standard R software (R Development Core Team 2013), respectively. Third, we produced a neighbor-joining tree of the 53 inducers based on cluster analysis of Rogers’ distance (Rogers 1972) estimates using R package ape (Paradis et al. 2004).
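The counts quoted for the marker subset and the core set can be checked with a short tally. The group sizes are those given in the Germplasm section; the variable names are illustrative only, and the sketch reproduces the bookkeeping, not the diversity-based selection carried out with PowerMarker.

```python
# Noninducer germplasm group sizes as reported in the Germplasm section.
group_sizes = {
    "EUD": 399, "EUF": 408, "SS": 123, "NSS": 193,
    "TST": 299, "DCN": 33, "MIS": 27,
}
N_INDUCERS = 53
LINES_PER_GROUP = 50  # cap applied to groups with N > 50

# Groups with more than 50 lines contribute 50; smaller groups enter whole.
core_noninducers = {g: min(n, LINES_PER_GROUP) for g, n in group_sizes.items()}

total_noninducers = sum(group_sizes.values())          # 1482
n_core_noninducers = sum(core_noninducers.values())    # 310
core_set_size = N_INDUCERS + n_core_noninducers        # 363

# Marker subset used for structure analyses after dropping Syngenta markers.
structure_markers = 40_572 - 11_019                    # 29,553
```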
Application of established GWAS methods for detecting individual SNPs or selective sweeps associated with HI
We analyzed our data with the following methods for detecting individual SNPs or selective sweeps associated with target traits. First, a genome-wide case-control association analysis (Wellcome Trust Case Control Consortium 2007), in which inducers were considered as cases and noninducers as controls, was performed using software package Plink1.07 (Purcell et al. 2007). Second, we computed iHS and Rsb scores following Voight et al. (2006) and Tang et al. (2007), respectively, using R package rehh (Gautier and Vitalis 2012) to detect selective sweeps with long-range haplotypes (Sabeti et al. 2002) associated with HI. Third, we applied a population differentiation-based approach to detect selective sweeps associated with HI with the hapFLK score following Fariello et al. (2013) using their software package hapFLK. Finally, a composite likelihood method, the XP-CLR score (Chen et al. 2010), for detecting selective sweeps was applied using the XP-CLR package.
A novel method for identifying selective sweeps under population structure–trait confounding
Since all methods described in the previous section failed in the analysis of our data, we developed a novel two-step approach for detecting selective sweeps underlying HI.
In the first step, a conditional haplotype extension procedure was applied to the group of cases (i.e., inducers) for detecting all segments with both high frequency and long stretch. In a genome-wide scan, where markers are ordered according to their physical positions on the chromosome, each marker is analyzed one by one with the following procedure (see an illustration in Figure S6). Starting with marker m, we considered the genome segment spanning from marker m − l on the left side to marker m + r on the right side as a haplotype block. The values of l and r start at zero and are subsequently increased stepwise to the next integer, but independently in both directions. For each step of haplotype block extension, the frequency of the major haplotype within the block is determined in the cases. The maximum values of l and r for which the frequency of the major haplotype from m − l to m + r does not fall below a given threshold t are designated as l* and r*, respectively. The physical distance (in megabases) from marker m − l* to marker m + r* is referred to as CHE score, as an abbreviation for conditional haplotype extension in physical map units, and used as criterion for screening the entire genome. Various threshold t values can be chosen depending on the population under study. In our study, the objective was to detect the genomic segments required for HI among all maize inducers; therefore, we chose the very stringent threshold t = 1.0, which results in detection of long genomic segments fixed among all 53 inducers.
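The extension procedure for a single starting marker can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: haplotypes are given as equal-length sequences of allele codes (one per inbred line), positions are in base pairs, and the left and right extensions are attempted alternately, one marker at a time. With t = 1.0 the order of extension does not matter, because every marker in the block must be fixed; for t < 1 the alternating order is an assumption of this sketch, since the text only states that the two directions are extended independently.

```python
from collections import Counter


def major_haplotype_freq(haplotypes, start, end):
    """Frequency of the most common haplotype over markers start..end (inclusive)."""
    counts = Counter(tuple(h[start:end + 1]) for h in haplotypes)
    return counts.most_common(1)[0][1] / len(haplotypes)


def che_score(haplotypes, positions, m, t=1.0):
    """Extend a block around marker index m while the major haplotype
    frequency among the cases stays at or above threshold t.

    Returns (score, l_star, r_star), where score is the physical distance
    between markers m - l_star and m + r_star, or None if marker m alone
    already falls below the threshold.
    """
    n_markers = len(positions)
    if major_haplotype_freq(haplotypes, m, m) < t:
        return None
    l = r = 0
    extended = True
    while extended:
        extended = False
        # Try one more marker on the left.
        if m - l - 1 >= 0 and major_haplotype_freq(haplotypes, m - l - 1, m + r) >= t:
            l += 1
            extended = True
        # Try one more marker on the right.
        if m + r + 1 < n_markers and major_haplotype_freq(haplotypes, m - l, m + r + 1) >= t:
            r += 1
            extended = True
    return positions[m + r] - positions[m - l], l, r


# Hypothetical toy data: four case lines, six markers; markers 1-3 are fixed.
toy_haplotypes = [
    [0, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 0, 1],
]
toy_positions = [100, 200, 350, 500, 900, 1000]
toy_result = che_score(toy_haplotypes, toy_positions, m=2, t=1.0)
```

A genome-wide scan would call `che_score` for every marker index on each chromosome and rank the resulting segments by score.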
In the second step, a formal statistical test was carried out for the top n (n = 10 in our study) segments with the highest CHE scores detected in the first step (details are described in File S3), as well as for the qhir11 segment, which was not among the 10 segments with the highest CHE score but for which prior knowledge existed from Dong et al. (2013). Briefly, we calculated for each genomic segment separately a Clopper–Pearson confidence interval (Clopper and Pearson 1934) for testing the hypothesis that transmission of the detected segment through known pedigrees of the cases cannot be explained by chance alone in the development of new inducers.
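For readers unfamiliar with the Clopper–Pearson interval, the following Python sketch computes the exact two-sided interval for a binomial proportion by inverting the binomial tail probabilities with bisection. It only illustrates the interval itself; how successes and trials are defined from the pedigrees is specified in File S3 and is not reproduced here, so the function names and the counts in the example call are purely hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _solve(func, target, increasing):
    """Find p in [0, 1] with func(p) = target for a monotone func, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for k successes in n trials."""
    if k == 0:
        lower = 0.0
    else:
        # Lower bound: P(X >= k | p) = alpha / 2, increasing in p.
        lower = _solve(lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, True)
    if k == n:
        upper = 1.0
    else:
        # Upper bound: P(X <= k | p) = alpha / 2, decreasing in p.
        upper = _solve(lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


# Hypothetical counts for illustration only: a segment transmitted in all 10 of 10 events.
lower, upper = clopper_pearson(10, 10)
```

If the lower bound of such an interval exceeds the transmission probability expected by chance, for example 0.5 for a single Mendelian segregation event, chance transmission would be rejected at the chosen level; whether this is the exact comparison made in File S3 is an assumption of this sketch.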
Graphical genotype analysis
Based on the 1-LOD support intervals of QTL qhir1 from four segregating populations (Prigge et al. 2012), we first determined a combined support interval for qhir1 (qhir1-CSI) with the following steps: (i) search for the eight nearest markers outside the 1-LOD support intervals of qhir1 from the four segregating populations, and (ii) determine the farthest left and farthest right markers among the eight markers. This revealed a genomic region spanning from position 46.21 to 96.55 Mb on chromosome 1 (Figure 3A) according to the maize B73 AGP_v2 (Schnable et al. 2009).
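Steps (i) and (ii) amount to a simple interval computation, sketched below in Python. The function name, the marker positions, and the four support intervals are toy values chosen for illustration and are not the data of Prigge et al. (2012).

```python
from bisect import bisect_left, bisect_right


def combined_support_interval(marker_positions, support_intervals):
    """Combine several support intervals into one interval bounded by markers.

    marker_positions  : physical positions (Mb) of the genotyped markers
    support_intervals : list of (left, right) 1-LOD support intervals in Mb
    """
    markers = sorted(marker_positions)
    flanking = []
    for left, right in support_intervals:
        i = bisect_left(markers, left) - 1   # nearest marker strictly left of the interval
        j = bisect_right(markers, right)     # nearest marker strictly right of the interval
        if i < 0 or j >= len(markers):
            raise ValueError("no marker outside the support interval on one side")
        flanking.extend([markers[i], markers[j]])  # step (i): two markers per interval
    return min(flanking), max(flanking)            # step (ii): outermost of all markers


# Toy example with four populations (hypothetical positions in Mb).
toy_markers = [10, 20, 30, 40, 50, 60, 70, 80, 90]
toy_intervals = [(32, 48), (35, 55), (41, 62), (25, 58)]
csi = combined_support_interval(toy_markers, toy_intervals)
```

With four populations the loop collects the eight flanking markers referred to in the text, and the combined interval is delimited by the leftmost and rightmost of them.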
Subsequently, we inferred the segment transmission from founders to progeny inducers on the basis of the pedigree provided by maize breeders (Figure S2) using the 50k SNP chip marker data. Briefly, the segment of Stock6.M741F was considered as the source genome fragment for the entire qhir1-CSI region, because it represents the original Stock6 (Lawrence et al. 2005). For the 52 remaining inducers, we determined the origin of their genomic fragments in the qhir1-CSI in two steps. First, we compared the marker profile of a specific inducer with that of all possible founders involved in its pedigree (Figure S2) to identify the map positions of former recombination sites. Thus, its genome in the qhir1-CSI was divided into several fragments on the basis of putative recombination sites. Second, for each fragment flanked by a pair of adjacent recombination sites, we determined the oldest founder among all founders having a marker profile identical to that of the fragment.
To examine the reliability of graphical genotypes constructed with the 50k SNP chip, we also constructed graphical genotypes in the qhir1-CSI region for the 17 selected maize inducers (indicated in Table S1 and described in the section Genotyping) genotyped with the 600k SNP chip using the same procedure as described above.
Evaluation of HI rate of two noninducers
As shown in the text (Figure 3D), based on the 186 SNP markers from the 600k SNP chip in the qhir11 region, the minor haplotype allele present in two inducers was also found in two noninducers, Mo1W and Tx303. To test whether this haplotype allele alone confers HI in maize, Mo1W and Tx303, together with five inducers and five randomly chosen noninducers as controls (Table S3), were crossed to a liguleless (lg2) tester for evaluating their HI rate. After harvest, we randomly chose ∼1000 kernels from each of the testcrosses and seeded them in the greenhouse to identify haploid plants in growth stage V3 (Abendroth et al. 2011) on the basis of the liguleless phenotype, followed by flow cytometry analysis to confirm haploidy of the plants classified as liguleless.
Resequencing data analysis
Inducer line CAU5 and noninducer line 1680 from China Agricultural University as well as noninducer lines Lo11, D06, F98902, B73, EP1, PH207, and Teosinte from the University of Hohenheim were sequenced by the Illumina HiSeq 2000 platform (NCBI BioProject PRJNA260788; Unterseer et al. 2014). Genome resequencing data of noninducer lines Mo17, CML103, Dan340, Huangzaosi, Chang7-2, and Zheng58 were obtained from Chia et al. (2012; NCBI Sequence Read Archive SRA051245) and Jiao et al. (2012; NCBI Sequence Read Archive SRA049859).
The complete resequencing analysis for the qhir1-CSI region was performed with the software CLC Genomics Workbench 7.5.1 (CLC Bio, http://www.clcbio.com). Unless stated otherwise, default parameter settings were used. After import of the raw genome sequencing data, the reads were trimmed: minimum number of nucleotides of a read = 15. Trimmed reads were mapped to the B73 genome (RefGen_v2; Schnable et al. 2009). The parameters for read mapping (one mapping per line) were length fraction of alignment = 0.8, auto-detect paired distances = no, and nonspecific match handling = ignore. A detailed mapping report was created for the qhir1-CSI region (Table S4). InDels and structural variants were detected for each mapping. We applied the Fixed Ploidy Variant Detection model of CLC Genomics Workbench to each mapping file to detect sequence variations. Splice site effects and amino acid changes were analyzed using the genome annotation of the B73 genome RefGen_v2 (Zea_mays.AGPv2.15.gtf.gz at ftp://ftp.ensemblgenomes.org/pub/plants/release-15/gtf/zea_mays/). A genotype was called if it was supported by at least 10 reads with at least 90% of the reads being consistent with the major allele (threshold for homozygous calls) and with <10% of the reads indicating gaps or missing calls. Genotype calls from each mapping file were combined and only biallelic SNPs with at least one inducer and at least one noninducer call were considered for further analyses. All analyses were performed within R (R Development Core Team 2013).
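The genotype-calling thresholds and the subsequent site filter can be expressed compactly, as in the following Python sketch. It is an illustration under stated assumptions and not the CLC Genomics Workbench or R code used in the study: the function names are hypothetical, gaps and missing calls are assumed to be encoded as "-" and "N", all fractions are taken relative to the total number of reads covering the site, and "biallelic" is evaluated among the called lines only.

```python
from collections import Counter


def call_genotype(read_bases, min_reads=10, min_major_frac=0.9,
                  max_gap_frac=0.1, gap_symbols=("-", "N")):
    """Return the homozygous allele call for one line at one site, or None.

    read_bases : base observed in each read covering the site
                 (gap/missing symbols are an assumed encoding)
    """
    total = len(read_bases)
    if total < min_reads:                      # fewer than 10 supporting reads
        return None
    gaps = sum(base in gap_symbols for base in read_bases)
    if gaps / total >= max_gap_frac:           # 10% or more gaps or missing calls
        return None
    counts = Counter(base for base in read_bases if base not in gap_symbols)
    allele, n_major = counts.most_common(1)[0]
    if n_major / total < min_major_frac:       # less than 90% support for the major allele
        return None
    return allele


def keep_site(calls, inducers):
    """Keep a site if it is biallelic with >= 1 inducer and >= 1 noninducer call.

    calls    : dict mapping line name -> allele call (None if not called)
    inducers : set of line names treated as inducers
    """
    called = {line: allele for line, allele in calls.items() if allele is not None}
    has_inducer = any(line in inducers for line in called)
    has_noninducer = any(line not in inducers for line in called)
    return len(set(called.values())) == 2 and has_inducer and has_noninducer


# Hypothetical read pile-ups at one site for three lines.
site_calls = {
    "CAU5": call_genotype(list("A" * 12)),
    "B73": call_genotype(list("G" * 19 + "-")),
    "Mo17": call_genotype(list("G" * 6)),      # too few reads, not called
}
retained = keep_site(site_calls, inducers={"CAU5"})
```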
Data availability
File S4 and File S5 contain information about the SNP markers and genotypes analyzed in this study with the 50k SNP chip and the 600k SNP chip, respectively. Resequencing data of inducer line CAU5 and noninducer line 1680 have been submitted to NCBI (accession: SRP065659). File S6 contains literature cited in the supplemental files.
Acknowledgments
The authors thank J. Eder, F. Qiu, M. Sachs, and M. Beckert for providing materials of maize inducers used for genotyping; H. Silva, M. Halilaj, and J. Böhm for help with the liguleless and flow cytometry analyses; and W. Molenaar and H. Zhao for comments on earlier versions of the manuscript. We thank two anonymous reviewers for very helpful suggestions for improving the content of this publication.
Author contributions: A.E.M. designed this project and supervised the research. H.H., T.A.S., and A.E.M. wrote the manuscript, and all co-authors were involved in editing the manuscript. H.H. and T.A.S. performed most of the data analyses and developed the CHE test. C.C.S. contributed to production of the genotyping; S.C. and J.L. produced the resequencing data; and R.P., S.U., and C.C.S. analyzed the resequencing data. J.Y. contributed genotypic data for some maize inbred lines. W.S., S.C., B.M.P., O.A.S., V.R., A.Z., S.K.N., and V.K. developed maize inducers.
Footnotes
Communicating editor: A. H. Paterson
Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.184234/-/DC1.
These authors contributed equally to this work.
Literature Cited
Adler, D., O. Nenadic, and W. Zucchini, 2014 rgl: 3D visualization device system (OpenGL). R package version 0.93.1098. Available at: http://CRAN.R-project.org/package=rgl. Accessed: January 20, 2016.
R Development Core Team, 2013 R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna. Available at: http://www.R-project.org. Accessed: March 07, 2016.
Röber, F., 1999 Fortpflanzungsbiologische und genetische Untersuchungen mit RFLP-Markern zur in-vivo-Haploideninduktion bei Mais. Ph.D. Thesis. Universität Hohenheim, Stuttgart, Germany.