Genetic architecture of flowering time in maize was addressed by synthesizing a total of 313 quantitative trait loci (QTL) available for this trait. These were analyzed first with an overview statistic that highlighted regions of key importance and then with a meta-analysis method that yielded a synthetic genetic model with 62 consensus QTL. Six of these displayed a major effect. Meta-analysis led in this case to a twofold increase in the precision in QTL position estimation, when compared to the most precise initial QTL position within the corresponding region. The 62 consensus QTL were compared first to the positions of the few flowering-time candidate genes that have been mapped in maize. We then projected rice candidate genes onto the maize genome using a synteny conservation approach based on comparative mapping between the maize genetic map and japonica rice physical map. This yielded 19 associations between maize QTL and genes involved in flowering time in rice and in Arabidopsis. Results suggest that the combination of meta-analysis within a species of interest and synteny-based projections from a related model plant can be an efficient strategy for identifying new candidate genes for trait variation.
MAIZE (Zea mays L.) was domesticated from the Central America native Teosinte. It was then gradually adapted to temperate climates, up to the cool regions of America and then northern Europe. This acclimatization was made possible mainly by an adaptation of maize flowering time to the local climatic features. Flowering time and related traits such as plant height and total leaf number are determined mainly by the timing of the transition from vegetative to reproductive development made by the shoot apical meristem of maize (Irish and Nelson 1991). Use of molecular markers allowed the detection, since the late 1980s, of an increasing number of quantitative trait loci (QTL) controlling these traits. Besides studies addressing flowering time for its direct interest for maize adaptation to temperate climates (Ragot et al. 1995), this trait is also frequently scored as a component of yield (Mechin et al. 2001), drought stress (Veldboom and Lee 1996), or pest resistance (Bohn et al. 2000). A large body of QTL information is therefore presently available for flowering time in maize.
In contrast to other traits such as kernel characteristics, only a few mutations affecting flowering time have been identified in maize, so knowledge of the genetic control of this trait remains relatively poor. The best-known gene, INDETERMINATE1 (ID1), was cloned from a mutant in which the apical vegetative meristem failed to convert into a reproductive meristem (Colasanti et al. 1998). The ID1 gene encodes a zinc finger transcription factor, and the id1 locus was mapped to chromosome 1L. Two other mutants, delayed flowering1 (dlf1) and leafy1 (lfy1), have shown a specific, albeit weak, effect on the floral transition. Last, a recessive mutation of the EARLY PHASE CHANGE (EPC) gene reduced the duration of the juvenile vegetative phase, thus causing early flowering (Vega et al. 2002). Still, no dramatic effect on the number of leaves was observed. The epc mutation was mapped to chromosome 8.
On the other hand, the genetics and molecular biology of the floral transition have been most extensively studied in Arabidopsis thaliana. Almost 80 genes involved in the timing of flowering have been cloned and described for this species. Genetic, molecular, and physiological analyses led to the elaboration of a model of the genetic interactions between these genes (Koornneef et al. 1998; Blazquez 2000). Four genetic signaling pathways that promote flowering have been identified: the photoperiodic, autonomous, vernalization (transient exposure to low temperatures soon after germination), and gibberellin (GA) pathways. Briefly, photoreceptors, such as phytochromes (PHYA–E) and cryptochromes (CRY1–2), are involved in the perception of day length and interact with an endogenous circadian clock to initiate flowering signals under long days (Millar 2004). These signals are then transduced and integrated by the CONSTANS (CO) gene (Suarez-Lopez et al. 2001). The autonomous and vernalization pathways accelerate flowering by reducing the expression of FLOWERING LOCUS C (FLC), which encodes a floral repressor (Michaels and Amasino 2001). Mutations affecting GA synthesis delay flowering under both long and short days, but they have their strongest effect under short days (Ogas 1998). Signaling pathways are integrated by meristem identity genes, such as LEAFY (LFY; Simon et al. 1996).
Rapid progress in the knowledge of the signaling pathways in Arabidopsis has provided relevant insights into the genetic control of maize flowering time through reverse genetic approaches. Different phytochromes (PHYA, PHYB, and PHYC), the light receptors of the photoperiod pathway in Arabidopsis, have been isolated in maize (Christensen and Quail 1989; Dehesh et al. 1991; Basu et al. 2000). The maize gene orthologous to the Arabidopsis LUMINIDEPENDENS gene from the autonomous pathway (ZmLD) was also isolated (Nocker et al. 2000). The ZmLD locus was mapped to maize chromosome 3, but to our knowledge no information on its function is available from either mutation or transgene experiments in maize. It can be noted that the ID1 gene has been assigned to this autonomous pathway by analogy to the Arabidopsis model (McSteen et al. 2000). Two genes involved in the gibberellin pathway have been cloned in maize. The ANTHER EARS1 (AN1) gene is involved in the synthesis of ent-kaurene (Bensen et al. 1995), the first tetracyclic intermediate in the gibberellin biosynthetic pathway. The DWARF8 gene is an ortholog of the wheat Reduced height genes (Rht-B1 and Rht-D1), mutant alleles of which were used to develop the new semidwarf varieties of the “green revolution” (Peng et al. 1999). The AN1 and DWARF8 genes are orthologous to the Arabidopsis GIBBERELLIN REQUIRING1 (GA1) and GIBBERELLIN INSENSITIVE (GAI) genes, respectively. Both have been mapped to maize chromosome 1L. Mutations causing the loss of AN1 or DWARF8 function reduce plant height and delay flowering. Finally, using reverse genetics, Bomblies et al. (2003) showed that two duplicate LFY homologs in maize, ZFL1 and ZFL2, play roles in floral organ identity, floral transition, and inflorescence phyllotaxy. These genes have been mapped to chromosomes 2 and 10, respectively.
The presence of several maize genes orthologous to Arabidopsis flowering-time genes suggested that the pathways promoting flowering in Arabidopsis are conserved in maize. However, Arabidopsis and maize do not have the same floral biology. Arabidopsis is classified as a facultative long-day plant, whereas maize is originally a short-day one. Arabidopsis shows a vernalization response, whereas low temperatures block the development of maize and can have deleterious effects. Flowering-time mechanisms are therefore expected to be better conserved between maize and species more closely related to it than Arabidopsis. Comparative genetics suggested that the regulation of flowering should be conserved in the grass family Poaceae (Lin et al. 1995; Laurie 1997). Moreover, comparative mapping of cereals using restriction fragment length polymorphism (RFLP) has shown considerable marker-order conservation. Chromosomes of barley, wheat, and maize can be described in terms of rice “linkage segments” (Bennetzen and Laurie 1993; Devos and Gale 1997; Wilson et al. 1999). As a model for other grasses, rice (Oryza sativa L.) has the advantage of having a well-documented flowering process. First, ∼50 rice sequences orthologous to Arabidopsis flowering-time genes were identified by comparison of the complete genome sequences of rice and Arabidopsis (Izawa et al. 2003). Second, since 1991, four major QTL or mutations involved in the photoperiod response have been cloned in rice: Heading date1 (Hd1), Hd3a, Hd6, and se5. Two major cloned QTL, Hd1 and Hd3a, are orthologous to CO and FT, respectively, both of which are involved in the photoperiod response in Arabidopsis (Yano et al. 2000; Kojima et al. 2002). Hd6 encodes the α-subunit of protein kinase CK2 (CK2α; Takahashi et al. 2001). In Arabidopsis, CK2α seems to interact with proteins involved in the circadian clock.
Se5 encodes a putative heme oxygenase that is involved in phytochrome chromophore biosynthesis, suggesting that phytochromes are also involved in the flowering control as in Arabidopsis (Izawa et al. 2000). The floral functions of these last two genes were initially unknown in Arabidopsis and were discovered with fine-mapping QTL studies in rice. Despite a large interest in these rice genes, no published study has yet been performed to our knowledge to take advantage of them to find new candidate genes for QTL in maize.
For this study, we collected numerous QTL results in maize, from publications mentioned above and internal programs, and projected these on the same reference map. A new “overview” statistic has been developed to highlight “hot spots” of flowering-time QTL. We then used the meta-analysis approach of Goffinet and Gerber (2000) to estimate, for each linkage group, the number of QTL underlying the results that we synthesized and estimate their consensus positions. These were compared to the few positions of candidate genes for flowering time presently available in maize. Finally, we analyzed the conservation of synteny between rice and maize to predict the position of additional candidate genes in maize.
MATERIALS AND METHODS
Bibliographic synthesis of QTL results:
Through a bibliographical review and the use of internal results, we collected data from 22 QTL studies of flowering-time traits, days to pollen shed (DPS) and silking date (SD), and of related traits, plant height (HT) and leaf number (LN). For each QTL study, Table 1 reports the names of the parental lines used and the size and type of the plant populations (F2, backcross, near-isogenic line populations, etc.). We define here as an “experiment” the QTL analysis of one population evaluated for a given trait in a given environment (a single location or the mean of several locations, depending on the information available in the publications). Each QTL is characterized by its map position [most likely position and confidence interval (C.I.) around this position] and the proportion of phenotypic variance explained, $R^2$. QTL allelic effects were not used for this analysis. When the confidence interval for QTL position was not available in the publication, a 95% confidence interval was estimated with the approach proposed by Darvasi and Soller (1997) as
$$\mathrm{C.I.} = \frac{530}{N \times R^2} \quad (1)$$
where $R^2$ is the proportion of variance explained and $N$ is the size of the population. According to the authors, expression (1) is appropriate for both backcross and F2 populations.
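As a minimal sketch of how a missing confidence interval could be filled in, the function below assumes that the Darvasi and Soller (1997) expression takes the commonly quoted form C.I. = 530 / (N × R2), with R2 given as a proportion between 0 and 1 and the result in centimorgans. The function name and the example values are hypothetical and serve only as an illustration.

```python
def darvasi_soller_ci(n_individuals: int, r2: float) -> float:
    """Approximate 95% confidence-interval width (cM) of a QTL position.

    Assumption: C.I. = 530 / (N * R2), with R2 expressed as a proportion
    (0 < R2 <= 1) and N the population size.
    """
    if n_individuals <= 0:
        raise ValueError("population size must be positive")
    if not 0.0 < r2 <= 1.0:
        raise ValueError("R2 must be a proportion in (0, 1]")
    return 530.0 / (n_individuals * r2)


# Hypothetical example: a QTL explaining 10% of the variance in 200 plants
# gets an interval of about 26.5 cM.
ci_width = darvasi_soller_ci(200, 0.10)
print(f"{ci_width:.1f} cM")
```

Under this form, the interval shrinks in inverse proportion to both population size and variance explained, which is why small-effect QTL from small populations contribute wide intervals to the synthesis.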
Results collected from QTL mapping experiments involve different genetic maps that share only a few common markers. QTL were projected on a reference map, using markers shared by QTL maps and this reference map, by means of a homothetic function. We used as a reference map the Génoplante intermated B73 × Mo17 (IBM) population map (Lee et al. 2002), developed by Falque et al. (2003). In brief, this reference map first involves a framework map of 237 RFLP and SSR markers chosen for unambiguous locus order. Most markers used for the QTL studies were genotyped on this population and mapped individually along with framework markers. All these markers were then located on the reference map on the basis of their relative distance to flanking framework markers, keeping distances between framework markers stable. We finally projected the most likely position of each QTL and left and right flanking ends of the confidence interval, with a homothetic function using common markers between the reference map and QTL maps (see Figure 1). In a few cases, these common markers displayed a discrepancy in order between the initial QTL map and the reference map. When possible, we discarded inverted markers from the projection process and used the next flanking markers. Otherwise, the QTL was not projected.
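The homothetic projection described above amounts to a linear rescaling between the two common markers that flank a position. The sketch below illustrates this under the assumption that marker order is conserved between the two maps; the function name and the marker coordinates are hypothetical.

```python
def project_position(pos, src_left, src_right, ref_left, ref_right):
    """Project a position from a QTL map onto the reference map.

    src_left/src_right: positions (cM) of the two flanking common markers
    on the original QTL map; ref_left/ref_right: positions of the same
    markers on the reference map. The interval is rescaled linearly.
    """
    if src_right == src_left:
        raise ValueError("flanking markers must be at distinct positions")
    scale = (ref_right - ref_left) / (src_right - src_left)
    return ref_left + (pos - src_left) * scale


# Hypothetical example: common markers at 40 and 60 cM on the QTL map
# sit at 50 and 90 cM on the reference map.
flanks = dict(src_left=40.0, src_right=60.0, ref_left=50.0, ref_right=90.0)
most_likely = project_position(45.0, **flanks)   # QTL peak
ci_left = project_position(42.0, **flanks)       # left end of the C.I.
ci_right = project_position(55.0, **flanks)      # right end of the C.I.
print(most_likely, ci_left, ci_right)
```

The peak and both ends of the confidence interval are projected with the same function, so the projected interval stretches or shrinks with the local ratio of map distances.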
To quantify the contribution of a given region to trait variation, we calculated a statistic, hereafter called the overview, which estimates the probability that a given genome segment comprises a QTL in one of the considered experiments. Once the estimated position of QTL$_i$ was projected on the reference map, we considered that the true position of QTL$_i$ was normally distributed around the most likely location $p_i$ of the QTL, with variance $S_i^2$, i.e., $N(p_i, S_i^2)$ (Visscher et al. 1996). In most of the experiments that we considered, the limits of the confidence interval (C.I.$_i$) of QTL position were estimated as the positions where the LOD value decreased by 1 unit relative to that of the most likely position. This interval corresponds roughly to a 95% confidence interval (Lynch and Walsh 1998), so that $S_i^2$ was estimated as
$$S_i^2 = \left( \frac{\mathrm{C.I.}_i}{2 \times 1.96} \right)^2$$
For each QTL of a given linkage group, we then calculated step by step (every 0.5 cM) the probability $P_i(x, x + 0.5)$ that the true position lies between positions $x$ and $x + 0.5$. We then estimated the average probability that segment $[x, x + 0.5]$ comprises a QTL in an experiment as
$$P(x, x + 0.5) = \frac{\sum_{i=1}^{nbQTL} P_i(x, x + 0.5)}{nbE}$$
where $nbQTL$ is the number of QTL and $nbE$ is the total number of experiments. This parameter was plotted along the reference map. To highlight regions where the density shows a notable peak, we plotted on the same graphs the average value of the parameter. This average value is equivalent to the uniform probability that segment $[x, x + 0.5]$ comprises a QTL in an experiment, knowing the average number of QTL per experiment ($nbQTL/nbE$) but without information on QTL positions,
$$U(x, x + 0.5) = \frac{nbQTL}{nbE} \times \frac{0.5}{L}$$
where $L$ is the total length of the reference map in centimorgans.
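The overview statistic can be sketched in a few lines. The code below is an illustration only: it assumes that each reported interval is treated as a 95% interval, so that the standard deviation of a QTL position is its interval width divided by 3.92, and it treats a single linkage group of known length. The function names and the example QTL are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def overview(qtls, n_experiments, map_length, step=0.5):
    """Average probability that each [x, x + step] segment holds a QTL.

    qtls: list of (most_likely_position, ci_width) in cM on the reference
    map. Assumption: ci_width is a 95% interval, so sd = ci_width / 3.92.
    """
    n_bins = int(math.ceil(map_length / step))
    curve = [0.0] * n_bins
    for position, ci_width in qtls:
        sd = ci_width / 3.92
        for b in range(n_bins):
            x = b * step
            curve[b] += (normal_cdf(x + step, position, sd)
                         - normal_cdf(x, position, sd))
    return [value / n_experiments for value in curve]


def uniform_level(n_qtl, n_experiments, map_length, step=0.5):
    """Expected value per segment if QTL were spread uniformly."""
    return (n_qtl / n_experiments) * step / map_length


# Hypothetical example: two QTL from two experiments on a 100-cM group.
example_qtls = [(50.0, 10.0), (52.0, 20.0)]
curve = overview(example_qtls, n_experiments=2, map_length=100.0)
baseline = uniform_level(len(example_qtls), 2, 100.0)
peak_bin = max(range(len(curve)), key=curve.__getitem__)
print(peak_bin * 0.5, curve[peak_bin] / baseline)
```

Because each QTL contributes a total probability of about 1 along the map, the mean of the curve matches the uniform level, which is what makes the ratio of the curve to that level a convenient way to flag hot spots.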
We then used the meta-analysis approach of Goffinet and Gerber (2000) to estimate, for each linkage group, the number of QTL underlying the results that we synthesized. This approach provides decision rules based on a modified Akaike criterion to determine the number of “real” QTL that best fits the results on a given linkage group. It also groups the QTL detected in independent experiments into classes that correspond to the same QTL and finally provides a consensus estimate of QTL positions. Computations were performed using the BioMercator software (Arcade et al. 2004). At present, the method used in the software cannot distinguish between models with more than four real QTL on the same linkage group. If the estimated number of real QTL is more than four, BioMercator declares that the most probable model is one with a number of real QTL equal to the number of analyzed QTL. We therefore used the Delete function of the software to select segments of a linkage group separated by regions with no QTL and applied the meta-analysis to these segments. Compared to the method initially proposed by Goffinet and Gerber (2000), the software includes an additional method to estimate the confidence interval of consensus QTL positions,
$$\mathrm{C.I.} = \frac{2 \times 1.96}{\sqrt{\sum_{i=1}^{K} 1 / S_i^2}}$$
where $S_i^2$ is the variance of the most likely location of QTL$_i$ and $K$ is the total number of QTL attributed to the consensus QTL.
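The effect of pooling several QTL into one consensus position can be illustrated with a short calculation. The sketch below assumes that the consensus interval is obtained by inverse-variance pooling, C.I. = 3.92 / sqrt(sum of 1/S2i), with each S2i derived from a 95% interval as (C.I.i / 3.92) squared; this is an assumed form and not a description of the BioMercator code, and the function name and values are hypothetical.

```python
import math


def consensus_ci(ci_widths):
    """95% C.I. width (cM) of a consensus QTL position.

    ci_widths: 95% interval widths of the K QTL grouped in the consensus.
    Assumption: position variances are pooled by inverse-variance
    weighting, with S_i = ci_width_i / 3.92.
    """
    if not ci_widths:
        raise ValueError("at least one QTL is required")
    inverse_variance_sum = sum(1.0 / (w / 3.92) ** 2 for w in ci_widths)
    return 3.92 / math.sqrt(inverse_variance_sum)


# Hypothetical example: three QTL with intervals of 13, 20, and 25 cM.
print(round(consensus_ci([13.0, 20.0, 25.0]), 1))
```

Under this form the consensus interval is never wider than the narrowest contributing interval, and K equally precise QTL shrink it by a factor of the square root of K.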
Synteny conservation-based approach:
Genetic mapping and sequence data were obtained from TIGR (http://www.tigr.org/tdb/tgi/plant.shtml), MaizeGDB (http://www.maizegdb.org/), RiceGD (http://btn.genomics.org.cn:8080/rice/), and RGP (http://rgp.dna.affrc.go.jp/index.html) databases. Rice sequences likely corresponding to genes involved in flowering time were identified with the comparison of flowering pathways in rice and Arabidopsis, as done by Izawa et al. (2003). This study was made with the indica rice genome draft released by the Beijing Genomics Institute (Yu et al. 2002) whereas most of the genetic data were available only for the japonica rice genome draft released by the Rice Genome Project (Goff et al. 2002). We therefore subjected each selected indica rice sequence to a BLASTN (Altschul et al. 1997) analysis with the japonica rice BACs, to find putative orthologous sequences. Genetic localization of these sequences on the high-density rice genetic map constructed in the Rice Genome Project was obtained from http://rgp.dna.affrc.go.jp/publicdata/geneticmap2000/index.html.
To construct a comparative map between the maize and rice genomes, we compared maize and rice tentative consensus [TC; created by assembling expressed sequence tags (ESTs) into virtual transcripts] sequences from the TIGR database available in January 2003 (release 11.0). Multiple sequence comparisons of maize against rice and rice against maize were carried out using TBLASTX and sequences were considered as homologous if the E-value was <1e-20 and the sequence identity was >75% over a 150-bp minimum-length high-scoring segment pair (hsp). To avoid spurious associations between sequences that are identical only on a small common domain, a significant overlap between two sequences is defined by an overlap of at least 60% of their length. It can be noted that, in some cases, a sequence of a given species presented a high identity with two or more sequences of the other species. The maize TC sequences were then subjected to BLASTN analysis with mapped maize sequences whereas the rice TC sequences were subjected to TBLASTX analysis with the japonica rice BACs. Homologous TC sequences that fulfilled these two criteria were used as anchor points to connect the maize map and rice genome. The maize reference map used for this approach was initially the IBM2 neighbors (IBM2n) map (http://www.maizegdb.org), where several new marker locations were projected from other maps using a homothetic function. Relationships between maize and rice genomes were first investigated using a graphical representation of anchor points and corresponding links, which underlined blocks with conserved synteny (several links and conserved anchor marker order). We then quantified the magnitude of the conservation between the two genomes using the synteny probability statistic developed by Gaut (2002). 
In this approach, each anchor point marker in maize receives a score of 1, 1/2, or 0, depending on whether two, one, or zero among its two flanking anchor points are associated to the same chromosome in rice. The synteny probability was initially defined as the average of the marker scores over an entire maize chromosome. On the basis of the same idea, we calculated the local synteny probability of one maize chromosomal segment by computing the average score of the two anchor points that delimit the segment.
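The scoring scheme can be sketched as follows. The code assumes that anchor points are listed in map order along one maize chromosome, each with the rice chromosome it is linked to, and that a missing neighbor at a chromosome end counts as a non-matching flank; that end rule is an assumption made for the illustration, as are the function names and the example data.

```python
def marker_scores(rice_chroms):
    """Score each maize anchor point as 1, 1/2, or 0.

    rice_chroms: rice chromosome linked to each anchor point, in maize
    map order. The score is the fraction of the two flanking anchor
    points linked to the same rice chromosome (assumption: a missing
    flank at a chromosome end counts as a mismatch).
    """
    scores = []
    last = len(rice_chroms) - 1
    for i, chrom in enumerate(rice_chroms):
        matches = 0
        if i > 0 and rice_chroms[i - 1] == chrom:
            matches += 1
        if i < last and rice_chroms[i + 1] == chrom:
            matches += 1
        scores.append(matches / 2.0)
    return scores


def synteny_probability(rice_chroms):
    """Average marker score over a whole maize chromosome."""
    scores = marker_scores(rice_chroms)
    return sum(scores) / len(scores)


def local_synteny_probability(rice_chroms, left_index):
    """Average score of the two anchor points delimiting a segment."""
    scores = marker_scores(rice_chroms)
    return (scores[left_index] + scores[left_index + 1]) / 2.0


# Hypothetical anchor points along one maize chromosome.
anchors = [3, 3, 3, 6, 3, 6, 6]
print(marker_scores(anchors))
print(local_synteny_probability(anchors, 1))
```

A segment bounded by two anchor points that both sit inside a conserved block scores close to 1, whereas a segment at a block boundary or in a reshuffled region scores lower.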
Where strong synteny is observed, a maize segment likely contains homologs of most of the genes encompassed in the corresponding rice segment. We therefore predicted the maize regions likely to contain the homologs of a given rice flowering-time gene as the interval between the two maize anchor points linked to those that flank this gene in rice. The synteny probability was estimated for all these segments to quantify the quality of the projection, and only values ≥25% were considered. Finally, we projected rice flowering-time loci from the IBM2n map to the reference map used in the meta-analysis to search for associations between QTL and these new candidate genes. In this article, loci detected in maize by the synteny approach are referred to as “os” with the gene name, whereas loci directly mapped in the maize population are referred to as “zm” with the gene name. When possible, these synteny-based projections were compared with the positions of orthologous genes mapped by RFLP in maize.
Simulations were performed to check that the congruency between QTL and projected regions was higher than expected by chance (global null hypothesis that the choice of candidate genes and/or their projection was not relevant). To take into account the fact that several loci were projected, we computed the average of the overview statistic over the projected segments and compared it to its corresponding distribution obtained over 10,000 random assignments of segment positions over the maize genome. This distribution was used to estimate the probability that an average overview equal or superior to that observed could have been obtained by chance. We also investigated the possibility of evaluating the proportion of associations between projected segments and QTL that could be due to chance only. To do so, we applied to our data recent approaches devoted to the analysis of the statistical significance of genome-wide studies (Delvin et al. 2003; Storey and Tibshirani 2003). Briefly summarized, these articles consider that, rather than individual P-values of a series of tests, the matter of concern is the proportion of tests considered as significant that are expected to be “false positives” (i.e., truly corresponding to the null hypothesis). This was addressed originally by taking into account the number of tests that are expected to reach a given P-value by chance if all tests performed correspond to the null hypothesis [false discovery rate (FDR) approach proposed by Benjamini and Hochberg 1995; described in Delvin et al. 2003]. It was shown later that this provides an overestimation of the proportion of false positives and that a more appropriate estimation should take into account the fact that only part of the tests that are performed correspond to the null hypothesis (Storey and Tibshirani 2003). 
To apply these approaches, we computed for each of the 30 projected segments (i) the average of the overview statistic along the segment and (ii) the same value for 10,000 random assignments of a segment with the same size over the maize genome. We used this distribution to estimate the proportion (P) of random segments of a given size that have an average overview statistic equal to or higher than that observed for the segment of interest. P-values of the projected segments were used to estimate the corresponding FDR and the Q-values as proposed by Storey and Tibshirani (2003). Q-values were computed using the QVALUE software (Dabney and Storey 2003). One important aspect of the use of this program is to evaluate the global proportion of the total tests that are conducted that correspond to the true null hypothesis (π0). This parameter is related to a tuning parameter (λ). The default setting of the program adjusts a curve (smoother) to the relationship between π0 and λ, to estimate π0 for λ = 1. As recommended by the authors, we looked at the relationship between π0 and λ to evaluate the stability of the results.
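The randomization step for a single segment can be sketched as below. This is a simplified illustration: the genome is represented as one concatenated overview curve (one value per 0.5-cM bin) with chromosome boundaries ignored, segment sizes are counted in bins, and the function names, seed, and toy curve are hypothetical.

```python
import random


def segment_mean(curve, start, length):
    """Average of the overview curve over `length` bins from `start`."""
    return sum(curve[start:start + length]) / length


def permutation_p_value(curve, start, length, n_draws=10_000, seed=0):
    """Proportion of random segments of the same size whose average
    overview is equal to or higher than that of the observed segment."""
    rng = random.Random(seed)
    observed = segment_mean(curve, start, length)
    max_start = len(curve) - length
    hits = 0
    for _ in range(n_draws):
        random_start = rng.randint(0, max_start)
        if segment_mean(curve, random_start, length) >= observed:
            hits += 1
    return hits / n_draws


# Toy curve: a flat genome with one hot spot in the last 10 bins.
toy_curve = [0.0] * 90 + [1.0] * 10
print(permutation_p_value(toy_curve, start=90, length=10))  # on the hot spot
print(permutation_p_value(toy_curve, start=0, length=10))   # off the hot spot
```

A segment placed on the hot spot is matched by very few random placements, whereas a segment in the flat region is matched by all of them, giving a proportion of 1.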
RESULTS
A total of 313 QTL from 22 studies (67 experiments) were projected onto the reference map (see supplemental data at http://www.genetics.org/supplemental/Supplemental1 and Supplemental2 for reference genetic map and QTL positions). It can be noted that the average number of QTL per experiment (4.6) is close to the value of 4 estimated by Kearsey and Farquhar (1998) over a very wide range of plant species and traits. All regions of the map, except 4S and 7S, contain QTL involved in the variation of flowering time, although the distribution of QTL varies considerably among genomic regions. For example, the number of QTL confidence intervals that encompass a given position varies between 0 and 10 on chromosome 1.
Overview of QTL in maize flowering time:
Figure 2 shows the overview statistic of the QTL distribution along the maize genome. The density curve rises above the average value 69 times, suggesting that several tens of regions are involved in the variation of maize flowering time. It can be noted that the curve shows closely spaced peaks on chromosomes 1, 2, 3, and 6. Six regions displayed particularly high values (a threshold set empirically at 5 times the average value of the curve): one each on chromosomes 1 and 9 and two each on chromosomes 8 and 10. For all these regions, a QTL with a major effect (i.e., a high $R^2$ value) was detected in at least one experiment, and several other QTL were detected in other experiments.
By using meta-analysis, the 313 initially detected QTL were reduced to 62 consensus loci (see shaded rectangles in Figure 2). Consistent with the overview method, these loci appear widely distributed along the maize genome. Meta-analysis is a powerful tool to estimate whether one, two, or possibly more real QTL lie in regions that display complex patterns of the overview curve. For instance, Figure 3A presents a meta-analysis of the hot-spot region on chromosome 1. The overview curve of the region showed three peaks higher than the average value, two of which were close (∼7 cM) and mostly corresponded to two distinct experiments (Koester et al. 1993; Rebaï et al. 1997). In the end, the meta-analysis supported a model with two QTL.
As discussed by Goffinet and Gerber (2000), QTL meta-analysis makes it possible to estimate the number of real QTL underlying results obtained in independent experiments and estimate the consensus positions of these QTL. Figure 3, A and B, also shows that the confidence interval for consensus positions is smaller than those of corresponding initial QTL positions. We compared, at each major consensus position, the new confidence interval with that of the initial QTL that was located with the highest precision. We observed a decrease from 13.1 to 7.1 cM on chromosome 1, from 20 to 6.9 cM and from 5 to 3.6 cM on chromosome 8, from 7.6 to 5.8 cM on chromosome 9, and from 11.7 to 5.7 cM and from 15.9 to 9.9 cM on chromosome 10. On average for these loci, use of meta-analysis decreases the size of confidence intervals by a factor of 1.8 and therefore increases the precision of QTL mapping, which facilitates the identification of relevant candidate genes.
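The gain in precision can be checked directly from the six pairs of interval sizes quoted above. The short calculation below uses only those values; the two summary conventions shown (ratio of the summed intervals and mean of the per-locus ratios) are illustrative choices, since the text does not state which one was used.

```python
# Confidence-interval sizes (cM) quoted in the text for the six major loci:
# most precise initial QTL versus consensus QTL from the meta-analysis.
initial_ci = [13.1, 20.0, 5.0, 7.6, 11.7, 15.9]
consensus_ci_sizes = [7.1, 6.9, 3.6, 5.8, 5.7, 9.9]

# Convention 1: ratio of the summed (equivalently, mean) interval sizes.
ratio_of_means = sum(initial_ci) / sum(consensus_ci_sizes)

# Convention 2: mean of the six per-locus reduction factors.
per_locus = [a / b for a, b in zip(initial_ci, consensus_ci_sizes)]
mean_of_ratios = sum(per_locus) / len(per_locus)

print(round(ratio_of_means, 2), round(mean_of_ratios, 2))
```

Both conventions give a value between 1.8 and 1.9, in line with the reported average factor, while the per-locus factors range from about 1.3 to 2.9.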
Meta-analysis also makes it possible to identify consensus QTL that are associated with the variation of several traits (here silking date, date of pollen shed, leaf number, and plant height). Floral transition affects male and female flowering time, leaf number, and correlatively plant height (Irish and Nelson 1991). QTL affecting the four traits are therefore likely involved in the timing of floral transition. Five such QTL were detected using meta-analysis: on chromosome 1 (loci near umc67 and umc1833), on chromosome 5 (locus near bnl5-71a), on chromosome 8 (vgt1, major QTL near umc1316), and on chromosome 9 (locus near csu147). Loci on chromosomes 1 (umc67), 8 (vgt1), and 9 (csu147) correspond to three of the six hot-spot loci revealed by the overview statistic. Several experiments suggest that the vgt1 QTL of chromosome 8 is involved in floral transition (Vladutu et al. 1999; Salvi et al. 2002). On the other hand, two clusters put together QTL that affect principally vegetative traits (plant height and leaf number; see details in Figure 3B). On chromosome 6, a locus close to bnlg1136 is associated with two QTL for plant height and one QTL for leaf number. On chromosome 7, a locus close to phi034 is mainly associated with plant height (seven QTL for plant height and three QTL for flowering time). These effects suggest the presence of genes involved in the control of vegetative growth, a mechanism independent of the floral transition in maize.
From the review of Izawa et al. (2003), we determined 37 indica rice genes involved in flowering time and mapped these sequences in silico on the japonica rice genome (see Figure 4). We added to these positions the locations of the 3 rice TFL1-like genes (RCN1/FRD2, RCN2, and RCN3/FRD1), that of the rice orthologous genes of Arabidopsis GAI and GA1, that of the loci of se5 mutation, and that of the Hd6 major QTL. Only 1 indica rice gene, a FIE-like (RiceDB accession no. contig88495), did not present a high identity with the japonica BAC sequences available in September 2003. This can be explained either by the absence of this gene in the japonica rice subspecies or by a gap in present sequence data. This yielded a total of 44 loci, out of which 3 corresponded to QTL cloned in rice (Yano et al. 2000; Takahashi et al. 2001; Kojima et al. 2002). Map positions for the circadian clock genes (osTOC1, osPRR37, osPRR73, osPRR59, and osPRR95) are consistent with those found by Murakami et al. (2003). Several flowering genes were mapped in duplicated genomic segments: FT-like9 and FT-like10 genes of chromosomes 1 and 5; the FT-like5, FT-like6, CRY12-L1, and CRY12-L2 genes of chromosomes 2 and 4; the ZTL/LHP2-like1 and ZTL/LHP2-like2 genes of chromosomes 2 and 6; and the TFL1-like genes (RCN1 and RCN3) of chromosomes 11 and 12. This is consistent with recent results showing that rice was an ancient aneuploid (Paterson et al. 2003; Vandepoele et al. 2003).
From the 36,022 maize TC sequences and the 51,569 rice TC sequences available, the reciprocal BLAST analysis allowed us to identify associations involving 11,469 maize TC sequences and 15,104 rice TC sequences. Finally, 2232 associations corresponded to genes mapped in both rice and maize. Every locus of one species that was connected to more than two loci in the other species was discarded, to limit the risk of connecting homologous members of complex gene families that do not correspond to strict orthology relationships. Finally, comparative mapping between rice and maize yielded 642 links between the two genomes. An overview of all relationships between the two genomes is accessible through the supplemental data (http://www.genetics.org/supplemental/, Supplemental3). Note that the 558 maize positions correspond to 339 public data for which the corresponding TIGR sequence name is indicated and 219 Genoplante data (<40%) for which the corresponding TIGR sequence name is coded. These last data were not specific to flowering time and are used here to better document genome relationships. A conservation of gene content and gene order between rice and maize chromosomes appears in several regions. For example, Figure 5 shows the relationships that were found between maize chromosome 9 and rice chromosomes 3 and 6. Maize chromosome 9 shares two conserved regions with rice chromosome 6 and one with rice chromosome 3. Such relationships could be found for many other regions, with the exception of a few regions of maize chromosome 2 and rice chromosomes 9, 11, and 12, for which limited sequence information was available. With Gaut's synteny probability (Gaut 2002), we estimated that 45.8% of the maize genome is related to at least one rice segment. For these regions, the conservation of the synteny makes it possible to predict the position of genes yet unknown in maize from the position of homologous genes in rice.
Only rice loci with two flanking anchor points connected to a clear syntenic region of maize were projected on the maize genetic map. Twenty-one sequences could therefore be projected on the maize genome, which yielded 30 loci. Locations of these projected loci are shown in Table 2 and Figure 2. Note that, consistent with earlier results by Paterson et al. (1995) and Basu et al. (2000), who also used a synteny approach, our results predict that two copies of osPHYC might be found in maize on chromosomes 1 and 5, near the zmPHYA1 and zmPHYA2 loci, respectively. For a few genes, it was possible to compare the projection of rice loci with the position of corresponding loci already mapped by means of RFLP in maize. These positions were highly consistent for the osLD gene on chromosome 3, osLFY on chromosomes 2 and 10, and osPHYB loci on chromosomes 1 and 9. The rice ortholog of dwarf8, osGAI, was projected onto two maize regions: one on chromosome 1 near the dwarf8 and zmPHYA1 loci and the other on chromosome 5 close to the zmPHYA2 locus. This limited comparison between predicted and observed positions illustrates that the projection approach used is globally highly relevant. A further step would be to have indicators to evaluate, a priori, the relevance of a projection. In Table 2, we present three parameters that aim at this objective: the size of the projected segment in maize, the average Gaut's local synteny probability of the maize segment, and the number of maize markers in the segment that are connected to other rice chromosomes. For instance, the three parameters converge toward a low conservation of synteny for the osLHP loci mapped on chromosome 5. Conversely, results indicate a very high confidence for the projections of the osGI and osFTL2/FTL3 loci, mapped on chromosomes 3 and 9, respectively, for which the three parameters were satisfactory.
Association of QTL with candidate genes:
Loci involved in flowering time defined by the meta-analysis method are associated with only two maize genes already known (see Figure 2): the two maize LEAFY-like genes, ZFL1 and ZFL2. The association between the ZFL1 locus and the second hot-spot locus of chromosome 10, and the association of these loci with QTL affecting the four traits, support the assumption that ZFL1 is involved in the control of floral transition, as in Arabidopsis. No association appears between the epc, id1, dwarf8, and An1 loci and QTL, although several publications show that mutations in three of these genes affect maize flowering time (Bensen et al. 1995; Colasanti et al. 1998; Peng et al. 1999). Likewise, the zmPHYA1, zmPHYA2, zmPHYB1, zmPHYB2, and zmLD loci are not associated with QTL. It must be noted that their role in the control of flowering time in maize is still uncertain.
Simulations showed that the average of the overview statistic over the 30 projected segments had only a 1.85% probability of being exceeded by chance (under the global null hypothesis that the choice of candidate genes and/or their projection was not relevant). This indicates that at least part of the associations highlighted here reflect a true correspondence between projected candidate genes and consensus QTL. Nineteen projected segments correspond to maize QTL in the sense that projected segments and QTL overlap (see Figure 2). We estimated, for each of the 30 projected segments, the proportion (P) of random segments of the same size that have an average overview statistic equal to or higher than that observed. This proportion varied between 1.6 and 87.0%. It was on average 19.6% for the 19 segments that overlapped with QTL and 60.0% for the 11 that did not overlap. P-values were used to estimate corresponding FDR and Q-values. We observed that the QVALUE software provided stable results over a large range of the tuning parameter λ (between 0.25 and 0.7) and that these results (for λ = 0.7) were intermediate between those estimated with the default option of the program (lowest Q-values) and those estimated with the original false discovery rate approach (highest values, consistent with the known bias of this approach; see materials and methods). The 19 segments that overlap with consensus QTL had an average FDR of 48% and average Q-values of 6.6% (default option of the program) and 24% (at λ = 0.7). This last result indicates that only approximately one-fourth of these associations are expected to be due to chance.
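The two quantities used above can be sketched as follows. The first function computes, for one segment, the proportion P of same-size segments whose average statistic is equal to or higher than that observed; it enumerates every possible placement on a single binned profile instead of drawing random segments, which is an assumption made to keep the example deterministic. The second function computes Q-values with a fixed tuning parameter λ, and reduces to the original false discovery rate adjustment when no λ is given. It is a simplified stand-in and does not reproduce the default smoothing option of the QVALUE software. All names and numbers are hypothetical.

```python
def segment_p_value(stat, start, size):
    """Proportion of same-size windows whose mean statistic is >= the
    mean observed for the window beginning at `start` (positions in bins)."""
    observed = sum(stat[start:start + size]) / size
    means = [sum(stat[s:s + size]) / size for s in range(len(stat) - size + 1)]
    return sum(m >= observed for m in means) / len(means)


def q_values(p_values, lam=None):
    """Q-values at a fixed lambda; with lam=None the proportion of true
    nulls is set to 1, which gives the original FDR-adjusted values."""
    m = len(p_values)
    if lam is None:
        pi0 = 1.0
    else:
        pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so each Q-value is monotone in P.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q


# Hypothetical binned overview statistic along one chromosome.
profile = [0, 0, 1, 5, 6, 1, 0, 0, 0, 0]
p_segment = segment_p_value(profile, start=3, size=2)

# Hypothetical P-values for four projected segments.
p_list = [0.01, 0.04, 0.03, 0.5]
q_original = q_values(p_list)
q_lambda = q_values(p_list, lam=0.25)
print(p_segment, q_original, q_lambda)
```

Because the estimated proportion of true null hypotheses is at most 1, Q-values at a fixed λ never exceed the original FDR-adjusted values, which matches the ordering reported above. As a rough check on the interpretation given in the text, an average Q-value of 24% over 19 associations corresponds to about 19 × 0.24 ≈ 4.6 associations expected by chance.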
On chromosome 1, the osSOC1 locus is associated with QTL affecting the silking date. OsGI is mapped to chromosome 3 close to QTL influencing silking date, leaf number, and plant height traits. On chromosome 4, osCRY12_L1 and osCCA1 are associated with QTL affecting flowering time. On chromosome 5, osFT-like5, osTOC1, and osCRY12_L3 are together associated with the same QTL affecting flowering time. Also on chromosome 5, a QTL affecting silking date is associated with osLHP, and QTL affecting height and leaf number are associated with osPHYC. OsZTL/LKP2_L2 and osELF3_L2 correspond to the same QTL affecting flowering time on chromosome 6. Another osZTL/LKP2_L2 locus is associated with the hot-spot locus of chromosome 9 (near the csu147 marker). On the same chromosome, QTL affecting flowering time were associated with a region containing osELF3_L2 and three major QTL cloned in rice, osCO (Hd1), osFT-like2 (Hd3a), and osHY1 (se5). QTL affecting plant height and flowering time were associated with the CCA1/LHY gene on chromosome 10. Although most of the maize phytochrome genes did not appear to be associated with QTL for flowering time, associations appeared between QTL and genes of the circadian clock complex (ZTL, TOC1, CCA1, ELF3, …) or genes of integrative mechanisms (SOC1, FT-Like, LHP, …).
Synthesis of initial QTL results by means of overview and meta-analysis:
Flowering time plays a key role in the adaptation of maize to various environmental conditions. It is also relatively easy to score when compared to other physiological traits that require heavy experimentation (e.g., Reymond et al. 2003 for drought tolerance), so that several studies have addressed QTL mapping for this and related traits, in several tens of mapping populations. A very large body of QTL information is therefore presently available (313 documented here). Synthesizing this information provides a unique opportunity to understand the genetic variation of flowering time within a large range of diversity. It also raises several methodological questions that are addressed below.
First, very few markers, if any, are common to all the genetic maps that were used for QTL mapping. Any global analysis therefore first requires the projection of QTL (most likely positions and confidence intervals) on the same genetic map. Projection was performed here on a map derived from the international maize reference map (IBM), supplemented with markers that were used in QTL mapping experiments (Falque et al. 2003). A similar objective was pursued in the Maize Mapping Project (http://www.maizemap.org), where the IBM2 neighbors map was constructed by integrating loci from several maps. This approach allowed a first global visualization of regions of the genome that appear repeatedly involved in flowering-time variation. Some positions were covered by up to 16 QTL confidence intervals, whereas 74% of the maize genome was not covered by any QTL confidence interval.
We developed and calculated the overview statistic to highlight regions of the genome that have a key contribution to the variation of the trait of interest in the experiments that were synthesized. In a given region of the genome, this parameter increases with (i) the number of experiments for which the region displayed significant QTL effects, (ii) the proximity of QTL positions estimated in the different experiments, and (iii) the precision of QTL position estimation in individual experiments. As illustrated by Equation 1, this last precision increases with the contribution of a QTL to trait variation and the size of the mapping population that was used. This statistic was compared over the genome to that expected from the number of initial QTL per experiment, with no information on their positions. It exceeded this threshold for a total of 69 regions. These included six hot spots for which QTL were strongly involved in flowering time in several experiments. These regions should therefore have a particular importance in the control of this trait in maize.
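The three properties listed above can be illustrated with a minimal sketch of a density-type statistic. It assumes that each QTL position follows a normal distribution centered on its most likely position, with a standard deviation derived from the width of its 95% confidence interval, and that the contributions are summed and divided by the number of experiments. These modeling choices, the function name, and the example values are assumptions for illustration and should not be read as the exact definition used in this study.

```python
import math


def overview_profile(qtls, n_experiments, positions):
    """Sum of normal densities of all QTL at each map position, divided by
    the number of experiments.

    `qtls` is a list of (position_cM, ci_width_cM) pairs, where ci_width_cM
    is the full width of the 95% confidence interval.
    """
    profile = []
    for x in positions:
        total = 0.0
        for pos, ci_width in qtls:
            sd = ci_width / (2 * 1.96)  # 95% CI width converted to a standard deviation
            z = (x - pos) / sd
            total += math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))
        profile.append(total / n_experiments)
    return profile


grid = list(range(0, 101))  # hypothetical 100-cM chromosome, 1-cM steps
close_precise = overview_profile([(50, 10), (52, 10)], 2, grid)
far_apart = overview_profile([(50, 10), (90, 10)], 2, grid)
close_imprecise = overview_profile([(50, 30), (52, 30)], 2, grid)
print(max(close_precise), max(far_apart), max(close_imprecise))
```

Under these assumptions, the peak is highest when the QTL are both close together and precisely located, and it drops when they are far apart or when their confidence intervals are wide.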
We then used the meta-analysis statistical approach of Goffinet and Gerber (2000) to estimate the number of consensus QTL in each region. A cluster of QTL found in different experiments can indeed correspond to either (i) a single QTL, the estimated positions of which vary due to experimental error, or (ii) several linked QTL, the effects of which depend on the population. Meta-analysis makes it possible to test these alternative hypotheses. The tests indicated a total of 62 consensus QTL. Only 6 of the 62 consensus QTL (<10%) had a significant effect in a single experiment (hereafter referred to as singletons), whereas a given consensus QTL could display significant effects in up to 16 experiments. We can first anticipate from this distribution that new independent QTL mapping experiments for flowering time in maize are more likely to confirm regions of the genome already known to affect the trait than to lead to the discovery of new regions. Knowledge about regions that often affect flowering time in breeding material will help breeders monitor the development of material adapted to specific environmental conditions, using marker-assisted selection.
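The choice between a single QTL and several linked QTL can be illustrated by comparing candidate groupings of the initial QTL with an information criterion. The sketch below assumes normally distributed position estimates with known standard deviations and uses an Akaike-type criterion; it only scores groupings supplied by the user and is not a reimplementation of the published method. Function names and values are hypothetical.

```python
import math


def grouping_aic(groups):
    """AIC of a model in which each group of initial QTL shares one true
    position. `groups` is a list of lists of (position_cM, sd_cM) pairs;
    the number of parameters equals the number of groups."""
    log_lik = 0.0
    for group in groups:
        weights = [1.0 / (sd * sd) for _, sd in group]
        # Maximum-likelihood consensus position: inverse-variance weighted mean.
        mean = sum(w * pos for w, (pos, _) in zip(weights, group)) / sum(weights)
        for pos, sd in group:
            log_lik += -0.5 * math.log(2 * math.pi * sd * sd)
            log_lik += -((pos - mean) ** 2) / (2 * sd * sd)
    return 2 * len(groups) - 2 * log_lik


# Three hypothetical initial QTL: (most likely position, standard deviation).
a, b, c = (50.0, 3.0), (52.0, 4.0), (80.0, 3.0)
aic_one = grouping_aic([[a, b, c]])        # one consensus QTL
aic_two = grouping_aic([[a, b], [c]])      # two consensus QTL
aic_three = grouping_aic([[a], [b], [c]])  # every QTL distinct
print(aic_one, aic_two, aic_three)
```

In this toy case the two-QTL grouping has the lowest criterion: merging the QTL at 80 cM with the other two fits poorly, while separating the QTL at 50 and 52 cM costs an extra parameter without a matching gain in fit.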
The high variation in the number of experiments in which a QTL shows a significant effect can be due to both statistical and biological factors. In almost all QTL experiments, the authors used stringent statistical criteria, making it unlikely that a significant proportion of singletons are false positives. Conversely, it is well established that this conservative approach, combined with generally limited population sizes, decreases the power of QTL detection. A large fraction of QTL that appear here as singletons may therefore have contributed to trait variation in other experiments without reaching significance. From a biological standpoint, differences may first be due to a difference in allelic variation at the QTL. A population with a parent that carries a rare allele with strong effect at a QTL will likely yield a singleton for this QTL. Other factors such as epistasis between a QTL and other QTL are also likely, despite the limited amount of information available on such effects. Analysis of these phenomena calls for further developments that would take into account not only positions in initial experiments but also parental effects, and possibly reconsider individual genotyping and phenotyping data obtained in these experiments.
Finally, in addition to estimating a consensus number of QTL, meta-analysis makes it possible to take advantage of all existing information to refine the positions of QTL. For the six major positions discussed above, these new positions appeared on average 1.8 times more precise than the most precise position estimated in individual experiments. This gain in precision can be beneficial to the identification of candidate genes for QTL. The meta-analysis approach proposed by Goffinet and Gerber (2000) therefore appears to be a highly promising strategy for proposing a synthetic interpretation of results obtained in numerous experiments. It calls, however, for further statistical developments. It can first be noted that some QTL have a very high weight in meta-analysis results and that permutation tests may be useful to estimate the stability of the results. It also must be noted that the meta-analysis approach that was used relies on the assumption that all initial QTL positions were estimated independently (i.e., different populations or independent samples of individuals). The foregoing analysis includes some cases where initial QTL for a region include several traits that were estimated for the same population. Such situations are limited in number, and meta-analyses of QTL conducted for individual traits (data not shown) gave results consistent with those presented here. Indeed, we observed that the confidence interval of consensus QTL obtained with the global analysis generally had the same length as that of the most precise corresponding single-trait consensus QTL. However, nonindependence between QTL position estimations may be more important in other studies, and specific approaches (not yet available to our knowledge) should be developed to take into account the effect of individual sampling and other factors on the covariance between QTL position estimations.
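The gain in precision and its dependence on the independence assumption can be illustrated with a simple inverse-variance pooling of position estimates. The sketch assumes that all initial QTL in a region estimate the same true position, that their errors are normal and independent, and that confidence-interval widths are proportional to standard deviations. The function names and interval widths are hypothetical.

```python
import math


def consensus_ci_width(ci_widths):
    """Confidence-interval width of an inverse-variance weighted consensus
    position, given the CI widths of independent initial estimates.

    Since each variance is proportional to the squared CI width, the pooled
    width is 1 / sqrt(sum(1 / w_i**2))."""
    return 1.0 / math.sqrt(sum(1.0 / (w * w) for w in ci_widths))


def precision_gain(ci_widths):
    """Ratio of the most precise initial CI width to the consensus CI width."""
    return min(ci_widths) / consensus_ci_width(ci_widths)


# Hypothetical CI widths (cM) of three initial QTL in one region.
widths = [10.0, 14.0, 20.0]
print(consensus_ci_width(widths), precision_gain(widths))

# Four equally precise, independent estimates halve the interval.
print(precision_gain([10.0, 10.0, 10.0, 10.0]))
```

If the estimates are positively correlated, for instance when several traits are scored on the same population, this pooling overstates the gain, which is the concern raised above about nonindependent QTL position estimations.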
Associations between QTL and maize genes:
The 69 regions highlighted by the overview statistic were resolved into 62 consensus QTL by using meta-analysis. It can be noted that the order of magnitude of these numbers is comparable to that of the 80 genes known to be involved in flowering time in Arabidopsis (Blazquez 2000) and to the 20 genes presently documented in rice (Yano and Sasaki 1997; Yano et al. 2001). To further understand the genetic basis of flowering-time variation, we first investigated the association between QTL and genes known to control this trait in maize. Our results showed only two associations, concerning the two LEAFY orthologous genes, ZFL1 and ZFL2. Consistent with Bomblies et al. (2003), these associations suggest that both copies of LEAFY are implicated in the control of floral transition in maize.
On the other hand, we observed no association between the dwarf8 locus and QTL. This was surprising because Thornsberry et al. (2001) showed that dwarf8 polymorphisms were associated with the quantitative variation of flowering time and plant height in a population of maize inbred lines. This observation can have several causes. First, the parental lines used in the QTL experiments may carry the same or equivalent allelic forms of the dwarf8 gene, which does not allow the detection of any QTL. Second, despite polymorphism between parental lines, strong epistatic interactions within mapping populations may have diminished the effect of dwarf8. In that case, this gene may have contributed to flowering-time variation in QTL mapping experiments, but with an effect too mild to be significant. Like dwarf8, the id1 locus was not associated with QTL, although the mutation strongly affects the floral transition. However, the id1 locus is very close to a region identified by meta-analysis.
It can be noted that the number of flowering-time mutants described in maize is low when compared to the total number of QTL determined in our review. However, ongoing work in several groups, using a combination of reverse genetics, transgenic approaches, and association genetics, should lead to an increased number of cloned flowering-time genes during the next decade. Such efforts should lead to the discovery of candidate genes for QTL hot spots and regions of milder effects. Salvi et al. (2002) anticipate the positional cloning of the vgt1 locus, which corresponds to a major consensus QTL on chromosome 8. Danilevskaya et al. (2003) have suggested that an orthologous gene of FT is a candidate gene for locus vgt2, which corresponds to the second major consensus locus of chromosome 8.
Evaluation of synteny approach:
The relationships between maize and rice genomes reveal a high conservation of gene content and order. The results that we obtained by comparing maize and rice TC sequences document in more detail the relationships first highlighted with RFLP (Devos and Gale 1997; Wilson et al. 1999). The average synteny probability for the whole maize genome is 45.8%, which is comparable to Gaut's estimate of 52.5% made with RFLP (Gaut 2002). Numerous small rearrangements are visible in specific regions, consistent with results of the genomic approach of Dunford et al. (2002), and they strongly decrease the local probability of finding conserved genes between the two species. The synteny estimate therefore varies widely among chromosomes and also along them (see Table 3 and Figure 5, respectively).
Using our comparative map, the projection of rice candidate genes was carried out to search for new candidate genes for maize flowering-time QTL, to compensate for the lack of flowering-time genes presently cloned in maize. Combination of meta-analysis results and the synteny approach provided 19 associations between QTL and candidate genes, of which 2 were already known from maize genetic mapping (ZFL1 and ZFL2 genes) and 17 are new. Simulations first showed that the congruency between QTL and projected regions was higher than that expected by chance (global null hypothesis that the choice of candidate genes and/or their projection was not relevant). Application of the Q-value approach to our data showed that ∼5 of the associations found may be due to chance, whereas the others are expected to reflect a true effect of the projected gene on flowering-time variation. This underlines that genes whose projection is associated with QTL in this study can be considered highly relevant for further studies such as cloning and mapping homologs in maize and verification of colocalization with finely mapped QTL, followed by validation experiments. Conversely, some maize QTL are not associated with any rice candidate gene, even though synteny probability is high in the region. Such situations could be due (i) to the presence of local gaps in the rice genome sequence presently available, which does not allow us to find a candidate gene by homology with Arabidopsis gene sequences, (ii) to the existence of maize genes involved in flowering time that share no or very limited homology with any rice gene, or (iii) to the fact that the corresponding gene is specific to grasses and has not been identified in rice yet. This last hypothesis is supported by the fact that two rice flowering-time cloned genes (Hd6 and se5) do not seem to be essential for flowering in Arabidopsis (Izawa et al. 2000; Takahashi et al. 2001).
These results illustrate the importance of studying flowering time in grasses, to identify specific genes involved in development. Finally, several regions of the maize genome present no synteny with the rice genome. It is then impossible to find a candidate gene from rice for these regions. In this case, it is necessary to develop maize-specific strategies, such as searching for maize EST sequences that show high identity with rice and/or Arabidopsis candidate genes and mapping the corresponding loci.
Toward a genetic model of flowering time in maize:
Projected rice gene locations and the implication of corresponding candidate genes in the control of maize flowering time deserve validation experiments to identify whether or not a given individual candidate gene truly affects flowering time in maize. Colocalization, and also absence of colocalization, which was found for different gene families, leads to different hypotheses for future studies. Associations were observed with genes of photoreceptors (PHYC and CRY1/2), genes of the circadian clock (TOC1, ZTL, ELF3, and CCA1/LHY), and genes of the repressive and integrative systems (GI, CO, SOC1, FT-like, LHP, and LFY). These results support the idea that the photoperiod pathway is important in maize, consistent with its well-established role in rice and Arabidopsis. On the other hand, most of the phytochrome loci in maize are not associated with QTL for flowering time, suggesting that phytochromes make little or no contribution to the photoperiodic control of flowering. This is supported by results of Takano et al. (2001), which showed no significant difference for plant height and heading date between wild-type and phyA mutant rice plants. Analyses of Arabidopsis photoreceptor mutants also have revealed relatively small contributions of phytochromes to the photoperiodic control of flowering. PhyA mutants of Arabidopsis flower later than the wild type under long-day conditions with a far-red-enriched light source, while in addition to PHYB, light-stable phytochromes regulate flowering in response to light quality in Arabidopsis. In contrast, a recent study of CRY2, a blue-light receptor of Arabidopsis, shows that CRY2 plays an important role in the photoperiodic control of flowering (El-Assal et al. 2003). The analysis of CRY2 alleles in different genetic backgrounds suggests that CRY2 does not require functional CRY1 or PHYA but needs the product of the CO and GI genes to promote flowering.
Comparing these analyses with our results provides an insight into the function of cryptochrome in the regulation of flowering induction in maize. Associations between QTL and the genes of the circadian clock, without association with the photoreceptor system, suggest that the circadian clock in maize may affect flowering time independently of photoperiod perception. Our hypothesis is that some of the QTL that are not associated with candidate genes could be explained by different members of the circadian clock genetic complex.
Finally, it can be noted that the conservation of a gene between species is not synonymous with a conservation of its function or its effect. For example, the control of the photoperiod response involves the same genes in Arabidopsis and in rice: the GI, CO, and FT genes. However, the activity of CO is reversed in rice compared with Arabidopsis (Hayama et al. 2003). This reveals that an important developmental process can use the same set of genes but different regulatory interactions. Comparison of such interactions in maize and rice deserves consideration.
Overview and meta-analysis of QTL detected in a wide series of experiments are two complementary tools to identify regions of the maize genome and estimate the number of QTL involved in the control of flowering time in maize. Starting with 313 QTL, we could highlight 69 regions by using the overview statistic and resolve these into 62 consensus QTL by using meta-analysis. This last approach also yielded an increased precision in the estimation of QTL positions. Future improvements of meta-analysis should take into account parental effects estimated in the initial QTL analysis. Parental lines common to different QTL experiments indeed make it possible to estimate the relative effects of several parental alleles, which is of key importance for applied breeding and also for biological knowledge.
Thanks to the Maize Genome Sequencing Project (Chandler and Brendel 2002), an increasing amount of genomic sequence will be available for regions involved in the variation of traits of interest. This will facilitate to a large extent the identification of candidate genes for QTL. While we wait for this promising information, our results indicate that synteny conservation between rice and maize permits an efficient use of rice sequence information to identify candidate genes for maize QTL. The same approach should also lead to a good insight into the genetic control of flowering time in other grass species, such as wheat, sorghum, and ryegrass. Validation of gene projection is then possible by mapping within the species of interest, using EST or genomic sequences. Function validation in the species of interest can be achieved through the identification of mutants in mutated populations, the development of transgenic plants with rice or Arabidopsis genes, and association studies between sequence polymorphism and phenotypic variation. Finally, comparative analysis of QTL between species suggests the existence of homologous QTL for plant height and maturity within the grass family (Lin et al. 1995). Increasing precision in synteny relationships between genomes may make it possible to achieve an interspecific meta-analysis of QTL after projection onto the same reference genome, to better document the genetic architecture of flowering time.
We are grateful to Bruno Goffinet and Jean-Baptiste Veyrieras for helpful statistical comments on the manuscript, to Guylaine Blanc for the analysis of QTL integrated into this study, to Jean-Pierre Martinant for helpful reading of the manuscript, and to anonymous reviewers for very helpful comments. This work was supported by the Génoplante program.
Communicating editor: A. H. Paterson
- Received June 16, 2004.
- Accepted August 19, 2004.
- Genetics Society of America