Genetics, Vol. 168, 1627-1638, November 2004, Copyright © 2004
doi:10.1534/genetics.104.029470

Haplotype Structure and Phenotypic Associations in the Chromosomal Regions Surrounding Two Arabidopsis thaliana Flowering Time Loci

* Molecular and Computational Biology Program, University of Southern California, Los Angeles, California 90089
† Department of Preventive Medicine, University of Southern California, Los Angeles, California 90089
‡ Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, California 92037
§ Max Planck Institute for Developmental Biology, D-72076 Tübingen, Germany

3 Corresponding author: Molecular and Computational Biology, University of Southern California, 835 W 37th St., SHS 172, Los Angeles, CA 90089-1340.
E-mail: magnus@usc.edu

Manuscript received March 30, 2004. Accepted for publication July 8, 2004.

ABSTRACT

The feasibility of using linkage disequilibrium (LD) to fine-map loci underlying natural variation in Arabidopsis thaliana was investigated by looking for associations between flowering time and marker polymorphism in the genomic regions containing two candidate genes, FRI and FLC, both of which are known to contribute to natural variation in flowering. A sample of 196 accessions was used, and polymorphism was assessed by sequencing a total of 17 roughly 500-bp fragments. Using a novel Bayesian algorithm based on haplotype similarity, we demonstrate that LD could have been used to fine-map the FRI gene to a roughly 30-kb region and to identify two common loss-of-function alleles. Interestingly, because of genetic heterogeneity, simple single-marker associations would not have been able to map FRI with nearly the same precision. No clear evidence for previously unknown alleles at either locus was found, but the effect of population structure in causing false positives was evident.


ADVANCES in sequencing and genotyping have made it feasible to map genes underlying natural variation by searching directly for marker-trait associations in populations of "unrelated" individuals, so-called linkage disequilibrium (LD) mapping (reviewed in NORDBORG and TAVARÉ 2002; WEISS and CLARK 2002; BOREVITZ and NORDBORG 2003; CARDON and ABECASIS 2003). Strategies for LD mapping are strongly influenced by the genomic extent of LD. In organisms like Drosophila melanogaster, where LD generally appears to decay within a few kilobases, a genome-wide scan for marker-trait association would require typing almost every polymorphism (except perhaps rare ones) and is therefore not practicable. On the other hand, it may be possible to use LD to identify actual functional polymorphisms within candidate regions (LONG et al. 1998; DE LUCA et al. 2003). This should be contrasted with species with more extensive LD, such as humans; here a genome-wide scan using a subset of markers may be a sensible proposal, but it may be difficult to decide which of several polymorphisms in LD with the trait is (or are) causative (INTERNATIONAL HAPMAP CONSORTIUM 2003).

Highly selfing species, like Arabidopsis thaliana, are in many ways uniquely suitable for LD mapping (NORDBORG 2000; NORDBORG et al. 2002). Selfing decreases heterozygosity, and this leads to extensive LD and haplotype structure. Furthermore, the availability of inbred lines means that it is possible to observe haplotypes directly. In organisms where inbred lines are not available, such as humans, haplotypes must be either inferred or obtained through difficult experimental techniques (e.g., mouse-human hybrid cells; see PATIL et al. 2001). Finally, perhaps the greatest advantage of inbred lines is that they are permanent. Once they have been genotyped, it is possible to measure many phenotypes, repeatedly.

HAGENBLAD and NORDBORG (2002) investigated haplotype structure in the chromosomal region surrounding the A. thaliana flowering time locus FRI and found that LD in the region decayed over ~250 kb. Their sample of 20 accessions contained three of each of two previously identified loss-of-function alleles, friCol and friLer, that occur in the accessions Col and Ler, respectively (JOHANSON et al. 2000). The friCol allele appeared to be associated with an unbroken haplotype extending well over 200 kb, whereas the haplotype associated with friLer showed clear evidence of recombination within 32 kb on one side. However, because the sample size was so small, it was not possible to draw any real conclusion about the feasibility of fine-mapping FRI using LD.

Here we report the results of an extension of the study of HAGENBLAD and NORDBORG (2002), using a subset of the same markers, but a larger sample of 196 accessions. In addition to FRI we also included markers surrounding another major determinant of flowering behavior in A. thaliana, namely FLC (MICHAELS and AMASINO 1999; SHELDON et al. 1999; GAZZANI et al. 2003; MICHAELS et al. 2003). Our main aim was to investigate the haplotype structure around these two candidate loci and look for associations with flowering time.


MATERIALS AND METHODS

Accessions and phenotypic measurements:

This article combines results from two separate flowering time studies. One was carried out at USC and used accessions from the Nottingham Arabidopsis Stock Center as well as personally collected ones (Table 1). The other was carried out at the Salk Institute and mainly used accessions obtained from the Arabidopsis Biological Resource Center at Ohio State University (Table 2).


TABLE 1

Origin and phenotype of USC accessions

 

TABLE 2

Origin and phenotype of Salk accessions

 
Flowering time was measured in days, using unvernalized plants under long-day conditions (16 hr light/8 hr dark). Plants at USC were grown at 18° in growth chambers with a mixture of fluorescent and incandescent bulbs. Plants at the Salk Institute were grown at 22° in chambers with a 3:1 mixture of Cool-white and Gro-Lux (Sylvania) fluorescent bulbs. In both experiments, plants were regularly rotated within and between shelves to minimize effects of environmental heterogeneity. Plants that did not flower within 200 days were counted as 200.

To be able to compare flowering times measured in different environments, we utilized the fact that the two studies had several accessions in common (see Tables 1 and 2). However, accessions that carry the same name are by no means guaranteed to be identical: in addition to the obvious risks of contamination, many identically named accessions represent different lines of descent from an original bulk collection. Our studies involved 29 putatively identical pairs of accessions: Ag-0, Col-0, Cvi, Edi-0, Est-0, Gd-1, Ge-0, Gr-1, Ka-0, Kas-1, Ler, Lm-2, Lz-0, Mir-0, Mr-0, Mt-0, Na-1, No-0, Oy-0, Pi-0, Rak-2, Shahdara, St-0, Stw-0, Ta-0, Ts-1, Wa-1, Wei-0, and Ws-0. Of these, 10 pairs (Gd-1, Ka-0, Kas-1, Mir-0, Mr-0, Rak-2, Shahdara, St-0, Ta-0, and Wei-0) were found to differ with respect to some polymorphic site in the regions surveyed, and four pairs (Ge-0, Lz-0, Na-1, and Pi-0) were found to differ also with respect to the known functional alleles at FRI. Figure 1 shows flowering time in the two studies for these three types of pairs. For accessions that appear to be genetically identical, the difference in flowering time between the two studies is well described by a linear function (y = 2.20 + 0.65x, R² = 0.88). We used this function to transform the Salk Institute data. (Albeit somewhat crude, this suffices for our purposes. We note that the procedure is likely to decrease power to detect associations and is thus conservative with respect to the main conclusions of this article.) Figure 2 shows the distribution of flowering times for the two studies separately and for the merged data. Note that early flowering accessions carrying one of the two known loss-of-function alleles at FRI generally flowered earlier at USC than at the Salk Institute, in agreement with Figure 1. The mean flowering time for all accessions is much greater in the USC study, but this is because this study contained a large number of extremely late, mostly wild-collected accessions.
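The calibration step described above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes that x is the flowering time measured at the Salk Institute and y the corresponding time at USC (the text gives the fitted line but does not state the axes explicitly), and the function names are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


def salk_to_usc_scale(days, intercept=2.20, slope=0.65):
    """Apply the fitted line reported in the text to a Salk measurement.

    Assumption: x in y = 2.20 + 0.65x is the Salk flowering time.
    """
    return intercept + slope * days
```

In practice the line would be fitted only to the pairs that appear genetically identical, and the resulting coefficients would then be applied to every Salk measurement before merging.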



FIGURE 1.—

Flowering time (in days) at USC vs. flowering time at the Salk Institute for pairs of accessions with the same name (see text). The line is a standard least-squares regression.

 


FIGURE 2.—

The distribution of flowering times in the USC and Salk Institute studies. Accessions carrying one of the known FRI loss-of-function alleles are shown separately. From top to bottom, USC data, Salk data, and the merged data.

 

Sequencing and genotyping:

Genomic DNA was extracted using standard methods. A total of 17 short fragments (8 in the FRI and 9 in the FLC region) were amplified using PCR with primers designed from the reference Arabidopsis genome sequence (available on request). All fragments were sequenced in both directions on Beckman CEQ sequencers.

The friLer promoter deletion and 16-bp friCol deletion were scored using the PCR primers given by JOHANSON et al. (2000). In addition, we scored for the two reported insertions in FLC intron 1 (Tables 1 and 2; see GAZZANI et al. 2003; MICHAELS et al. 2003). The 1.2-kb insertion characteristic of Ler was scored using 5'-AAAGCCAGCGCTATCACTAAAC-3' and 5'-GAAAAGGCCACTGGAAACTATG-3'. The 4.2-kb insertion characteristic of Da (1)-12 was scored using 5'-TGTAACTTCAAGGGCAGAAAAC-3' and 5'-GAGGAAGCCAAACTCCAAATAC-3' and confirmed (except in the case of No-0) using 5'-ATCGTCTCGCTTTTTGCTGA-3' (which is inside the insertion) and 5'-GAGGAAGCCAAACTCCAAATAC-3'.

Basic polymorphism analysis:

The per-base-pair population mutation rate θ was estimated using the number of segregating sites. All results are shown with indel polymorphisms omitted and with identically named accessions included. Including indels or dropping possibly identical individuals had little effect on the results.
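An estimate of θ from the number of segregating sites is commonly obtained with Watterson's estimator, θ = S / (a₁L), where S is the number of segregating sites, L the number of sites compared, and a₁ the sum of 1/i for i = 1, ..., n - 1 in a sample of n sequences. The sketch below assumes that this is the estimator intended and that alignment columns containing gaps or missing bases are simply skipped (one way of omitting indels); the function names are illustrative.

```python
def count_segregating_sites(sequences, skip_chars="-N?"):
    """Return (S, L): segregating sites and usable sites in an alignment.

    Columns containing a gap or missing base are skipped entirely.
    """
    segregating = 0
    usable = 0
    for column in zip(*sequences):
        if any(base in skip_chars for base in column):
            continue
        usable += 1
        if len(set(column)) > 1:
            segregating += 1
    return segregating, usable


def watterson_theta_per_site(num_segregating, sample_size, num_sites):
    """Watterson's estimator of theta per base pair."""
    a1 = sum(1.0 / i for i in range(1, sample_size))
    return num_segregating / (a1 * num_sites)
```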

Association mapping:

Marker-trait associations were sought using several different methods. First, we tested each marker locus (SNP or indel; loci with more than two alleles were not used) individually, using the alleles as factors in a Kruskal-Wallis "non-parametric" ANOVA (e.g., SIEGEL and CASTELLAN 1988). The Kruskal-Wallis test relies only on the rank order of the observations, not the magnitude of the differences. Alleles with frequency lower than five were not used.
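The single-marker test can be illustrated with a small, dependency-free sketch. It is not the code used for the analyses: it handles only biallelic markers (for which the Kruskal-Wallis statistic H is compared to a chi-square distribution with 1 d.f.), it reads the frequency threshold as "skip the marker if either allele is carried by fewer than five accessions," and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_marker(flowering_days, alleles, min_count=5):
    """Return (H, p) for a biallelic marker, or None if the marker is skipped."""
    groups = defaultdict(list)
    for days, allele in zip(flowering_days, alleles):
        if allele is not None:  # ignore missing genotypes
            groups[allele].append(days)
    # Skip markers that are not biallelic or that have a rare allele.
    if len(groups) != 2 or min(len(g) for g in groups.values()) < min_count:
        return None
    pooled = [d for g in groups.values() for d in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    h, offset = 0.0, 0
    for g in groups.values():
        rank_sum = sum(ranks[offset:offset + len(g)])
        h += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    # Standard correction for tied observations.
    correction = 1.0 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        return None  # all phenotypes identical
    h = max(h / correction, 0.0)
    p_value = math.erfc(math.sqrt(h / 2.0))  # chi-square tail, 1 d.f.
    return h, p_value
```

Because only ranks enter the statistic, the truncation of non-flowering plants at 200 days affects the test only through ties.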

Second, we used the same test to look for associations between the phenotype and the haplotypes given by each sequenced fragment. Singleton alleles were ignored when constructing the haplotypes, and haplotypes with frequency less than five were ignored.
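One way to construct such fragment haplotypes is sketched below. The interpretation is an assumption on our part: a site is retained only if at least two of its alleles are each carried by two or more accessions (so sites whose minor allele is a singleton are dropped), and accessions whose haplotype is seen fewer than five times are excluded. Missing data are not handled, and the function name is hypothetical.

```python
from collections import Counter


def fragment_haplotypes(sequences, min_count=5):
    """Map accession name -> fragment haplotype, dropping rare haplotypes.

    sequences: dict of accession name -> aligned sequence (equal lengths).
    """
    names = list(sequences)
    length = len(sequences[names[0]])
    informative = []
    for pos in range(length):
        counts = Counter(sequences[name][pos] for name in names)
        # Keep the site only if two or more alleles are non-singletons.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative.append(pos)
    haplotypes = {
        name: "".join(sequences[name][pos] for pos in informative)
        for name in names
    }
    frequency = Counter(haplotypes.values())
    return {name: hap for name, hap in haplotypes.items()
            if frequency[hap] >= min_count}
```

The resulting haplotype labels can then be used as the grouping factor in the same rank-based test applied to individual markers.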

Third, we tried the clustering algorithm described in MOLITOR et al. (2003a,b). The method uses Markov chain Monte Carlo techniques to explore the space of possible clusterings of the observed haplotypes. At each iteration, the clusters consist of haplotypes that are similar to each other in terms of shared length identical by state around a given location x, which is allowed to vary between clusters. Each cluster has a parameter representing the mean flowering time of the haplotypes that are assigned to it, and this is used to calculate the likelihood of the data for a given clustering. The number of clusters, as well as their associated mean flowering times, are parameters explored by the algorithm. At each iteration we output the entire clustering. It is far from obvious how to summarize this output in the most useful way. We chose to look for the location x associated with the k clusters whose mean phenotype deviated most strongly from the overall mean. Different values of k were tried, and we also experimented with restricting the search to clusters that deviated in the same (either negative or positive) direction. This corresponds to looking for early or late alleles. To visualize the results and help determine which individuals were likely to carry which allele, we used hierarchical clustering of haplotypes based on how frequently each pair co-occurred in the same "significant" cluster (e.g., one of the three most deviant clusters in the direction of early flowering). For further details see MOLITOR et al. (2003a,b). To reduce computational complexity, the algorithm was run using the fragment haplotypes (as described above) as markers. This procedure throws away information about recombination and mutation within each fragment and is justified by the belief that recombination between fragments is likely to be much more important.
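Two ingredients of this procedure lend themselves to a brief illustration: the similarity between haplotypes around a location, and the pairwise co-clustering frequency used for visualization. The sketch below is a deliberate simplification and not the algorithm of MOLITOR et al.: the focal location is restricted to a marker index, shared length is measured as a count of consecutive matching markers rather than a physical distance, and the MCMC sampler itself is not shown. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def shared_markers(hap_a, hap_b, focal):
    """Number of consecutive markers, including the focal one, at which
    two haplotypes are identical by state; 0 if they differ at the focal marker."""
    if hap_a[focal] != hap_b[focal]:
        return 0
    left = focal
    while left > 0 and hap_a[left - 1] == hap_b[left - 1]:
        left -= 1
    right = focal
    while right < len(hap_a) - 1 and hap_a[right + 1] == hap_b[right + 1]:
        right += 1
    return right - left + 1


def co_clustering_frequency(sampled_clusterings):
    """Fraction of sampled iterations in which each pair shares a cluster.

    sampled_clusterings: one entry per iteration, each a list of sets of
    haplotype names (only the clusters regarded as "significant").
    """
    counts = Counter()
    for clusters in sampled_clusterings:
        for cluster in clusters:
            for pair in combinations(sorted(cluster), 2):
                counts[pair] += 1
    total = len(sampled_clusterings)
    return {pair: c / total for pair, c in counts.items()}
```

One minus the co-clustering frequency can serve as a distance for any standard hierarchical clustering routine.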


RESULTS

Basic pattern of polymorphism:

Figure 3 shows how the level of polymorphism and Tajima's D vary across the two regions. The results for the FRI region differ considerably from those obtained by HAGENBLAD and NORDBORG (2002); in particular D, which fluctuated wildly between highly positive and highly negative values in the former study, is now almost uniformly negative. Note that standard population genetics estimators are not expected to be particularly sensitive to sampling in an unstructured population. The large discrepancy observed here thus illustrates that sampling does matter in the presence of population structure (see also PTAK and PRZEWORSKI 2002). Overall, Tajima's D appears to be negative in this sample: it is impossible to evaluate how unusual the large positive values observed are in the absence of genome-wide data.
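For reference, Tajima's D contrasts the mean number of pairwise differences with the number of segregating sites scaled by a₁, normalized by an estimate of the standard deviation of the difference. The sketch below follows the standard published formula; it assumes an alignment without gaps or missing data, and the function name is hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, gap-free sequences.

    Returns None when D is undefined (no segregating sites or zero variance).
    """
    n = len(sequences)
    seg = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg == 0:
        return None
    pairs = list(combinations(sequences, 2))
    # Mean number of pairwise differences (pi).
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg + e2 * seg * (seg - 1)
    if variance <= 0:
        return None
    return (pi - seg / a1) / math.sqrt(variance)
```

An excess of rare variants gives negative values, whereas an excess of intermediate-frequency variants, as can arise when a sample mixes differentiated subpopulations, gives positive values.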



FIGURE 3.—

Estimates of θ and Tajima's D for the fragments in the FRI (left) and FLC (right) regions. Separate estimates of θ were made for nonsynonymous polymorphisms (solid lines) and synonymous and noncoding polymorphisms (dotted lines).

 

Pairwise marker-trait associations:

Figure 4A shows association between flowering time and individual markers in the FRI region. There is clearly very little indication of where in this region FRI might lie. Indeed, it is far from obvious that this region would stand out in a genome-wide screen. Although probabilities on the order of 10⁻⁵ for a chance occurrence might seem to make this finding "significant," it must be remembered that population structure can cause arbitrarily strong associations. Thus, the only way to determine whether a particular polymorphism is significant or not is to compare it to a very large number of other markers as a genomic negative control. This study does not have such a control (although we note that associations are only slightly stronger than those in the FLC region, see below).



FIGURE 4.—

Pairwise marker-trait associations around FRI, for all the data, and with genetic heterogeneity reduced in various ways (see text). FRI is located approximately at 270 kb.

 
This negative result might appear surprising in light of the fact that FRI clearly has a very large effect on the distribution of phenotypes (cf. Figure 2). However, things are not as bad as they seem at first. The reason for the lack of strong associations in the FRI region is genetic (allelic) heterogeneity. There are two major alleles at FRI, and they are associated with different markers. The dramatic effect of this on the power of LD mapping is best illustrated by artificially reducing the heterogeneity. Figure 4B shows the effect on LD when all early flowering accessions except those carrying friCol are removed from the sample. Despite a much smaller sample size, there are now strong associations (on the order of 10⁻⁹) across the entire region. There is still little indication of where in the region FRI might lie, but it seems highly plausible that the entire region would stand out in a genome-wide scan.

The reason there is no positional information about friCol is clear: as mentioned above, there is little evidence of recombination within this haplotype, and thus there is no decay of LD. The situation is quite different for friLer, as is shown in Figure 4C, where all early flowering individuals except those carrying this allele have been removed. Two markers very close to FRI are extremely strongly associated with the phenotype, but LD decays rapidly on either side. There is little doubt that it would have been possible to map FRI using a sample like this one. Of course we would not have such a sample unless we already knew the answer (and were able to reduce genetic heterogeneity). The point here is simply to demonstrate that the data do contain information about the location of FRI, but that this information is obscured by heterogeneity.

Figure 4D shows marker-trait associations when the two known alleles have been removed. Although these two alleles explain only a fraction of the phenotypic variation (cf. Figure 2), there is no evidence for other alleles.

Figure 5, top, shows associations around FLC using the entire sample. There is no evidence for any association. To test whether association could have been masked by genetic heterogeneity (between loci this time), individuals carrying the known FRI alleles were removed. As shown in Figure 5, bottom, there are still no associations.



FIGURE 5.—

Pairwise marker-trait associations around FLC, for all the data, and with accessions carrying the two major alleles at FRI removed (cf. Figure 4). FLC is located approximately at 3176 kb.

 

Haplotype-trait associations:

Pairwise marker-trait association studies ignore the information that is present in associations between markers. Methods that utilize this information should have greater statistical power. Given the structure of our data, a natural way to take some of the LD information into account is to look for associations between the haplotypes defined by each sequenced fragment and the trait. However, as shown in Figure 6, this approach does not appear to improve localization in this case (although overall significance levels increase). The general pattern is similar to that in Figure 4, except that the leftmost fragment in the FRI region now appears to be associated with flowering time. A closer look at the pattern of polymorphism in this region suggests that this is a spurious correlation caused by population structure.



FIGURE 6.—

Associations between fragment haplotypes and flowering time around FRI and FLC. The four different curves are for all the data and for data with heterogeneity reduced as described in the text (cf. Figure 4).

 
Because LD is so extensive, it seemed likely that better results could be obtained by utilizing multifragment haplotypes. We used a modified version of the method of MOLITOR et al. (2003a,b) to explore the space of all possible haplotype clusters (see MATERIALS AND METHODS). The method allowed us to search for different numbers of significant clusters (i.e., alleles) and also to search separately for alleles associated with early and late flowering.

As long as early alleles were included in the search, the method invariably suggested a location within 10–20 kb on either side of the true location of FRI (Figure 7). Closer inspection of the output reveals that the two known early alleles are responsible for this (Figure 8). In other words, FRI can easily be localized using this method, genetic heterogeneity notwithstanding.



FIGURE 7.—

Negative log of the ratio of the posterior likelihood of position to prior (Bayes' factor) for the FRI and FLC data using the algorithm of MOLITOR et al. (2003b). The algorithm was run separately looking for the three phenotypically most deviant clusters in the negative (i.e., early) and positive (i.e., late) directions. The two known early mutations in FRI are within 1 kb (on either side) of the fifth fragment (270 kb). FLC is located approximately at the seventh fragment (3176 kb).

 


FIGURE 8.—

The clusters of FRI haplotypes responsible for the "early" peak at positions 256 kb (left tree) and 269 kb (right tree) in Figure 7. Branches are colored according to the phenotype of the corresponding accession. Blue text denotes accessions with the friCol allele; red text denotes those with friLer. Asterisks mark accessions that carry the insertions in FLC intron 1: one asterisk denotes a 1.2-kb insertion and two asterisks denote a 4.2-kb insertion. Number sign denotes accessions from Central Asia (see text).

 
If the search is limited to late alleles, a location ~50 kb upstream of FRI is suggested instead (Figure 7). This result is largely due to clusters of Swedish accessions (Figure 9), and several lines of evidence suggest that it is a false positive due to population structure. First, a similar cluster appears in the FLC region (see below). Second, analysis of genome-wide polymorphism data has shown that Swedish accessions are quite distinct on a genome-wide level (M. NORDBORG, T. T. HU, Y. ISHINO, J. JHAVERI, C. TOOMAJIAN, H. ZHENG, E. BAKKER, P. CALABRESE, J. GLADSTONE, R. GOYAL, M. JAKOBSSON, S. KIM, Y. MOROZOV, B. PADHUKASAHASRAM, V. PLAGNOL, N. A. ROSENBERG, C. SHAH, J. WALL, J. WANG, K. ZHAO, T. KALBFLEISCH, V. SCHULTZ, M. KREITMAN, and J. BERGELSON, unpublished data). We return to the issue of false positives in the DISCUSSION.



FIGURE 9.—

The clusters of FRI haplotypes responsible for the "late" peak at positions 209 kb (left tree) and 235 kb (right tree) in Figure 7.

 
When applied to the FLC region, the spatial statistics algorithm finds a signal precisely at FLC (Figure 7); however, the peak is again largely due to late-flowering Swedish accessions (Figure 10). While it is eminently possible that the extreme late-flowering phenotype of these Swedish accessions could be due to allelic variation at FLC, it is impossible to rule out false positives without independent data (e.g., from an experimental cross).



FIGURE 10.—

The "late" (left tree) and "early" (right tree) clusters of FLC haplotypes at position 3176 kb in Figure 7. Note that the peak at this position is exclusively due to the late clusters. Asterisks mark accessions that carry the insertions in FLC intron 1: one asterisk denotes a 1.2-kb insertion and two asterisks denote a 4.2-kb insertion. Number sign denotes accessions from Central Asia (see text).

 
Interestingly, the method did not find a signal of early alleles at FLC (Figure 7), despite the fact that our sample contains several known early flowering alleles at this locus (GAZZANI et al. 2003; MICHAELS et al. 2003). The algorithm did, however, identify several clusters at FLC that appear to correspond to early alleles (Figure 10). In particular, it clusters accessions carrying the 1.2- and 4.2-kb intron 1 insertions associated with early flowering in Ler and Da (1)-12, respectively. It also clusters a group of Central Asian accessions, at least two of which (Shahdara and Kondara) appear to carry early FLC alleles. Caution is in order because both clusters are also geographically distinct and could thus easily be false positives. However, we note that these accessions are not clustered nearly as well at FRI (Figure 8), suggesting that their clustering at FLC reflects more than simply population structure.


DISCUSSION

LD mapping in A. thaliana:

We have shown that FRI could have been fine-mapped using LD to a roughly 30-kb region by typing markers in a candidate region and that the two common loss-of-function alleles could also have been identified this way. We emphasize that we have not addressed whether it could have been mapped in a genome-wide LD scan: the "peaks" shown in Figure 7 would of course remain, but it is not clear how many other such peaks would have been found. In other words, we have not addressed the issue of significance. This is a widely recognized problem for genome-wide LD scans. First, the very large number of comparisons involved in genome-wide scans makes it difficult to employ standard statistical corrections for multiple comparisons without losing all power. Second, population structure can cause arbitrarily high false-positive rates. A possible solution to this problem is to type a large number of unlinked markers and either estimate the population structure and carry out testing within appropriate subgroups (PRITCHARD and ROSENBERG 1999; PRITCHARD et al. 2000) or estimate the false-positive rate and adjust significance levels appropriately (DEVLIN and ROEDER 1999).
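The second of these strategies, genomic control, can be illustrated with a short sketch. It assumes that each test statistic is approximately chi-square distributed with 1 d.f. under the null hypothesis (as for a test on a biallelic marker) and that a set of statistics from unlinked markers is available, which is not the case in this study; the function name is hypothetical.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(null_statistics, candidate_statistic):
    """Return (lambda, adjusted statistic, adjusted p-value).

    null_statistics: 1-d.f. chi-square statistics from unlinked markers.
    The inflation factor lambda is bounded below by 1.
    """
    inflation = max(1.0, median(null_statistics) / CHI2_1DF_MEDIAN)
    adjusted = candidate_statistic / inflation
    p_value = math.erfc(math.sqrt(adjusted / 2.0))
    return inflation, adjusted, p_value
```

If the unlinked markers typically show statistics twice as large as expected by chance, a candidate statistic of 20 is thus deflated to 10 before its significance is assessed.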

These methods are powerful, but they are by no means certain to work. For example, the method of PRITCHARD et al. (2000) assumes that the population is a mixture of random-mating subgroups that are appropriate for testing. It is not clear how it would work in a continuously distributed population characterized by isolation-by-distance. Another example, highly relevant in this context, is that if the phenotype is very strongly correlated with the population structure, it may not be possible to eliminate all false positives without also eliminating almost all true positives. In our sample, all extremely late accessions are Swedish, and almost all Swedish accessions are extremely late.

An alternative approach is to obtain direct evidence of linkage by observing transmission through a pedigree. In human genetics, this approach is encapsulated in the famous "transmission-disequilibrium test," which utilizes parent-offspring trios to test simultaneously for linkage and linkage disequilibrium (EWENS and SPIELMAN 1995). In an organism like A. thaliana, it is natural to combine LD mapping with a simple experimental cross. Either a cross would be used to verify the results of LD mapping or LD mapping would be used to refine the results of a QTL mapping experiment. The advantage of this approach over the statistical methods mentioned above is that it is totally robust; the disadvantage is of course that extra work is required. It may be asked why one should bother with LD mapping at all if crosses are nonetheless required. The answer to this is that whereas it is relatively easy to carry out a cross to verify linkage or to map a QTL roughly, fine-mapping a QTL down to 30 kb using crosses can be both expensive and time consuming (especially when working with a phenotype such as late flowering).
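For readers unfamiliar with the transmission-disequilibrium test, its usual form is a McNemar-type statistic computed from heterozygous parents: if a candidate allele is transmitted b times and not transmitted c times, then (b - c)² / (b + c) is approximately chi-square distributed with 1 d.f. under the null hypothesis. The following is a minimal sketch with a hypothetical function name; it is included for illustration only and is not part of the analyses reported here.

```python
import math


def transmission_disequilibrium_test(transmitted, untransmitted):
    """McNemar-type TDT statistic and its 1-d.f. chi-square p-value.

    Counts refer to transmissions of the candidate allele from
    heterozygous parents to affected offspring.
    """
    total = transmitted + untransmitted
    if total == 0:
        return None
    chi_square = (transmitted - untransmitted) ** 2 / total
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value
```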

Methods for LD mapping:

Our results demonstrate that haplotype-based methods that incorporate information from multiple linked markers can be considerably more powerful than methods that analyze markers one at a time, ignoring any information about LD between markers. It may well be that the advantage of haplotype-based methods is especially pronounced in the presence of genetic heterogeneity. As illustrated in Figure 4, single-marker analysis is unable to localize FRI within a 250-kb region because of allelic heterogeneity. In contrast, the haplotype-based algorithm of MOLITOR et al. (2003a,b) pinpoints a 30-kb region (Figure 7) and identifies the two major loss-of-function alleles (Figure 8). We note that a few carriers of these alleles do not cluster with the majority: this is partially due to close recombination events, but more often due to missing data. Missing data are also the main reason why putatively identical accessions do not always cluster together (e.g., Ler and LerK in Figure 10).

We chose the method of MOLITOR et al. (2003a,b) because of its flexibility and ease of implementation. A number of other haplotype-based methods have been proposed (MCPEEK and STRAHS 1999; SERVICE et al. 1999; LAM et al. 2000; TOIVONEN et al. 2000; LIU et al. 2001; RANNALA and REEVE 2001; LU et al. 2003; MORRIS et al. 2003), and it seems likely that some of them would have performed as well.

Variation at FRI and FLC:

As shown in Figure 2, the two loss-of-function alleles friCol and friLer explain only a minor fraction of the variation for flowering time in our sample. Various results suggest that at least some of the remaining variation is due to additional alleles at FRI, as well as allelic variation at FLC (SANDA et al. 1997; LE CORRE et al. 2002; GAZZANI et al. 2003; MICHAELS et al. 2003). Nonetheless, we were not able to identify any additional alleles (at either locus) with certainty. We did find clusters of late accessions at both loci, but these seem most likely to be false positives due to population structure. There is little evidence for further early alleles at either locus (although the clusters observed at FLC are suggestive).

In general, the power of association mapping to identify a particular allele depends on how much of the phenotypic variation that allele explains (e.g., LONG and LANGLEY 1999; ZONDERVAN and CARDON 2004). This is a function of both the effect of the allele and its frequency. With a sample as small as the one used here, we should expect to be able to find only common alleles with large effects, such as the two FRI alleles identified by JOHANSON et al. (2000). Thus our results in no sense rule out there being other important alleles at FRI or FLC, but they do suggest that any such alleles are relatively rare (at least with the current sample of accessions). That this is the case for FRI has also been suggested by LE CORRE et al. (2002), who identified several rare FRI alleles with nonsense mutations or frameshifts.
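The dependence on both effect and frequency can be made concrete under a simple model. The sketch below assumes inbred (homozygous) accessions, a biallelic locus whose two allele classes differ in mean flowering time by a fixed amount, and a common residual standard deviation within each class; the function name and the example numbers are hypothetical and are not estimates from our data.

```python
def fraction_variance_explained(allele_frequency, effect_days, residual_sd):
    """Fraction of phenotypic variance explained by a biallelic locus.

    Among homozygous lines, the variance between allele classes is
    p * (1 - p) * d**2, where d is the difference between class means.
    """
    between = allele_frequency * (1.0 - allele_frequency) * effect_days ** 2
    return between / (between + residual_sd ** 2)
```

With a 20-day effect and a residual standard deviation of 10 days, an allele at frequency 0.5 explains half of the variance under this model, whereas the same allele at frequency 0.02 explains only about 7%.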

Very few of the early flowering accessions studied by LE CORRE et al. (2002) contained either of the two FRI alleles identified by JOHANSON et al. (2000). The only way to reconcile this finding with the high frequency of these alleles in our sample is by taking geographic structure into account. It is clear from Tables 1 and 2 that friCol is essentially a German mutation, while friLer has wider distribution (probably with a focus in Central or Eastern Europe). We note that both the more limited geographic range and the more extensive LD are consistent with friCol being younger than friLer [as suggested by HAGENBLAD and NORDBORG (2002)]. However, since the sample of LE CORRE et al. (2002) consisted mostly of accessions from France it is less surprising that they picked up few of these alleles. As can be seen in Tables 1 and 2, most of the early flowering accessions in our sample that do not carry one of these alleles hail from regions other than Germany or Central Europe. Population structure is also apparent at FLC. For example, the 4.2-kb intron 1 insertion observed in Da (1)-12 by MICHAELS et al. (2003) appears to be present largely in Czech accessions.


ACKNOWLEDGEMENTS
We thank Justin Borevitz, Caroline Dean, and Julin Maloof for numerous discussions about this topic. M.N. thanks Svante Holm, Malte Jönsson, Jirina Relichová, and Torbjörn Säll for help collecting plants, Vijaya Rao for help growing them, and Yoko Ishino for help with data management. M.N. was supported by the James H. Zumberge Research and Innovation Fund and by a grant from the W. M. Keck Foundation. J.W. was supported by a National Science Foundation Graduate Research Fellowship. D.W. is supported by the National Institutes of Health and the Max Planck Society.


FOOTNOTES
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AY781906, AY785055.

1 These authors contributed equally to this work.

2 Present address: Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.


LITERATURE CITED

BOREVITZ, J. O., and M. NORDBORG, 2003 The impact of genomics on the study of natural variation in Arabidopsis. Plant Physiol. 132: 718–725.

CARDON, L. R., and G. R. ABECASIS, 2003 Using haplotype blocks to map human complex trait loci. Trends Genet. 19: 135–140.

DE LUCA, M., N. V. ROSHINA, G. L. GEIGER-THORNSBERRY, R. F. LYMAN, E. G. PASYUKOVA et al., 2003 Dopa decarboxylase (Ddc) affects variation in Drosophila longevity. Nat. Genet. 34: 429–433.

DEVLIN, B., and K. ROEDER, 1999 Genomic control for association studies. Biometrics 55: 997–1004.

EWENS, W. J., and R. S. SPIELMAN, 1995 The transmission/disequilibrium test: history, subdivision, and admixture. Am. J. Hum. Genet. 57: 455–464.

GAZZANI, S., A. R. GENDALL, C. LISTER and C. DEAN, 2003 Analysis of the molecular basis of flowering time variation in Arabidopsis accessions. Plant Physiol. 132: 1107–1114.

HAGENBLAD, J., and M. NORDBORG, 2002 Sequence variation and haplotype structure surrounding the flowering time locus FRI in Arabidopsis thaliana. Genetics 161: 289–298.

INTERNATIONAL HAPMAP CONSORTIUM, 2003 The International HapMap Project. Nature 426: 789–796.

JOHANSON, U., J. WEST, C. LISTER, S. MICHAELS, R. AMASINO et al., 2000 Molecular analysis of FRIGIDA, a major determinant of natural variation in Arabidopsis flowering time. Science 290: 344–347.

LAM, J. C., K. ROEDER and B. DEVLIN, 2000 Haplotype fine mapping by evolutionary trees. Am. J. Hum. Genet. 66: 659–673.

LE CORRE, V., F. ROUX and X. REBOUD, 2002 DNA polymorphism at the FRIGIDA gene in Arabidopsis thaliana: extensive nonsynonymous variation is consistent with local selection for flowering time. Mol. Biol. Evol. 19: 1261–1271.

LIU, J. S., C. SABATTI, J. TENG, B. J. B. KEATS and N. RISCH, 2001 Bayesian analysis of haplotypes for linkage disequilibrium mapping. Genome Res. 11: 1716–1724.

LONG, A. D., and C. H. LANGLEY, 1999 The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res. 9: 720–731.

LONG, A. D., R. F. LYMAN, C. H. LANGLEY and T. F. C. MACKAY, 1998 Two sites in the Delta gene region contribute to naturally occurring variation in bristle number in Drosophila melanogaster. Genetics 149: 999–1017.

LU, X., T. NIU and J. S. LIU, 2003 Haplotype information and linkage disequilibrium mapping for single nucleotide polymorphisms. Genome Res. 13: 2112–2117.

MCPEEK, M. S., and A. STRAHS, 1999 Assessment of linkage disequilibrium by the decay of haplotype sharing, with application to fine-scale genetic mapping. Am. J. Hum. Genet. 65: 858–875.

MICHAELS, S. D., and R. M. AMASINO, 1999 FLOWERING LOCUS C encodes a novel MADS domain protein that acts as a repressor of flowering. Plant Cell 11: 949–956.

MICHAELS, S. D., Y. HE, K. C. SCORTECCI and R. M. AMASINO, 2003 Attenuation of FLOWERING LOCUS C activity as a mechanism for the evolution of summer-annual flowering behavior in Arabidopsis. Proc. Natl. Acad. Sci. USA 100: 10102–10107.

MOLITOR, J., P. MARJORAM and D. THOMAS, 2003a Application of Bayesian spatial statistics methods to analysis of haplotypes effects and gene mapping. Genet. Epidemiol. 25: 95–105.

MOLITOR, J., P. MARJORAM and D. THOMAS, 2003b Fine-scale mapping of disease genes with multiple mutations via spatial clustering techniques. Am. J. Hum. Genet. 73: 1368–1384.

MORRIS, A. P., J. C. WHITTAKER, C.-F. XU, L. K. HOSKING and D. J. BALDING, 2003 Multipoint linkage-disequilibrium mapping narrows location interval and identifies mutation heterogeneity. Proc. Natl. Acad. Sci. USA 100: 13442–13446.

NORDBORG, M., 2000 Linkage disequilibrium, gene trees, and selfing: an ancestral recombination graph with partial self-fertilization. Genetics 154: 923–929.

NORDBORG, M., and S. TAVARÉ, 2002 Linkage disequilibrium: what history has to tell us. Trends Genet. 18: 83–90.

NORDBORG, M., J. O. BOREVITZ, J. BERGELSON, C. C. BERRY, J. CHORY et al., 2002 The extent of linkage disequilibrium in Arabidopsis thaliana. Nat. Genet. 30: 190–193.

PATIL, N., A. J. BERNO, D. A. HINDS, W. A. BARRETT, J. M. DOSHI et al., 2001 Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294: 1719–1723.

PRITCHARD, J. K., and N. A. ROSENBERG, 1999 Use of unlinked genetic markers to detect population stratification in association studies. Am. J. Hum. Genet. 65: 220–228.

PRITCHARD, J. K., M. STEPHENS, N. A. ROSENBERG and P. DONNELLY, 2000 Association mapping in structured populations. Am. J. Hum. Genet. 67: 170–181.

PTAK, S. E., and M. PRZEWORSKI, 2002 Evidence for population growth in humans is confounded by fine-scale population structure. Trends Genet. 18: 559–563.

RANNALA, B., and J. P. REEVE, 2001 High-resolution multipoint linkage-disequilibrium mapping in the context of the human genome sequence. Am. J. Hum. Genet. 69: 159–178.

SANDA, S. L., M. JOHN and R. M. AMASINO, 1997 Analysis of flowering time in ecotypes of Arabidopsis thaliana. J. Hered. 88: 69–72.

SERVICE, S. K., D. W. TEMPLE LANG, N. B. FREIMER and L. A. SANDKUIJL, 1999 Linkage-disequilibrium mapping of disease genes by reconstruction of ancestral haplotypes in founder populations. Am. J. Hum. Genet. 64: 1728–1738.

SHELDON, C. C., J. E. BURN, P. P. PEREZ, J. METZGER, J. A. EDWARDS et al., 1999 The FLF MADS box gene: a repressor of flowering in Arabidopsis regulated by vernalization and methylation. Plant Cell 11: 445–458.

SIEGEL, S., and N. J. CASTELLAN, JR., 1988 Nonparametric Statistics for the Behavioral Sciences, Vol. 2. McGraw-Hill, New York.

TOIVONEN, H. T. T., P. ONKAMO, K. VASKO, V. OLLIKAINEN, P. SEVON et al., 2000 Data mining applied to linkage disequilibrium mapping. Am. J. Hum. Genet. 67: 133–145.

WEISS, K. M., and A. G. CLARK, 2002 Linkage disequilibrium and the mapping of complex human traits. Trends Genet. 18: 19–24.

ZONDERVAN, K. T., and L. R. CARDON, 2004 The complex interplay among factors that influence allelic association. Nat. Rev. Genet. 5: 89–100.



