Genetics, Vol. 161, 289-298, May 2002, Copyright © 2002

Sequence Variation and Haplotype Structure Surrounding the Flowering Time Locus FRI in Arabidopsis thaliana

Jenny Hagenblad and Magnus Nordborg
Department of Genetics, Lund University, Sölvegatan 29, S-223 62 Lund, Sweden

Corresponding author: Magnus Nordborg, University of Southern California, 835 W. 37th St., SHS 172, Los Angeles, CA 90089-1340. E-mail: magnus@usc.edu

Communicating editor: O. SAVOLAINEN


*  ABSTRACT

Linkage disequilibrium in highly selfing organisms is expected to extend well beyond the scale of individual genes. The pattern of polymorphism in such species must thus be studied over a larger scale. We sequenced 14 short (0.5–1 kb) fragments from a 400-kb region surrounding the flowering time locus FRI in a sample of 20 accessions of Arabidopsis thaliana. The distribution of allele frequencies, as quantified by Tajima's D, varies considerably over the region and is incompatible with a standard neutral model. The region is characterized by extensive haplotype structure, with linkage disequilibrium decaying over 250 kb. In particular, recombination is evident within 35 kb of FRI in a haplotype associated with a functionally important allele. This suggests that A. thaliana may be highly suitable for linkage disequilibrium mapping.


VARIATION at a particular site in the genome is the result of mutations occurring along the branches of the genealogical tree relating the homologous copies of that site. Recombination allows different sites to have different genealogical trees. This makes it possible to gain insight into the stochastic process that gave rise to the trees. Without recombination, data would reflect a single realization of this process, making statistical inference a questionable project. In general, linked sites will have correlated trees, the strength of the correlation depending on the genetic distance between the loci. This correlation in genealogy between sites may be reflected in the pattern of variation at the sites, giving rise to linkage disequilibrium (e.g., NORDBORG and TAVARÉ 2002).

Recombination is a powerful force in population genetics. The probability of a neutral mutation is typically estimated to be on the order of 10⁻⁸–10⁻⁹ per bp per meiosis in eukaryotes (e.g., LI 1997). Recombination probabilities are of the same order of magnitude: in humans, for example, 1 cM corresponds to ~1 Mb on average; i.e., the probability of recombination per base pair per meiosis must be on the order of 10⁻⁸ on average. It follows from basic population genetic theory that, with such similar rates, there will on average be as many recombination events in the history of a sample of sequences as there are segregating sites (NORDBORG 2000). Theory also predicts that most of these recombination events will be undetectable (HUDSON and KAPLAN 1985). In particular, it is important to realize that almost all methods that exist for detecting recombination in sequence data are designed to detect recombination when it is rare (as in horizontal transfer between bacterial lineages, for example) and are almost powerless when faced with "normal" recombination (MAYNARD SMITH 1999). For example, whereas low levels of recombination (compared to the level of polymorphism) lead to distinctive "runs" of polymorphic sites that are incompatible with a single genealogical tree, high levels of recombination destroy most such patterns. As a consequence, it becomes very difficult to determine whether incompatible polymorphisms are caused by recombination or by mutational hot spots (TEMPLETON et al. 2000).
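As a quick check of the order-of-magnitude argument above, converting a map density into a per-base-pair recombination probability is simple arithmetic. The following Python sketch assumes a uniform map of ~1 cM/Mb and treats 1 cM as a 1% recombination probability (reasonable only over short distances); the function name `recomb_per_bp` is illustrative, not taken from any package.

```python
# Illustrative values from the text; real maps vary along the genome.
PROB_PER_CM = 0.01      # 1 cM corresponds to a 1% recombination probability
BP_PER_MB = 1_000_000

def recomb_per_bp(cm_per_mb: float) -> float:
    """Average per-bp, per-meiosis recombination probability for a map density in cM/Mb."""
    return cm_per_mb * PROB_PER_CM / BP_PER_MB

r = recomb_per_bp(1.0)  # ~1 cM per Mb, the human average cited above
for u in (1e-8, 1e-9):  # range of neutral mutation probabilities per bp per meiosis
    print(f"u = {u:.0e}: r = {r:.0e}, r/u = {r / u:.3g}")
```

With these numbers r/u lies between about 1 and 10, which is why recombination events and segregating sites are expected to be comparably numerous in an outcrossing sample.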

In general, the more polymorphic sites per recombination event, the more information we have about the underlying genealogical trees and recombination events (HUDSON and KAPLAN 1985). This is very important in the context of linkage disequilibrium mapping (LONG and LANGLEY 1999). A simple way of increasing the ratio of polymorphism to recombination is to look at partially selfing species (NORDBORG 2000). The effective rate of recombination in such species is much reduced, essentially because recombination does not matter unless it occurs in outcrossed (heterozygous) individuals. As a result, extensive linkage disequilibrium and (some) clearly detectable recombination events are expected.

We decided to study the pattern of linkage disequilibrium and recombination in the highly selfing Arabidopsis thaliana. Previous polymorphism surveys in this species have found the expected high degree of linkage disequilibrium as well as evidence for some recombination (e.g., HANFSTINGL et al. 1994; INNAN et al. 1996). However, because these studies focused on single genes, little has been known about the pattern in larger chromosomal regions. To fill this gap, we resequenced 14 short (0.5–1 kb) fragments from a 400-kb region on chromosome 4, in a sample of 20 accessions.

In addition to studying a larger chromosomal region, we were interested in the pattern of variation surrounding a polymorphic locus likely to be adaptively important. The pattern of variation in linked regions can reveal the history of selection on the alleles and can also be used for linkage disequilibrium mapping. A. thaliana is known to vary tremendously for flowering time (LAIBACH 1951; NAPP-ZINN 1985; KARLSSON et al. 1993; NORDBORG and BERGELSON 1999). Several studies indicated that much of this variation maps to two loci, FRI and FLC (e.g., SANDA et al. 1997). We chose to investigate the pattern of variation surrounding FRI and therefore selected fragments spanning the chromosomal region known to contain this locus. The 20 accessions were chosen to span a wide range of phenotypes, on the basis of a greenhouse survey of flowering time (NORDBORG and BERGELSON 1999). Since the initiation of the study, both FRI (JOHANSON et al. 2000) and FLC (MICHAELS and AMASINO 1999; SHELDON et al. 1999) have been identified.


*  METHODS

Sample:
The following accessions were included in the sample: Algutsrum, Col, Dem-4, Got-32, Kent, Köln, Kondara, Kz-9, Ler, Lisse, Lund, MT-0, NC-6, Pu-2-3, Pu-2-8, Rsch-4, Shakhdara, Tamm-46, Tsu-0, and Vimmerby. Further information about these accessions can be found in NORDBORG and BERGELSON (1999). In addition, single individuals of A. (Cardaminopsis) arenosa, Cardamine hirsuta, Erophila verna, and Capsella bursa-pastoris were used as outgroups. Genomic DNA was extracted using standard protocols.

PCR amplification of fragments:
PCR primers were constructed by applying primer-selection software to suitably spaced intervals of the published A. thaliana genome sequence. Some of the fragments were chosen on the basis of preliminary results from other fragments. PCR conditions were optimized for each locus separately. PCR conditions and primer sequences are available upon request.

DNA sequencing:
PCR products were purified with the QIAGEN (Valencia, CA) QIAquick PCR purification kit and used as templates for cycle sequencing with the fluorescent BigDye Terminator ready reaction kit. Sequencing was done on ABI automated sequencers.

Analysis:
All A. thaliana fragments were sequenced in both directions, and the results were aligned using "Sequencher" (http://www.genecodes.com) for base calling. Ends of fragments were trimmed so as to remove low-quality sequence. The resulting fragments are listed in Table 1, which also gives the putative gene content of each fragment on the basis of the TIGR annotation (http://www.tigr.org).


 
Table 1. Fragments sequenced

Sequences from different accessions were also aligned using "Sequencher," with additional adjustments by hand in case of longer insertion/deletion polymorphisms. Most indels were treated as single events in the analysis; however, a few fragments contained complex polymorphisms that evidently involved repeated insertions/deletions and substitutions, and these were treated as complex alleles or haplotypes. In the analyses below, loci with more than two alleles were left out (the effect of doing this is slight). Finally, singleton polymorphisms (i.e., those with frequency 1/20) were left out of all analyses of linkage disequilibrium. Note that some fragments contained only singletons.

To improve the power to detect recombination, the ancestral state of each polymorphic site was determined by comparison with A. arenosa, except for one fragment that could not be amplified from A. arenosa, for which the other outgroup species were used instead. No attempt was made to obtain complete sequences from the outgroup species, as this would have required cloning of all fragments (A. arenosa, for example, is a tetraploid outcrosser).
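The site filtering and outgroup polarization described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes aligned sequences stored as equal-length strings (one per accession), treats every alignment column independently (so it does not handle the multi-base indels and complex alleles discussed above), and the function name `usable_sites` is hypothetical.

```python
from collections import Counter

def usable_sites(seqs, outgroup, min_minor=2):
    """Yield (column, ancestral, derived) for diallelic sites that are not singletons.

    seqs: aligned sequences of equal length, one per accession.
    outgroup: the aligned outgroup sequence; a column where it matches neither
    allele yields (column, None, None), i.e., the polarity is left unknown.
    """
    for col in range(len(seqs[0])):
        counts = Counter(s[col] for s in seqs)
        if len(counts) != 2:              # monomorphic, or more than two alleles
            continue
        (a1, c1), (a2, c2) = counts.most_common()
        if min(c1, c2) < min_minor:       # singleton (frequency 1/n): excluded
            continue
        anc = outgroup[col]
        if anc == a1:
            yield col, a1, a2
        elif anc == a2:
            yield col, a2, a1
        else:                             # outgroup uninformative at this column
            yield col, None, None
```

The `None` entries keep unpolarized sites available for linkage disequilibrium analyses, which do not need to know which allele is ancestral.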

Simulations:
The data were compared to results of simulations using the ancestral recombination graph (as described in NORDBORG 2001). Two sets of simulations were run. First, to study the properties of samples of short fragments, 10,000 samples of size n = 20 were generated for each of the parameter combinations shown in Table 3. Second, to exemplify the behavior of long-range linkage disequilibrium, five such samples were generated for each of the parameter combinations in Fig 6.



Figure 1. Estimates of θ per site for each fragment, shown as a function of the position of the fragments. The bottom graph shows the central region of the top graph in greater detail.



Figure 2. Two different estimates of θ per site (θ_W and θ_T) for each fragment. The estimates are based on synonymous and noncoding sites only.



Figure 3. Estimates of Tajima's D for each fragment. All sites were included in the estimation.



Figure 4. Each point represents a comparison between a pair of polymorphic sites: The point is black if the two sites are compatible with a tree topology by the four-gametes test (see text) and white if not. The axes represent the physical position of the sites (so that points on the diagonal correspond to comparisons of each site with itself). Because of the uneven spacing of the sites, the distance between the points is not proportional to true distance. The actual position of each site is shown at the bottom. Blue and red points correspond to black and white points, respectively, but refer to two functionally important mutations in FRI (see text): The one giving rise to friCol is located at ~269 kb; the one giving rise to friLer is located at ~268 kb.



Figure 5. The decay of linkage disequilibrium between all pairs of diallelic loci in the region, shown as a function of the distance between the loci. Linkage disequilibrium was measured using Fisher's exact test (e.g., WEIR 1996), and the plot shows the negative logarithm of the "significance" (P value) of the association. Because of the large number of comparisons, many comparisons are expected to have low P values by chance alone. This is taken into account in the bottom graph, which shows the cumulative frequency distribution of P values for each distance, with the expected frequency subtracted. For example, if all loci were unlinked, we would expect 1% of all comparisons to have -log P ≥ 2 (i.e., P ≤ 0.01). The histogram shows that there is an excess of comparisons in this category for all but the most distantly linked sites.



Figure 6. The decay of linkage disequilibrium in data sets simulated under different assumptions about the population recombination parameter ρ. These histograms should be compared with the histogram in Fig 5. The axes are the same. Each histogram was calculated from a single realization of the ancestral recombination graph with mutation parameter θ = 50 and with the recombination parameter ρ specified at the top of each column.


 
Table 2. The number of segregating sites in each fragment


 
Table 3. Simulation results illustrating the effects of the parameters ρ and θ on having and observing recombination in short fragments


*  RESULTS AND DISCUSSION

The 17 amplicons (14 contiguous fragments) were successfully amplified and sequenced in all accessions. Comparison of our Col sequence with that obtained by the Arabidopsis Genome Initiative (AGI) revealed one discrepancy in 8627 bp. This discrepancy was judged to be due to an error in our base calling. Similarly, three discrepancies were observed in 445 bp of overlap between contiguous amplicons (for each of the 20 accessions), suggesting an error rate of ~10⁻⁴.
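The two error-rate checks can be reproduced directly. This sketch assumes that every discrepancy is a base-calling error and that the 445 bp of overlap was compared in each of the 20 accessions (8900 bp in total); `error_rate` is a hypothetical helper.

```python
def error_rate(mismatches: int, bases_compared: int) -> float:
    """Naive per-base error rate, treating every discrepancy as a base-calling error."""
    return mismatches / bases_compared

checks = {
    "Col vs. AGI sequence": (1, 8627),
    "overlapping amplicons": (3, 445 * 20),   # 445 bp of overlap in each of 20 accessions
}
for label, (k, n) in checks.items():
    print(f"{label}: {error_rate(k, n):.1e} per base")
```

Both checks land between 10⁻⁴ and 10⁻³, consistent with the rough figure given above.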

As shown in Table 1, 12 of the 14 fragments turned out to be located in putative genes. Two of these covered single exons, while the remaining 10 covered a mixture of coding and noncoding sequence. The pattern of polymorphism in our data is consistent with the genome annotation: even though the numbers of coding and noncoding bases examined were roughly equal, insertion/deletion (indel) polymorphisms were largely limited to noncoding regions (see Table 2). Of a minimum of 22 such polymorphisms, only 3 were in coding regions. Of these, 2 were single-base-pair indels present in single individuals and could well represent genuine deleterious mutations. The remaining indel was an in-frame Asn repeat. Indels in noncoding regions, on the other hand, were frequently both long and highly polymorphic.

Levels of polymorphism:
The level of polymorphism in a sample of sequences can be quantified by θ_W, Watterson's estimate of the neutral mutation parameter θ (WATTERSON 1975). To allow comparison between different regions, we estimate θ per site. Furthermore, we consider noncoding regions separately from coding regions and, in the latter, distinguish between synonymous and nonsynonymous sites on the basis of whether a change in the DNA sequence would lead to a change in the amino acid sequence (see, e.g., LI 1997). In the present data, θ_W ≈ … per site for both synonymous and noncoding sites, and θ_W ≈ … for nonsynonymous sites. These values are consistent with previous estimates (see AGUADÉ 2001). However, as shown in Fig 1, the estimates vary considerably between the fragments, both in the relationship between the levels of polymorphism for different types of sites and in absolute level. This variability is all the more notable because these estimates should be positively correlated: under neutrality, θ_W depends strongly on the total branch length of the underlying genealogical tree, and the total branch lengths of linked genealogies should be more similar than those of unlinked genealogies. The point is especially significant when comparing different types of sites in the same fragment, which should have very similar genealogies. Interestingly, θ_W for synonymous and noncoding sites seems to be much more strongly correlated than either is with θ_W for nonsynonymous sites (Fig 1). This may reflect different levels of selective constraint on the latter.
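Watterson's estimator divides the number of segregating sites S by a_n = Σ_{i=1}^{n−1} 1/i; dividing again by the number of sites in a given class gives a per-site value. A minimal Python sketch follows; the caller is assumed to have already split sites into synonymous, nonsynonymous, and noncoding classes (the harder part, not shown), and the function names are illustrative.

```python
def harmonic(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))

def watterson_theta_per_site(seg_sites: int, n: int, n_sites: float) -> float:
    """Watterson's theta_W = S / a_n, expressed per site of the class analysed."""
    if n_sites <= 0:
        raise ValueError("need a positive number of sites")
    return seg_sites / harmonic(n) / n_sites

# Hypothetical class: 10 segregating sites among 500 noncoding sites, n = 20.
print(f"{watterson_theta_per_site(10, 20, 500):.4f}")
```

The number of sites may be fractional (synonymous sites are usually counted that way), which is why `n_sites` is a float.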

Allele-frequency distribution:
Another important characteristic of the pattern of polymorphism is the distribution of allele frequencies. Variation in this distribution can be summarized by comparing Watterson's estimator of θ, θ_W, with Tajima's estimator, θ_T (TAJIMA 1983). The latter is based on the average number of pairwise differences between sequences and is therefore very sensitive to the topology of the underlying genealogy, which is reflected in the distribution of allele frequencies. Fig 2 shows the behavior of these two estimators across the region. The difference between them can be quantified using Tajima's D statistic, which has expectation close to zero under the standard neutral model, but will tend to be negative if there are too many rare alleles and positive if there are too many alleles at intermediate frequency (TAJIMA 1989). Since such deviations can be caused by selection, Tajima's D is often used in tests for selection.
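Tajima's D standardizes the difference θ_T − θ_W by an estimate of its standard deviation under the standard neutral model. The sketch below uses the constants from TAJIMA (1989) and takes, for each segregating site, the count of one of its two alleles (the statistic depends only on k(n − k), so either allele works). Function names are illustrative.

```python
import math

def tajima_theta(freqs, n):
    """theta_T: mean number of pairwise differences between the n sequences."""
    return sum(k * (n - k) for k in freqs) / math.comb(n, 2)

def tajimas_d(freqs, n):
    """Tajima's (1989) D from allele counts (1..n-1) at each segregating site."""
    S = len(freqs)
    if S == 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    theta_w = S / a1
    return (tajima_theta(freqs, n) - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

An excess of rare variants, e.g. `tajimas_d([1, 1, 1, 1, 1], 20)`, yields a negative D, while sites at intermediate frequency, e.g. `tajimas_d([10] * 5, 20)`, yield a positive one.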

Fig 3 shows how D varies across the studied region. Note that a few values are "significantly" different from zero by the criteria typically used in molecular evolution studies. More importantly, D fluctuates wildly from highly negative to highly positive values. The variance of D observed here is 1.96: simulations (see METHODS) show that the probability of observing such a high variance among 14 independent realizations of D under the standard neutral model is on the order of 10⁻³. Thus, the variance would seem to be too large to be compatible with the standard neutral model. It should be noted that the effect of linkage on D is not known; however, it seems highly likely that D for linked fragments should again be positively correlated, thus increasing the significance of the deviation from the standard model.
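The significance of the variance of D can be approximated by Monte Carlo. The authors' simulations used the ancestral recombination graph (see METHODS); the sketch below is a deliberate simplification that ignores recombination within fragments, treats the 14 fragments as independent, fixes θ = 2 per fragment (within the 1–3 range estimated later in the text), resamples fragments with no segregating sites (where D is undefined), and uses the sample variance. All of these are assumptions, so its P value need not match the one reported.

```python
import math
import random
import statistics

def tajimas_d(freqs, n):
    """Tajima's D from the allele counts at each segregating site (n sequences)."""
    S = len(freqs)
    pi = sum(k * (n - k) for k in freqs) / math.comb(n, 2)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_freqs(n, theta, rng):
    """Allele counts at segregating sites for one neutral, non-recombining fragment.

    Standard coalescent, time in units of 2N generations: with k lineages the
    waiting time is Exp(k(k-1)/2), and mutations fall on the k branches as a
    Poisson process with total rate k*theta/2 (infinite-sites model).
    """
    lineages = [1] * n                 # sampled leaves below each ancestral lineage
    freqs = []
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2)
        for _ in range(poisson(k * theta / 2 * t, rng)):
            freqs.append(rng.choice(lineages))   # mutation on a random branch
        i, j = sorted(rng.sample(range(k), 2))
        merged = lineages.pop(j) + lineages.pop(i)
        lineages.append(merged)
    return freqs

def p_variance_at_least(observed_var, n_fragments=14, n=20, theta=2.0,
                        reps=1000, seed=1):
    """Monte Carlo P(sample variance of D over independent fragments >= observed)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        ds = []
        while len(ds) < n_fragments:
            freqs = simulate_freqs(n, theta, rng)
            if freqs:                  # D is undefined when S = 0; resample
                ds.append(tajimas_d(freqs, n))
        hits += statistics.variance(ds) >= observed_var
    return hits / reps
```

Estimating a probability near 10⁻³ needs on the order of 10⁴ replicates or more, e.g. `p_variance_at_least(1.96, reps=20000)`, which takes a while in pure Python.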

What is the cause of this deviation? Since, as is discussed below, the sample includes alleles of FRI with extremely strong phenotypic effects, the region is a priori unlikely to have evolved neutrally. Nonetheless, we do not think it is warranted to conclude that selection on FRI is responsible for the observed pattern. First of all, the observed pattern of variation in D is not immediately suggestive of any particular selective scenario. Second, and more importantly, we do not know whether the observed pattern is in fact typical for the genome. It has become standard practice in population genetics to "test for selection" by calculating a summary statistic such as Tajima's D and finding that it deviates significantly from its neutral expectation (reviewed in KREITMAN 2000). This approach assumes that most of the rest of the genome is in fact evolving according to a standard neutral model. Leaving aside the (important) issue of whether selection perhaps affects every site in the genome (GILLESPIE 2000), a very serious problem with the standard neutral model is that it assumes that there is no population structure, an assumption that is almost certainly false for most species. Since it is well known that population structure can affect variation in ways that mimic selection (e.g., NORDBORG 2001), it would clearly be prudent to reject neutrality for a particular locus by comparing it to the actual pattern of variation in the rest of the genome, rather than to an idealized distribution. For obvious reasons, this has hitherto not been possible; equally obviously, it soon will be.

Recombination and linkage disequilibrium:
A simple way to look for recombination is the "four-gamete" test (HUDSON and KAPLAN 1985): if, for a pair of diallelic sites, all four possible haplotypes are present, then this is incompatible with unique mutations on a single genealogical tree. Either there must have been a repeat mutation at one of the sites or there must have been recombination between the sites. Because the probability of recombination increases with the physical distance between sites, recombination may give rise to a distinctive pattern where closely linked sites are compatible and more distantly linked sites are mostly incompatible. Multiple mutations should not normally give rise to such a pattern, since the probability of a mutation affecting one member of a pair of loci should not depend on the distance between the loci (JAKOBSEN and EASTEAL 1996).
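The four-gamete test is easy to apply to every pair of sites, which is how a compatibility plot such as Fig 4 can be built. This is a minimal sketch assuming diallelic sites with no missing data, each given as the sequence of alleles across the same ordered accessions; the function names are hypothetical.

```python
from itertools import combinations

def four_gametes(site_a, site_b):
    """True if two diallelic sites show all four haplotypes (i.e., are incompatible)."""
    return len(set(zip(site_a, site_b))) == 4

def incompatible_pairs(sites):
    """Index pairs (i, j), i < j, of sites that fail the four-gamete test."""
    return [(i, j) for i, j in combinations(range(len(sites)), 2)
            if four_gametes(sites[i], sites[j])]
```

Plotting the returned pairs on a site-by-site grid, ordered by physical position, gives the black/white pattern described for Fig 4.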

As pointed out above, recombination is expected to generate a distinctive pattern only when it is rare relative to mutation; frequent recombination will wipe out all patterns. However, in a highly selfing species like A. thaliana, we would expect many recombination events to have left obvious traces. Fig 4 shows that this is indeed the case: Closely linked sites are usually compatible with each other (giving rise to blocks of compatible sites along the diagonal); more distant sites are often incompatible. Recombination between fragments (on a scale of 10–20 kb) appears to be the rule, rather than the exception. Furthermore, there is strong evidence for recombination within 2 of the 14 fragments. Other fragments harbor a small number of incompatible sites; these could be due to mutational hot spots as well as recombination.

Next we consider another reflection of recombination, namely the decay of linkage disequilibrium. The top graph of Fig 5 shows linkage disequilibrium between all pairs of polymorphic sites as a function of the distance between the sites. The bottom graph shows a histogram of the distribution of P values (under Fisher's exact test of association) for each distance interval. Linkage disequilibrium is extensive, but decays sharply with distance within the surveyed region (i.e., over 250 kb).
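Fisher's exact test for a pair of diallelic sites uses the 2 × 2 table of haplotype counts. The sketch below implements the two-sided test by summing the hypergeometric probabilities of all tables with the observed margins that are no more probable than the observed one (one common two-sided convention, not necessarily the exact variant used for Fig 5), and computes the kind of excess over the null expectation plotted in the bottom graph of Fig 5. Names are illustrative. Because the test is discrete, its null P values are conservative, so subtracting 10^(−threshold) as done here, if anything, understates the excess.

```python
import math

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables at most as probable as the observed one (the small
    # tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def pair_table(site_a, site_b):
    """2x2 haplotype counts for two diallelic sites observed in the same accessions."""
    alleles_a, alleles_b = sorted(set(site_a)), sorted(set(site_b))
    counts = [[0, 0], [0, 0]]
    for x, y in zip(site_a, site_b):
        counts[alleles_a.index(x)][alleles_b.index(y)] += 1
    return counts[0][0], counts[0][1], counts[1][0], counts[1][1]

def neg_log10_p(site_a, site_b):
    """-log10 P of the association between two sites, as plotted in Fig 5."""
    return -math.log10(fisher_exact_two_sided(*pair_table(site_a, site_b)))

def excess_significant(neg_log_ps, threshold=2.0):
    """Fraction of comparisons with -log10 P >= threshold, minus the 10**-threshold
    expected if P values were uniform (i.e., all loci unlinked)."""
    frac = sum(v >= threshold for v in neg_log_ps) / len(neg_log_ps)
    return frac - 10 ** -threshold
```

Grouping the values from `neg_log10_p` into distance classes and applying `excess_significant` to each class gives one bar per distance interval.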

These data agree qualitatively with model predictions for selfing species (NORDBORG 2000). First, the amount of linkage disequilibrium is considerably more extensive than what is typically observed in outcrossing species. For example, in Drosophila, linkage disequilibrium typically decays within a few kilobases, not 250 kb (LANGLEY et al. 2000). Second, linkage disequilibrium is far from genome wide. A. thaliana has sometimes been treated as if it were an effectively clonal species, with accessions (or "ecotypes") evolving in a tree-like fashion. This is plainly not the case. Although linkage disequilibrium is extensive, it does decay. Heterozygous individuals are routinely observed in natural populations (E. A. STAHL, R. HUFFT, M. KREITMAN and J. BERGELSON, unpublished results), and there is no "phylogeny" of A. thaliana accessions (SHARBEL et al. 2000). The decay of linkage disequilibrium observed here appears to be consistent with the genome-wide pattern (NORDBORG et al. 2002).

What about quantitative agreement? The frequency of recombination in standard population genetics models is determined by the recombination parameter ρ, which plays a role analogous to that of θ for mutation. Several methods for estimating ρ from polymorphism data exist; however, they generally have poor statistical properties (WALL 2000), and none are well suited for the present type of data (short sequences with little recombination separated by large regions with no information). We therefore use simulations (see METHODS) and summary statistics. Like all other existing estimation methods, our simulations assume no population structure or selection, assumptions that we know to be false. The results should therefore be interpreted with caution.

First we consider the decay of linkage disequilibrium. Five genealogical histories were generated for each of five values of ρ, and the decay of linkage disequilibrium was plotted as in the bottom graph of Fig 5. The results are shown in Fig 6. Note, first, that there is tremendous variation between realizations, at least for the lower values of ρ. The reason for this is simple: the lower the rate of recombination, the more correlated the genealogies, and the bigger the variance between realizations. Nonetheless, a comparison of these results with Fig 5 suggests that ρ for the region studied is most likely ~40 and probably not >80. This suggests that ρ ≅ 0.2/kb, or 0.1–0.2 per fragment. As discussed earlier, θ per fragment is estimated to be somewhere in the range of 1–3, depending on the length of the fragment and the proportion of coding to noncoding sequence. Are these parameters compatible with the pattern of linkage disequilibrium observed within fragments? In particular, are they compatible with the somewhat surprising finding that recombination seems to have occurred in 2 of 14 fragments? To investigate this, we simulated a large number of fragment data sets using a range of values for ρ and θ. The results are shown in Table 3. Note first that whereas the probability of recombination having taken place in a sample depends only on ρ, the probability of detecting this depends on θ as well. In general, the higher the value of θ, the greater the probability of detecting a recombination event; however, a substantial fraction can never be detected (HUDSON and KAPLAN 1985).
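The claim that the probability of recombination having occurred depends only on ρ can be made concrete. Under the standard coalescent with recombination, while k lineages remain the next event is a coalescence with probability (k − 1)/(k − 1 + ρ), so the probability that no recombination occurred anywhere in the history of n sequences is the product of these terms for k = n, …, 2. The sketch below computes it, together with a binomial tail for "at least m of 14 fragments" under an assumed independence between fragments. Because many events are undetectable, plugging the probability of having recombination into the binomial gives only an upper bound on the probability of observing it; the detection probabilities in Table 3 come from simulation.

```python
import math

def p_no_recombination(n, rho):
    """P(no recombination event in the whole history of a sample of n sequences).

    Standard coalescent with recombination: with k lineages, coalescence occurs at
    rate k(k-1)/2 and recombination at rate k*rho/2, so the next event is a
    coalescence with probability (k-1)/(k-1+rho).
    """
    return math.prod((k - 1) / (k - 1 + rho) for k in range(2, n + 1))

def p_at_least(m, trials, p):
    """Binomial upper tail P(X >= m) for X ~ Binomial(trials, p)."""
    return sum(math.comb(trials, x) * p**x * (1 - p) ** (trials - x)
               for x in range(m, trials + 1))

for rho in (0.1, 0.5):                       # per-fragment values discussed in the text
    p_had = 1 - p_no_recombination(20, rho)  # recombination occurred, detectable or not
    print(f"rho = {rho}: P(had) = {p_had:.2f}, "
          f"upper bound on P(>= 2 of 14) = {p_at_least(2, 14, p_had):.2f}")
```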

It is clear from Table 3 that, even for θ = 7, the a priori probability of observing recombination in 2 of 14 fragments is small unless ρ is around 0.5. Thus ρ = 0.1 would seem to be too low. If ρ = 0.5 per fragment, then ρ ≅ 200 for the region in Fig 5, which contradicts the conclusions drawn from Fig 6. PRITCHARD and PRZEWORSKI (2001) reported a similar discrepancy in humans (too little linkage disequilibrium on a fine scale and too much on a larger scale). This phenomenon deserves further study as more data become available.

Another question is whether the amount of recombination relative to mutation is compatible with the high degree of selfing in A. thaliana. It makes sense to consider the ratio ρ/θ because

$$\frac{\rho}{\theta} = \frac{4N_e r}{4N_e u} = \frac{r}{u},$$

where u is the neutral mutation probability per meiosis and r is the recombination fraction. This avoids dealing directly with the coalescent scaling constant (the "effective population size," N_e). Both u and r can be estimated: u from sequence divergence between species and r using standard genetic methods (with the important caveat that the values of r that are of interest in population genetics are typically much smaller than can be directly estimated; see ANDOLFATTO and NORDBORG 1998). In a partially selfing organism,

$$\frac{\rho}{\theta} = (1 - F)\,\frac{r}{u},$$

where F is the inbreeding coefficient (NORDBORG 2000). Thus, as long as u and r are the same, ρ/θ in a selfer should be (1 − F) times that in an outcrosser. If A. thaliana is 99% selfing, as has been suggested (ABBOTT and GOMES 1989; BERGELSON et al. 1998), then F ≈ 0.98 and (1 − F) ≈ 2%. In the present data, ρ/θ seems to be roughly 1/10, in agreement with previous studies (KAWABE and MIYASHITA 1999) and not incompatible with what is known about this ratio in outcrossing organisms (NORDBORG et al. 2002). However, if F is even closer to 1 than 0.98, for instance because of inbreeding resulting from population structure, then the observed ratio is too high, and the data may suggest that A. thaliana is, or has recently been, more outcrossing than is commonly believed.
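The relationship between selfing rate, F, and ρ/θ can be explored numerically. This sketch assumes the equilibrium relationship F = s/(2 − s) for a population with a constant selfing rate s (which gives F ≈ 0.98 for s = 0.99, as in the text), scales the outcrossing ratio r/u by (1 − F) as described above, and uses a purely hypothetical r/u = 1 to show how an observed ratio of ~1/10 would translate into an implied F and selfing rate. Function names are illustrative.

```python
def inbreeding_coefficient(s: float) -> float:
    """Equilibrium inbreeding coefficient for a constant selfing rate s."""
    return s / (2 - s)

def selfing_rate(F: float) -> float:
    """Inverse of inbreeding_coefficient: s = 2F / (1 + F)."""
    return 2 * F / (1 + F)

def rho_over_theta(r_over_u: float, F: float) -> float:
    """Expected rho/theta in a partial selfer: the outcrossing value r/u scaled by (1 - F)."""
    return (1 - F) * r_over_u

F = inbreeding_coefficient(0.99)          # 99% selfing, as suggested in the text
print(f"F = {F:.3f}, 1 - F = {1 - F:.3f}")

# Hypothetical r/u = 1 (an assumption, not an estimate): which F would make
# rho/theta equal the observed ~1/10?
r_over_u, observed = 1.0, 0.1
F_implied = 1 - observed / r_over_u
print(f"implied F = {F_implied:.2f}, selfing rate = {selfing_rate(F_implied):.3f}")
```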

FRI haplotypes:
We selected a sample that included both early- and late-flowering accessions, hoping that (a) FRI would turn out to be responsible for a large fraction of the phenotypic variation and (b) the sample would include a sufficiently large number of alleles of each type to address questions about the history of selection on the alleles, as well as about the feasibility of linkage disequilibrium mapping. Since the initiation of this study, FRI has been identified, which made it possible to determine the FRI alleles present in our sample. The sample turned out to have the following composition (JOHANSON et al. 2000):

  1. The early flowering Col, Köln, and Mt-0 share a FRI loss-of-function allele, friCol.

  2. The early flowering Ler, Rsch-4, and Tsu-0 share another FRI loss-of-function allele, friLer.

  3. The remaining 14 accessions appear to have functional FRI alleles. Two of these, Shakhdara and Kz-9, are also quite early (Kondara and Pu-2-8 are intermediate). Other loci are likely to be involved here; the early flowering of Shakhdara seems to be caused by a recessive allele at the unlinked locus FLC (S. GANESTAM, J. HAGENBLAD, V. S. RAO, T. KRAFT and M. NORDBORG, unpublished results; S. MICHAELS, personal communication).

In other words, FRI did turn out to be responsible for a large fraction of the phenotypic variation; however, no early flowering allele was carried by more than three of the 20 accessions, which severely reduces our power to draw meaningful conclusions about the history of the alleles.

The pattern of recombination associated with the two functionally important FRI alleles is highlighted in Fig 4. Take friCol first. On the telomeric side (left side in the coordinate system used), there is no evidence for recombination in 117 kb. On the centromeric side, a single incompatible site is found 32 kb from the mutation; however, this site is very likely a mutational hot spot (note that it is incompatible with all other sites). More plausible evidence for recombination is found at 108 kb. Thus friCol may be associated with a haplotype that is well over 200 kb long in our sample. We note that this is one of the longest haplotypes in the data. In fact, if we consider all 1140 possible choices of 3 out of 20 accessions, no other combination shares such a long haplotype. This suggests that the 3 sampled copies of friCol have an unusually recent common ancestor, perhaps as a result of positive directional selection. The relative recency of the friCol allelic class is also suggested by the fact that there is only a single segregating site within this class (in addition to the incompatible sites in Fig 4, which may or may not be part of the allelic class, depending on where recombination occurred).
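The comparison across all 3-accession subsets can be sketched as follows. Here a "shared haplotype" is simplified to the longest run of consecutive polymorphic sites at which all three accessions carry the same allele, measured in base pairs between the first and last site of the run; the authors' criterion, which also weighs isolated incompatible sites, may differ. The input format and function names are hypothetical.

```python
import math
from itertools import combinations

def longest_shared_run(haps, positions, group):
    """Longest span (bp) of consecutive sites where all accessions in `group` agree.

    haps: dict mapping accession name -> alleles at the ordered polymorphic sites.
    positions: physical coordinates of those sites, in increasing order.
    """
    best, start = 0, None
    for i, pos in enumerate(positions):
        if len({haps[g][i] for g in group}) == 1:   # identical at this site
            if start is None:
                start = pos
            best = max(best, pos - start)
        else:
            start = None                             # run broken by a difference
    return best

def rank_triples(haps, positions):
    """All 3-accession subsets, longest shared run first."""
    triples = combinations(sorted(haps), 3)
    return sorted(triples, key=lambda t: longest_shared_run(haps, positions, t),
                  reverse=True)

print(math.comb(20, 3))   # number of ways to choose 3 of 20 accessions: 1140
```

With 20 accessions, `rank_triples` scores 1140 subsets, which is cheap even in pure Python.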

Next consider friLer. Even if we conservatively ignore all isolated incompatible sites, there is clear evidence for recombination within 34 kb on the telomeric side and reasonable evidence for recombination within 109 kb on the other side. The haplotype associated with friLer would thus appear to be considerably shorter than the one associated with friCol. There is a single segregating site within this class in addition to the three isolated incompatible sites (this excludes the portions that have clearly undergone recombination).

A number of methods for estimating the ages of alleles exist; however, given the small amount of data in this study, such estimation is of questionable value. The picture is brighter if we turn from the uncertainties of historical inference to the more practical issue of linkage disequilibrium mapping. There is currently tremendous interest in using population association to map genes responsible for naturally occurring phenotypic variation. The extent of linkage disequilibrium is very important in this context because it determines how dense a map is needed, on the one hand, and how finely loci may be mapped, on the other (ALTSHULER et al. 2000). In outcrossing organisms such as Drosophila (LONG et al. 1998) or maize (THORNSBERRY et al. 2001), linkage disequilibrium appears to be restricted to a few kilobases, making it a realistic prospect that trait variation can be associated with particular sites, while at the same time making it unlikely that loci can ever be identified in genomic scans. In humans, the extent of linkage disequilibrium is subject to heated debate (e.g., KRUGLYAK 1999; REICH et al. 2001). The results presented here suggest that selfing organisms, like A. thaliana, rice, or barley, may be highly suitable for mapping by genomic scans for association. The presence of extensive, highly diverged haplotypes would seem to be ideal for modern inference methods (MORRIS et al. 2000; TOIVONEN et al. 2000), and the naturally inbred populations make it easy to obtain haplotype data as the complications of heterozygotes are avoided.

Finally, our data illustrate some of the limitations of linkage disequilibrium mapping rather well. First, whether a particular allele can be mapped depends on the history of that allele, and history cannot be repeated. Thus, it is quite possible that the friCol allele will still be surrounded by a haplotype several hundred kilobases long even in a much larger sample. If so, linkage disequilibrium mapping would not be particularly useful for this allele, as it is relatively painless to map genes to this scale in A. thaliana using traditional methods (such as recombinant inbred lines). Sometimes, however, it will work, as illustrated by the friLer allele, for which we found clear evidence for recombination within 34 kb in a sample of only three alleles. A very large mapping population would clearly be needed to achieve this kind of resolution using traditional methods.
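Simple arithmetic shows why a traditional cross would need a very large population to reach comparable resolution. For a short interval, the chance that a single meiosis produces a crossover in it is roughly the interval's genetic length in Morgans, p. The number of independent meioses n needed to observe at least one crossover with probability c then satisfies 1 - (1 - p)^n ≥ c. The sketch below assumes a uniform recombination rate supplied by the user and applies no mapping-function correction. The 5 cM/Mb used in the example is a placeholder, not an estimate for A. thaliana or a value from this study. The function name is hypothetical.

```python
import math

def meioses_needed(interval_kb, rate_cm_per_mb, confidence=0.95):
    """Smallest number of independent meioses giving >= `confidence`
    probability of at least one crossover in the interval.

    Assumes (hypothetically) a uniform recombination rate and an interval
    short enough that recombination fraction ~ genetic length in Morgans.
    """
    # kb -> Mb, times cM/Mb gives cM, divided by 100 gives Morgans.
    p = (interval_kb / 1000) * rate_cm_per_mb / 100
    if not 0 < p < 1:
        raise ValueError("per-meiosis crossover probability must be in (0, 1)")
    # Solve 1 - (1 - p)**n >= confidence for the smallest integer n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

# Illustrative only: 5 cM/Mb is a placeholder rate.
print(meioses_needed(34, 5))  # 1761
```

Because n scales roughly as 1/p, halving the interval or the assumed rate roughly doubles the number of meioses required. A sample of natural accessions, by contrast, draws on recombination accumulated over its history.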

Second, genetic heterogeneity is a very important issue for anyone interested in association mapping (WEISS and TERWILLIGER 2000). Early flowering in our sample was due to alleles at multiple loci as well as to multiple alleles at a single locus. It will clearly not be possible to map an allele unless it is responsible for a reasonably large fraction of the variation in the sample. Sampling is likely to play an important role in this context. We note that in our sample, none of the early flowering accessions from Central Asia appear to carry early flowering alleles of FRI, whereas almost all other early flowering accessions do.
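A toy calculation shows why heterogeneity matters. One simple summary of how much phenotypic variation a marker accounts for is the fraction of variance explained by grouping accessions by genotype, that is, the between-group sum of squares divided by the total sum of squares. The hypothetical data below compare two cases. In the first, early flowering is caused entirely by one allele. In the second, half of the early accessions flower early for some other reason. The flowering times, genotype coding, and function name are invented for illustration.

```python
from statistics import mean

def variance_explained(genotypes, phenotypes):
    """Fraction of phenotypic variance explained by grouping on genotype.

    genotypes: one allele class per accession (here 1 = early allele, 0 = other)
    phenotypes: one trait value per accession (here hypothetical days to flowering)
    """
    grand = mean(phenotypes)
    total_ss = sum((y - grand) ** 2 for y in phenotypes)
    between_ss = 0.0
    for g in set(genotypes):
        group = [y for x, y in zip(genotypes, phenotypes) if x == g]
        between_ss += len(group) * (mean(group) - grand) ** 2
    return between_ss / total_ss

# Hypothetical flowering times (days): four early and four late accessions.
flowering = [30, 30, 30, 30, 90, 90, 90, 90]

# Case 1: every early accession carries the early allele at the candidate locus.
homogeneous = [1, 1, 1, 1, 0, 0, 0, 0]
# Case 2: two early accessions are early because of some other locus.
heterogeneous = [1, 1, 0, 0, 0, 0, 0, 0]

print(variance_explained(homogeneous, flowering))    # 1.0
print(variance_explained(heterogeneous, flowering))  # about 0.333
```

When half of the early accessions are explained by another cause, the same allele accounts for a third of the variance rather than all of it. Its association signal is correspondingly weaker in a sample of a given size.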


*  FOOTNOTES

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AY092417, AY092756.


*  ACKNOWLEDGMENTS

We thank C. Dean, U. Johanson, and J. West for much help and for providing access to prepublication data without which this project would not have been possible. P. Arctander and H. Ellegren graciously let us use their sequencing machines. We thank H. Innan for comments on the manuscript. This work was supported by grants from the Swedish Natural Sciences Research Council and the Erik Philip-Sörensen Foundation to M.N.

Manuscript received August 17, 2001; Accepted for publication January 31, 2002.


*  LITERATURE CITED

ABBOTT, R. J. and M. F. GOMES, 1989  Population genetic structure and outcrossing rate of Arabidopsis thaliana (L.) Heynh. Heredity 62:411-418.

AGUADÉ, M., 2001  Nucleotide sequence variation at two genes of the phenylpropanoid pathway, the FAH1 and F3H genes, in Arabidopsis thaliana. Mol. Biol. Evol. 18:1-9.

ALTSHULER, D., M. DALY, and L. KRUGLYAK, 2000  Guilt by association. Nat. Genet. 26:135-137.

ANDOLFATTO, P. and M. NORDBORG, 1998  The effect of gene conversion on intralocus associations. Genetics 148:1397-1399.

BERGELSON, J., E. STAHL, S. DUDEK, and M. KREITMAN, 1998  Genetic variation within and among populations of Arabidopsis thaliana. Genetics 148:1311-1323.

GILLESPIE, J. H., 2000  Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155:909-919.

HANFSTINGL, U., A. BERRY, E. KELLOGG, J. T. COSTA, III, and W. RÜDIGER et al., 1994  Haplotypic divergence coupled with lack of diversity at the Arabidopsis thaliana alcohol dehydrogenase locus: roles for balancing and directional selection. Genetics 138:811-828.

HUDSON, R. R. and N. L. KAPLAN, 1985  Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147-164.

INNAN, H., F. TAJIMA, R. TERAUCHI, and N. T. MIYASHITA, 1996  Intragenic recombination in the Adh locus of the wild plant Arabidopsis thaliana. Genetics 143:1761-1770.

JAKOBSEN, I. B. and S. EASTEAL, 1996  A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. Comput. Appl. Biosci. 12:291-295.

JOHANSON, U., J. WEST, C. LISTER, S. MICHAELS, and R. AMASINO et al., 2000  Molecular analysis of FRIGIDA, a major determinant of natural variation in Arabidopsis flowering time. Science 290:344-347.

KARLSSON, B. H., G. R. SILLS, and J. NIENHUIS, 1993  Effects of photoperiod and vernalization on the number of leaves at flowering in 32 Arabidopsis thaliana (Brassicaceae) ecotypes. Am. J. Bot. 80:646-648.

KAWABE, A. and N. MIYASHITA, 1999  DNA variation in the basic chitinase locus (ChiB) region of the wild plant Arabidopsis thaliana. Genetics 153:1445-1453.

KREITMAN, M., 2000  Methods to detect selection in populations with applications to the human. Annu. Rev. Genomics Hum. Genet. 1:539-559.

KRUGLYAK, L., 1999  Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22:139-144.

LAIBACH, F., 1951  Über sommer- und winterannuelle Rassen von Arabidopsis thaliana (L.) Heynh. Ein Beitrag zur Ätiologie der Blütenbildung. Beitr. Biol. Pflanz. 28:173-210.

LANGLEY, C. H., B. P. LAZZARO, W. PHILLIPS, E. HEIKKINEN, and J. M. BRAVERMAN, 2000  Linkage disequilibria and the site frequency spectra in the su(s) and su(wa) regions of the Drosophila melanogaster X chromosome. Genetics 156:1837-1852.

LI, W.-H., 1997  Molecular Evolution. Sinauer Associates, Sunderland, MA.

LONG, A. D. and C. H. LANGLEY, 1999  The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res. 9:720-731.

LONG, A. D., R. F. LYMAN, C. H. LANGLEY, and T. F. C. MACKAY, 1998  Two sites in the Delta gene region contribute to naturally occurring variation in bristle number in Drosophila melanogaster. Genetics 149:999-1017.

MAYNARD SMITH, J., 1999  The detection and measurement of recombination from sequence data. Genetics 153:1021-1027.

MICHAELS, S. D. and R. M. AMASINO, 1999  FLOWERING LOCUS C encodes a novel MADS domain protein that acts as a repressor of flowering. Plant Cell 11:949-956.

MORRIS, A. P., J. C. WHITTAKER, and D. J. BALDING, 2000  Bayesian fine-scale mapping of disease loci, by hidden Markov models. Am. J. Hum. Genet. 67:155-169.

NAPP-ZINN, K., 1985  Arabidopsis thaliana, pp. 492–503 in Handbook of Flowering, edited by H. A. HALEVY. CRC Press, Boca Raton, FL.

NORDBORG, M., 2000  Linkage disequilibrium, gene trees, and selfing: an ancestral recombination graph with partial self-fertilization. Genetics 154:923-929.

NORDBORG, M., 2001  Coalescent theory, pp. 179–212 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. J. BISHOP, and C. CANNINGS. John Wiley & Sons, Chichester, UK.

NORDBORG, M. and J. BERGELSON, 1999  The effect of seed and rosette cold treatment on germination and flowering time in some Arabidopsis thaliana (Brassicaceae) ecotypes. Am. J. Bot. 86:470-475.

NORDBORG, M. and S. TAVARÉ, 2002  Linkage disequilibrium: What does history have to tell us? Trends Genet. 18:83-90.

NORDBORG, M., J. O. BOREVITZ, J. BERGELSON, C. C. BERRY, and J. CHORY et al., 2002  The extent of linkage disequilibrium in the highly selfing species Arabidopsis thaliana. Nat. Genet. 30:190-193.

PRITCHARD, J. K. and M. PRZEWORSKI, 2001  Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. 69:1-14.

REICH, D. E., M. CARGILL, S. BOLK, J. IRELAND, and P. C. SABETI et al., 2001  Linkage disequilibrium in the human genome. Nature 411:199-204.

SANDA, S. L., M. JOHN, and R. M. AMASINO, 1997  Analysis of flowering time in ecotypes of Arabidopsis thaliana. J. Hered. 88:69-72.

SHARBEL, T. F., B. HAUBOLD, and T. MITCHELL-OLDS, 2000  Genetic isolation by distance in Arabidopsis thaliana: biogeography and postglacial colonization of Europe. Mol. Ecol. 9:2109-2118.

SHELDON, C. C., J. E. BURN, P. P. PEREZ, J. METZGER, and J. A. EDWARDS et al., 1999  The flf MADS box gene: a repressor of flowering in Arabidopsis regulated by vernalization and methylation. Plant Cell 11:445-458.

TAJIMA, F., 1983  Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437-460.

TAJIMA, F., 1989  Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.

TEMPLETON, A. R., A. G. CLARK, K. M. WEISS, D. A. NICKERSON, and E. BOERWINKLE et al., 2000  Recombinational and mutational hotspots within the human lipoprotein lipase gene. Am. J. Hum. Genet. 66:69-83.

THORNSBERRY, J. M., M. M. GOODMAN, J. DOEBLEY, S. KRESOVICH, and D. NIELSEN et al., 2001  Dwarf8 polymorphisms associate with variation in flowering time. Nat. Genet. 28:286-289.

TOIVONEN, H. T. T., P. ONKAMO, K. VASKO, V. OLLIKAINEN, and P. SEVON et al., 2000  Data mining applied to linkage disequilibrium mapping. Am. J. Hum. Genet. 67:133-145.

WALL, J. D., 2000  A comparison of estimators of the population recombination rate. Mol. Biol. Evol. 17:156-163.

WATTERSON, G. A., 1975  On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.

WEIR, B. S., 1996  Genetic Data Analysis, Ed. 2. Sinauer, Sunderland, MA.

WEISS, K. M. and J. D. TERWILLIGER, 2000  How many diseases does it take to map a gene with SNPs? Nat. Genet. 26:151-157.



