Interest in the level and organization of nucleotide diversity in domesticated plant lineages has recently been motivated by the potential for using association-based mapping techniques as a means for identifying the genes underlying complex traits. To date, however, such data have been available only for a relatively small number of well-characterized plant taxa. Here we provide the first detailed description of patterns of nucleotide polymorphism in wild and cultivated sunflower (Helianthus annuus), using sequence data from nine nuclear genes. The results of this study indicate that wild sunflower harbors at least as much nucleotide diversity as has been reported in other wild plant taxa, with randomly selected sequence pairs being expected to differ at 1 of every 70 bp. In contrast, cultivated sunflower has retained only 40–50% of the diversity present in the wild. Consistent with this dramatic reduction in polymorphism, a phylogenetic analysis of our data revealed that the cultivars form a monophyletic clade, adding to the growing body of evidence that sunflower is the product of a single domestication. Eight of the nine loci surveyed appeared to be evolving primarily under purifying selection, while the remaining locus may have been the subject of positive selection. Linkage disequilibrium (LD) decayed very rapidly in the self-incompatible wild sunflower, with the expected LD falling to negligible levels within 200 bp. The cultivars, on the other hand, exhibited somewhat higher levels of LD, with nonrandom associations persisting up to ∼1100 bp. Taken together, these results suggest that association-based approaches will provide a high degree of resolution for the mapping of functional variation in sunflower.
THE domestication of crop plants is typically accompanied by a genomewide loss of genetic diversity (Tanksley and McCouch 1997). This reduction in diversity is typically due, at least in part, to the population bottleneck that occurs during the founding of a new crop lineage (e.g., Eyre-Walker et al. 1998). In addition to this so-called “domestication bottleneck,” the transition to self-fertilization that often accompanies domestication can further reduce levels of genetic diversity (Pollack 1987; Nordborg 2000), as can selection on the genes underlying agronomically important traits (although this latter effect occurs in a locus-specific fashion; e.g., Hanson et al. 1996; Tenaillon et al. 2004). While the effects of domestication on genetic diversity are likely to vary across taxa, comprehensive surveys of nucleotide diversity in crop plants and their wild progenitors have been performed in only a handful of systems. On the basis of data from the major cereal crops, it appears that genomewide reductions in diversity on the order of 30–40% are not uncommon (Buckler et al. 2001), with selectively important loci often exhibiting even greater losses (e.g., Whitt et al. 2002). In addition to these effects on the overall level of polymorphism, domestication can also have a major impact on the organization of genetic diversity within the genome. Indeed, population bottlenecks can produce transient increases in linkage disequilibrium (LD, the nonrandom association of alleles at different sites) throughout the genome. Similarly, the increase in homozygosity associated with a transition to partial or full self-fertilization reduces the effective recombination rate, once again resulting in elevated LD across the genome (Nordborg 2000). Selection can have a similar, albeit localized, effect on LD in and around the targeted loci (e.g., Clark et al. 2004).
Beyond the obvious concern that reduced genetic variability might limit the potential for crop improvement over the long term (Harlan 1984), interest in the level and organization of nucleotide variability in domesticated plant lineages has recently been motivated by the potential for using association-mapping techniques as a means for identifying the genes underlying agronomically important traits (Flint-García et al. 2003). In the extreme, association-based approaches can even be used to identify the single-nucleotide polymorphisms (SNPs) that are actually responsible for particular trait differences (i.e., so-called quantitative trait nucleotides, QTNs) (Long and Langley 1999). While association mapping promises to provide a great deal of insight into the genetic basis of complex traits, this approach requires a detailed understanding of the distribution of genetic variation across the genome, including data on the density of SNPs and the structure of LD.
To date, the vast majority of nucleotide polymorphism data in plants have come from a relatively small (but growing) number of well-characterized study systems, such as Arabidopsis (e.g., Savolainen et al. 2000; Aguadé 2001; Nordborg et al. 2002; Wright et al. 2003; Ramos-Onsins et al. 2004), several major crops (e.g., White and Doebley 1999; Tenaillon et al. 2001, 2002; Garris et al. 2003; Zhu et al. 2003; Hamblin et al. 2004), and a handful of other taxa (e.g., Lin et al. 2001; Tiffin and Gaut 2001; Dvornyk et al. 2002; García-Gil et al. 2003; Kado et al. 2003; Brown et al. 2004; Ingvarsson 2005). While some generalities have emerged from these studies (e.g., a tendency toward reduced levels of polymorphism and elevated LD in selfers vs. outcrossers), it is clear that the details learned from the study of any one system do not necessarily apply to another, even if they share similar mating systems, demographic histories, etc. With this in mind, we set out to provide the first detailed description of the level of nucleotide diversity and the extent of LD in a broad collection of wild and cultivated sunflower accessions.
Derived from the wild sunflower (Helianthus annuus), the cultivated sunflower (also H. annuus) is one of the world's most important oilseed crops and is also a major source of confectionery seeds (Putt 1997). Despite being fully interfertile and considered to be members of the same species, wild and cultivated sunflower exhibit a number of striking morphological differences. In short, wild sunflower has a highly branched growth form with numerous small flowering heads and relatively small achenes (i.e., single-seeded fruits) that are dispersed at maturity. In contrast, cultivated sunflower is characterized by an unbranched stem that is topped by a single, large head and relatively large achenes that are retained until harvest. Moreover, wild sunflower is an obligate outcrosser, whereas cultivated sunflower has lost the sporophytic self-incompatibility that is typical of the genus. Despite this potentially major shift in breeding system, however, the extent to which cultivated sunflower actually self-pollinates in the field remains unknown.
Although cultivated sunflower was long thought to be the product of a single origin of domestication >4000 years ago (Heiser 1954, 1955; Rieseberg and Seiler 1990; Crites 1993), this premise was subsequently called into question on the basis of both archaeological and genetic evidence (e.g., Heiser 1985; Lentz et al. 2001; Tang and Knapp 2003). In the most comprehensive molecular analysis to date, however, Harter et al. (2004) argued convincingly that the eight extant Native American landraces, from which the modern cultivars are presumably derived, can all be reliably assigned to a single population genetic cluster. This result led them to conclude that these lines do, in fact, trace back to a single origin of domestication, most likely somewhere within what is now considered to be the central United States.
While sunflower has recently been the subject of a substantial amount of EST sequencing (see http://compgenomics.ucdavis.edu), detailed analyses of sequence diversity derived from a broad sample of germplasm are lacking. Rather, analyses of genetic diversity in sunflower have thus far relied primarily on techniques such as allozyme (Rieseberg and Seiler 1990; Cronn et al. 1997) and SSR (Tang and Knapp 2003; Harter et al. 2004; Burke et al. 2005) genotyping. Here we seek to rectify this situation by reporting on patterns of nucleotide polymorphism in a widespread sample of wild sunflower individuals, as well as a diverse collection of cultivars.
MATERIALS AND METHODS
Sampling strategy and plant materials:
Seeds of 16 wild H. annuus populations and 16 cultivated lines were obtained from the North Central Regional Plant Introduction Station (NCRPIS, Ames, IA; Table 1). The wild populations included in this study were selected to provide broad geographic coverage of the species' range in North America. The 16 cultivated lines were composed of 8 Native American landraces, which represent the most primitive sunflower domesticates available, and 8 improved lines that were selected such that, when combined with the landraces, our collection of cultivars contained at least one representative from 9 of the 10 subsets that make up the NCRPIS H. annuus core collection. With the exception of cmsHA89, which is an elite inbred oilseed line that has been used in a variety of other studies (e.g., Burke et al. 2002, 2004; Tang and Knapp 2003), the improved lines included here represent open-pollinated cultivars. Thus, the 16 cultivars included in this study are largely comparable to the “exotic” lines employed by Burke et al. (2005). Upon receipt, seeds from each accession were germinated and the resulting seedlings were reared in the greenhouse. Following emergence, 200 mg of leaf tissue was collected from each seedling, and total genomic DNA was extracted from one individual per accession using the QIAGEN (Valencia, CA) DNeasy plant mini kit.
The nine genes that were selected for inclusion in this study are briefly outlined below (see also Table 2). Calmodulin (CAM) plays a central role in calcium-mediated signaling in plants. Chalcone synthase (CHS; EC 2.3.1.74) plays an essential role in the biosynthesis of plant phenylpropanoids. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH; EC 1.2.1.12) is a tetrameric NAD+ binding protein that is involved in glycolysis and gluconeogenesis. Cytosolic phosphoglucose isomerase (PGIC; EC 5.3.1.9) catalyzes the reversible isomerization of 6-phosphoglucose and 6-phosphofructose, an essential reaction that precedes sucrose biosynthesis. GIA/RGA is a putative gibberellin response modulator. Glutathione peroxidase (GPX; EC 1.11.1.9) and glutathione S-transferase (GST; EC 2.5.1.18) are antioxidants that are thought to play an important role in protecting against oxidative damage. Finally, SCR-1 and SCR-2 show homology to SCARECROW (SCR) or SCARECROW-like gene regulators. SCR is known to be involved in asymmetric cell division in plants (e.g., Kamiya et al. 2003). The genetic map positions of all nine of these genes are currently unknown.
PCR amplification and sequencing:
Primers for all loci except PGIC were designed solely on the basis of sunflower EST sequences contained within the Compositae Genome Project Database (http://cgpdb.ucdavis.edu; see Table 2 for contig IDs). In contrast, exons 16–21 of PGIC were initially amplified using universal primers (yamV and AA16F) developed by L. D. Gottlieb (unpublished data). Once this region was sequenced (see below for details), an internal primer (16R) was designed from our sequences and used along with an EST-derived primer from exon 12 (12F) to amplify exons 12–16. Thus, we were able to sequence exons 12–21 of this gene. Similarly, CHS was amplified on the basis of primer pairs designed from two contigs that overlapped, but that had not previously been assembled into a single unigene. The internal primers were designed from the region of overlap, such that the sequences could subsequently be assembled end-to-end.
Wherever possible, PCR products were purified using the QIAGEN PCR purification kit and directly sequenced. Heterozygotes were dealt with in several ways. First, if the two alleles within an individual differed sufficiently in length, they were separated via agarose gel electrophoresis, isolated, and sequenced. In cases where alleles could not be separated, but direct sequence could be generated, the forward and reverse sequences were assembled, heterozygous sites were identified using Sequencher (Gene Codes, Ann Arbor, MI), and haplotypes were inferred via “haplotype subtraction” (Clark 1990; Olsen and Schaal 1999). In cases where PCR products could not be directly sequenced, or where haplotypes could not be readily inferred, PCR products were cloned using the QIAGEN PCR cloning kit prior to sequencing. In such cases, multiple clones were sequenced from each individual to distinguish between alleles within an individual and to control for Taq polymerase errors. Thus, each individual was represented by two alleles at each locus. All genes were sequenced in both directions using DYEnamic ET cycle sequencing kits (Amersham Biosciences, Piscataway, NJ) following the manufacturer's protocol on an MJ BaseStation automated DNA sequencer (MJ Research, South San Francisco).
Multiple sequence alignments were made using Se-Al version 2.0a11 (Rambaut 1996; http://evolve.zoo.ox.ac.uk). The coding and noncoding regions of each gene were then identified by aligning our sequences against the original EST sequences and via BLAST searches. Estimates of nucleotide polymorphism (π and θ, calculated on a per site basis), population subdivision (i.e., FST between wild and cultivated sunflower), and Tajima's (1989) D were obtained using the software package DnaSP 4.00.5 (Rozas and Rozas 1999). DnaSP was also used to estimate the minimum number of recombination events (RM) in the history of the wild and cultivated subsamples, using the four-gamete test (Hudson and Kaplan 1985) as well as the strength of linkage disequilibrium between pairs of polymorphic sites (computed as the squared allele frequency correlation, r2; Weir 1990). The population recombination parameter (ρ = 4Ner, where Ne is the effective population size and r is the recombination rate) was estimated using the composite-likelihood estimator of Hudson (2001) as implemented in the software package LDhat (available from http://www.stats.ox.ac.uk/∼mcvean/LDhat/), and Wall's (1999) B was estimated using COMPUTE (Thornton 2003). Contiguous indels were treated as single polymorphisms, and singletons were excluded from all analyses of linkage disequilibrium.
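As an illustration of the two diversity estimators named above, the following minimal Python sketch computes per-site π (mean pairwise differences) and Watterson's θ from a set of aligned haplotypes. It is not the DnaSP implementation: the function names and the toy alignment are hypothetical, columns containing a gap character are simply skipped, and θ is based on the number of segregating sites rather than the number of mutations.

```python
from itertools import combinations


def usable_columns(seqs):
    """Indices of alignment columns with no gap ('-') in any sequence."""
    return [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]


def nucleotide_diversity(seqs):
    """Per-site pi: average number of pairwise differences per usable site."""
    cols = usable_columns(seqs)
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a[i] != b[i] for i in cols) for a, b in pairs)
    return diffs / len(pairs) / len(cols)


def watterson_theta(seqs):
    """Per-site theta_W = S / a1 / L, with a1 = sum_{i=1}^{n-1} 1/i."""
    cols = usable_columns(seqs)
    # S: number of usable columns carrying more than one base
    s = sum(len({seq[i] for seq in seqs}) > 1 for i in cols)
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return s / a1 / len(cols)


# Hypothetical toy alignment: four haplotypes, eight sites, two SNPs
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print(nucleotide_diversity(toy), watterson_theta(toy))
```

Because both estimators are expressed per site, their reciprocals give the expected spacing between differences, which is how figures such as "1 of every ∼70 bp" are obtained from an estimate of 0.0144.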
The decay of linkage disequilibrium over physical distance was investigated following the methods of Remington et al. (2001). Briefly, the expected value of r2 at drift-recombination equilibrium is E(r2) = 1/(1 + ρ) (Hill and Weir 1988). Allowing for a low level of mutation and correcting for finite sample size, this relationship becomes

$$E(r^2) = \left[\frac{10+\rho}{(2+\rho)(11+\rho)}\right]\left[1+\frac{(3+\rho)(12+12\rho+\rho^2)}{n(2+\rho)(11+\rho)}\right] \tag{1}$$

where n is the number of sequences sampled. The nonlinear equation based on this relationship contains a single coefficient (b1), which corresponds to the least-squares estimate of ρ per base pair. We pooled our data across genes and fit this model separately for the wild and cultivated samples using PROC NLIN in SAS Ver. 6.12 (SAS Institute, Cary, NC). Although factors such as nonindependence among linked sites and nonequilibrium populations can reduce the precision of and/or bias such analyses, possibly resulting in unreliable estimates of ρ (Weir and Hill 1986), such analyses are still useful for investigating the overall rate of decay of linkage disequilibrium (e.g., Remington et al. 2001; Ingvarsson 2005). Following the methods of Macdonald et al. (2005), we also summarized the observed r2-values using the ksmooth function in the statistical programming language R (http://www.R-project.org/).
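The decay model can be sketched in a few lines of Python. The sketch assumes the sample-size-corrected Hill and Weir (1988) expectation in the form used by Remington et al. (2001), with ρ between two sites taken as ρ per base pair multiplied by the distance in base pairs. The function names, the search limit, and the example value of ρ per base pair are hypothetical; the actual fit in this study was performed by nonlinear least squares in SAS, which is not reproduced here.

```python
def expected_r2(rho, n):
    """Expected r^2 for population recombination parameter rho and n sequences,
    assuming the sample-size-corrected Hill and Weir (1988) expectation."""
    base = (10 + rho) / ((2 + rho) * (11 + rho))
    correction = 1 + ((3 + rho) * (12 + 12 * rho + rho ** 2)) / (
        n * (2 + rho) * (11 + rho)
    )
    return base * correction


def distance_to_threshold(rho_per_bp, n, threshold=0.10, max_bp=100_000):
    """Smallest distance (bp) at which expected r^2 falls to the threshold.

    Returns None if the threshold is not reached within max_bp.
    """
    for distance in range(1, max_bp + 1):
        if expected_r2(rho_per_bp * distance, n) <= threshold:
            return distance
    return None


# Hypothetical inputs: 32 sequences and an illustrative rho per bp
print(distance_to_threshold(rho_per_bp=0.05, n=32))
```

Note that as ρ grows large the expectation approaches 1/n rather than zero, so a threshold below 1/n can never be reached for a given sample size.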
To further investigate the origin of cultivated sunflower, we constructed a phylogeny of the 16 wild sunflower accessions, as well as the 8 Native American landraces (i.e., the “primitive” lines), which, unlike the “improved” lines that made up the balance of our sequencing panel, are free from the confounding effects of human-mediated introgression during the postdomestication era. We used the neighbor-joining algorithm of PAUP Ver. 4.0b10 (Swofford 2002) to construct a phylogeny on the basis of the combined sequence data. Indels were recoded as numerical characters prior to analysis, and branch support was estimated on the basis of 1000 bootstrap replicates of the data.
RESULTS
All nine gene regions were sequenced in each of the 32 sampled individuals. Including indels, sequence lengths varied from 504 to 1642 bp (Table 3), and sequences from all genes but SCR-1 and SCR-2 included both coding and noncoding (i.e., intron and/or UTR) regions. Thus, we were able to analyze 8207 bp of aligned sequence per individual, with nearly two-thirds (5328 bp) coming from coding regions. Across samples, the number of indel polymorphisms per gene varied from 0 to 12, with a total of 31 indel polymorphisms in the data set. Of these, all but 1 (a 3-bp indel in the coding region of GIA/RGA) occurred in noncoding regions. Indel size was highly variable, ranging from a single nucleotide in some cases (including three single-base indels embedded within mononucleotide repeat motifs in PGIC) to >100 bp in others. More specifically, two wild individuals harbored CAM indels spanning ≥100 bp, and the largest indel observed (250 bp) was found within one of the PGIC introns (flanked by exons 12 and 13) in two wild individuals. All indels were excluded from subsequent analyses of nucleotide polymorphism.
Single-nucleotide polymorphisms were considerably more frequent than indels, with a total of 444 polymorphic sites being identified across all individuals and all genes, resulting in an average of 1 SNP for every 16.8 bp of sequence (excluding indels). When considered separately, the wild sunflowers harbored 392 polymorphic sites (1 SNP/19.1 bp), whereas the cultivars harbored 194 polymorphic sites (1 SNP/38.8 bp). Inspection of Table 4 confirms that levels of nucleotide polymorphism are generally quite high. More specifically, estimates of total nucleotide diversity (πT) for the data set as a whole ranged from 0.0016 to 0.0328 (mean = 0.0106), and Watterson's θ (θW) ranged from 0.0030 to 0.0317 (mean = 0.0139). Not surprisingly, a comparison of the wild and cultivated subsamples revealed that πT and θW are both significantly higher in wild sunflower as compared to the cultivars (0.0128 vs. 0.0056; paired t = 3.14, d.f. = 8, P = 0.007 and 0.0144 vs. 0.0072; paired t = 5.03, d.f. = 8, P = 0.0005, respectively). Similarly, silent-site diversity (πsil) as well as synonymous (πsyn) and nonsynonymous (πnonsyn) nucleotide diversity was significantly higher in the wild subsample than in the cultivars (all P ≤ 0.008). In terms of the extent of divergence between subsamples, FST values averaged 0.1837 ± 0.038 (mean ± SE), indicating that the wild and cultivated sunflower gene pools are moderately differentiated.
Tests for nonneutral evolution:
For all loci except GPX, πnonsyn was markedly lower than πsyn, with the πnonsyn/πsyn ratio ranging from 0.024 to 0.353 in the full data set (the corresponding values for the wild and cultivated subsamples were 0.025–0.333 and 0–0.197, respectively), suggesting that diversity at these eight loci is largely governed by purifying selection. For GPX, on the other hand, the πnonsyn/πsyn ratio for the full data set was 1.148, whereas the corresponding values for the wild and cultivated subsamples were 1.005 and 1.110, respectively. This result suggests either that GPX has experienced a relaxation of the purifying selection that has presumably shaped diversity at the other eight genes or that some portion of the GPX coding region has been under positive selection. In terms of allele frequency distributions, Tajima's D was significantly negative at GPX in the full data set (Table 4), indicating an excess of rare alleles. While superficially consistent with the hypothesis that GPX was the target of recent positive selection, it must be kept in mind that: (1) the corresponding estimates from both the wild and cultivated subsamples were not significantly different from zero when they were considered separately, and (2) estimates of Tajima's D were generally negative (albeit nonsignificantly so) across all other loci. Thus, factors other than selection may be responsible for the observed excess of rare alleles.
Results of the LD analyses are summarized in Figure 1 and Table 5. Data from all nine genes were pooled for the wild and cultivated subsamples and Equation 1 was used to model the decay of LD across physical distance. In general terms, the expected value of r2 declines very rapidly in wild sunflower, falling to negligible levels (i.e., ≤0.10) within 200 bp, whereas somewhat higher levels of LD are maintained across greater distances in cultivated sunflower. Observed levels of LD are actually somewhat lower than the expected values at short distances, although the wild data largely follow the expectation. For the cultivar data, observed LD declines out to ∼650 bp, at which point it begins to drift upward. This pattern is likely due, at least in part, to increased sampling variation in the cultivars resulting from lower overall levels of polymorphism. While we were unable to estimate certain recombination parameters for all nine genes in the cultivated subsample due to a lack of sufficient polymorphism, estimates of the population recombination parameter using Hudson's (2001) composite-likelihood estimator ranged from 0.0012 to 0.1483 (0.0528 ± 0.016) in wild sunflower and from 0.0036 to 0.0298 (0.0155 ± 0.007) in cultivated sunflower. Similarly, the minimum number of recombination events ranged from 2 to 11 (7.7 ± 1.8) in wild sunflower and from 0 to 9 (2.7 ± 1.1) in cultivated sunflower, and Wall's B ranged from 0 to 0.4286 (0.1180 ± 0.051) in wild sunflower and from 0.0526 to 0.3611 (0.1760 ± 0.065) in cultivated sunflower. In all three cases, the differences were significant (paired t-test, both P < 0.05) with wild sunflower exhibiting higher recombination (and thus lower LD) than cultivated sunflower. Estimates of interlocus disequilibrium were negligible for all nine genes within both the wild and cultivated subsamples (data not shown).
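The two pairwise statistics underlying these summaries, r2 and the four-gamete test, can be illustrated with a short Python sketch. This is a simplified illustration rather than the DnaSP procedure: both sites are assumed to be biallelic and polymorphic, haplotype phase is assumed known, and the function names and toy data are hypothetical.

```python
def r_squared(site_a, site_b):
    """Squared allele-frequency correlation between two biallelic sites.

    site_a and site_b are strings holding one allele per haplotype, in the
    same haplotype order. Both sites must be polymorphic.
    """
    n = len(site_a)
    a_ref, b_ref = site_a[0], site_b[0]  # arbitrary reference alleles
    p_a = sum(x == a_ref for x in site_a) / n
    p_b = sum(y == b_ref for y in site_b) / n
    p_ab = sum(x == a_ref and y == b_ref for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def four_gametes_present(site_a, site_b):
    """True if all four two-site haplotypes occur, which under an
    infinite-sites model implies at least one recombination event."""
    return len(set(zip(site_a, site_b))) == 4


# Hypothetical toy data: complete association versus free association
print(r_squared("AAAATTTT", "CCCCGGGG"), four_gametes_present("AAAATTTT", "CCCCGGGG"))
print(r_squared("AATT", "CGCG"), four_gametes_present("AATT", "CGCG"))
```

Counting the minimum number of recombination events (RM) requires the additional interval-pruning step of Hudson and Kaplan (1985), which this sketch omits.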
Another method for investigating patterns of linkage disequilibrium, particularly when it comes to making comparisons among populations, is to scale estimates of ρ against θ. The rationale for doing so is that, under the assumptions of the standard neutral model, both values are proportional to the effective population size (ρ = 4Ner and θ = 4Neμ, respectively; Hudson 1987), such that the ratio ρ/θ becomes the recombination rate divided by the mutation rate (i.e., r/μ). This ratio ranged from 0.98 to 11.87 (4.10 ± 2.6) in wild sunflower and from 0.21 to 5.84 (1.94 ± 1.3) in the cultivars at the four loci for which we were able to estimate ρ (using Hudson's 2001 estimator) from both the wild and cultivated subsamples (see Tables 4 and 5). While this result is consistent with greater recombination in wild as compared to cultivated sunflower, the difference was not significant (paired t-test P = 0.10).
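Written out, the cancellation of effective population size in this ratio follows directly from the two definitions given above (standard neutral model assumed):

```latex
\frac{\rho}{\theta} = \frac{4 N_e r}{4 N_e \mu} = \frac{r}{\mu}
```

Under these assumptions, a difference in ρ/θ between two samples cannot be attributed to a difference in Ne alone, which is why the ratio is useful for comparing populations with different demographic histories.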
Inspection of the neighbor-joining tree in Figure 2 reveals that the primitive domesticates form a relatively well-supported, monophyletic clade. Note that the inclusion of the “improved” accessions resulted in a similar overall pattern (data not shown). The primary difference following the inclusion of the improved lines was a decrease in bootstrap support, perhaps due to wild × cultivar introgression during sunflower improvement. The cultivars, however, still formed a monophyletic clade. Note that the monophyly of the cultivars is apparent only when the data are combined across loci. While these results are fully consistent with the conclusions of Harter et al. (2004) regarding a single origin of domesticated sunflower, our data do not provide sufficient resolution to corroborate their placement of the domestication event in the central United States.
DISCUSSION
This study represents the most comprehensive analysis of DNA sequence variation in wild and cultivated sunflower to date. Although there was considerable locus-to-locus variation, with estimates of nucleotide diversity varying >10-fold across loci, it is clear that wild sunflower contains substantial levels of nucleotide diversity (Table 4). Indeed, wild sunflower appears to harbor at least as much silent-site diversity (πsil = 0.0234 ± 0.006) as do a number of other wild taxa that have been studied to date. For example, specieswide silent-site diversity (πsil) in the selfing Arabidopsis thaliana is 0.011 (Aguadé 2001), whereas in the outcrossing species A. lyrata and A. halleri the corresponding values are 0.023 and 0.015 (Wright et al. 2003; Ramos-Onsins et al. 2004). Similarly, three wild relatives of maize, Zea diploperennis, Z. perennis, and Z. parviglumis have πsil = 0.012, 0.013, and 0.023, respectively (White and Doebley 1999; Tiffin and Gaut 2001), and the highly outcrossed tree species Populus tremula has πsil = 0.016 (Ingvarsson 2005).
In contrast to wild sunflower, cultivated sunflower contains markedly less nucleotide variation, with the cultivars included in our survey exhibiting only 40–50% as much diversity (depending on the measure) as was found in the wild. In terms of the overall density of polymorphisms across the regions that we analyzed, we found an average of 1 SNP/19.1 and 38.8 bp across our samples of wild and cultivated sunflower, respectively. Because θW is roughly proportional to heterozygosity, we can further conclude that a randomly selected pair of wild (or cultivated) sunflower sequences would be expected to differ at an average of 1 of every ∼70 (or ∼140) nucleotides (i.e., 1/0.0144 ≈ 70 and 1/0.0072 ≈ 140). For the sake of comparison, randomly selected pairs of maize sequences are expected to differ at 1 of every ∼105 nucleotides (Tenaillon et al. 2001), whereas pairs of soybean sequences are expected to differ at 1 of every ∼1030 nucleotides (Zhu et al. 2003).
While the pattern documented here is qualitatively similar to what has been found in previous surveys of genetic variation in sunflower, the loss of diversity is somewhat greater with sequence data than with either allozymes or SSRs. For example, Rieseberg and Seiler (1990) and Cronn et al. (1997) found that cultivated sunflower contains ∼50–60% of the allozyme diversity present in wild sunflower. Both of these studies, however, reported only the mean level of within-population heterozygosity, as opposed to a true specieswide estimate of diversity, and thus are not strictly comparable to our data. With regard to SSR variation, the exotic cultivated sunflower gene pool has previously been shown to contain ∼65–80% of the diversity present across the range of wild sunflower (Tang and Knapp 2003; Harter et al. 2004; Burke et al. 2005). The fact that this portion of the cultivated sunflower gene pool appears to have lost comparatively little SSR diversity is most likely a result of the relatively high mutation rates that are typical of SSRs (e.g., Diwan and Cregan 1997; Vigouroux et al. 2002). Given an initial loss of variation, SSR diversity would be expected to rebound much more rapidly than would nucleotide diversity.
The observed loss of diversity from wild to cultivated sunflower is likely due, at least in part, to a population bottleneck during the domestication of sunflower. It is, however, also possible that the loss of self-incompatibility in cultivated sunflower played a role in producing this pattern. Indeed, inbreeding is known to result in both a reduction of effective population size (Pollack 1987) and an amplification of the effects of background selection (Charlesworth et al. 1993), both of which would act to reduce genetic variation across the genome (see also Nordborg 2000). It is worth noting here that the primitive and improved accessions that composed the cultivar portion of our sequencing panel (Table 1) contained similar levels of nucleotide diversity when compared to each other (data not shown), indicating that much of the diversity that made it through the initial stages of domestication can be found in the open-pollinated cultivars.
Evidence of selection:
Eight of the nine gene regions that we analyzed exhibited low πnonsyn/πsyn ratios and thus appear to be evolving primarily under purifying selection. The one exception to this pattern (GPX) exhibited a somewhat elevated nonsynonymous substitution rate (Table 4). It is important to note here that the elevated nonsynonymous substitution rate is evident not only across the full sample of 32 sequences, but also within the wild and cultivated subsamples. Thus, it seems unlikely that this pattern arose as a result of selection during domestication. Rather, the most likely explanation is that GPX, which is an antioxidant that is thought to play an important role in the defense against oxidative damage in the face of a variety of environmental stresses (Rodriguez Milla et al. 2003), has been under divergent selective pressures across both wild and cultivated sunflower accessions.
As might be expected of an obligate outcrosser, LD decays extremely rapidly in wild sunflower. More specifically, expected levels of LD decline to negligible levels (r2 < 0.10) within 200 bp (Figure 1). In contrast, nonrandom associations appear to be maintained over somewhat longer distances in the self-compatible cultivated sunflower, with a predicted decline to r2 < 0.10 within ∼1100 bp. While the extent of LD differs somewhat across genes, the overall pattern of higher LD (and lower recombination) in cultivated vs. wild sunflower holds across loci (Table 5). This increase in the extent of LD in cultivated sunflower is likely due to a decrease in effective population size owing to the presumptive domestication bottleneck, as well as to a possible increase in the occurrence of inbreeding. Even with the transition to self-compatibility, however, the expected level of LD appears to decay relatively rapidly in cultivated sunflower as compared to predominantly autogamous crops. For example, Zhu et al. (2003) concluded that there is little decline in LD over distances as great as 50 kbp in soybean, whereas Garris et al. (2003) found that LD in rice approaches r2 = 0.10 only after ∼100 kbp (but see Morrell et al. 2005 for an example of rapid decline of LD in a predominantly selfing taxon). In contrast, r2 declines to <0.10 within ∼1 kb in maize, which is highly outcrossed (Remington et al. 2001; Tenaillon et al. 2001). Thus, the patterns documented here appear to be typical of a taxon with a history of relatively frequent outcrossing. It should be noted, however, that our sampling strategy was primarily designed to investigate domestication-related changes in nucleotide diversity and LD. Thus, our data do not provide any insight into patterns of polymorphism in the elite inbred lines, where nonrandom associations might be expected to extend over longer distances.
Taken together, our results indicate that wild sunflower harbors at least as much nucleotide polymorphism as has been reported in other wild plant taxa and that the cultivated sunflower gene pool has retained only 40–50% of this diversity. Our results also add to the growing body of evidence that cultivated sunflower is the product of a single domestication event. As noted above, the issue of diversity loss during domestication has been most thoroughly investigated in the major cereal crops, where losses of 30–40% have been documented (Buckler et al. 2001). The fact that cultivated sunflower has experienced a greater domestication-related loss of diversity than is typical of the cereals suggests that sunflower may have experienced an even smaller and/or lengthier domestication bottleneck than did the various cereal crops. The results of our work also suggest that association-based approaches may provide a high degree of resolution for the identification of genes underlying trait variation in sunflower. Indeed, even the self-compatible cultivated sunflower lines included in this study exhibited relatively low levels of LD as compared to predominantly autogamous crops. Given this pattern, most SNPs that are significantly associated with a trait would be expected to reside in relatively close proximity to the causative genetic variant. In the case of the cultivars surveyed here, this should allow functional variation to be mapped to the level of the gene, whereas even finer-scale localization may be possible in wild sunflower.
We thank Mark Chapman, Peter Morrell, Catherine Pashley, Natasha Sherman, Jessica Wenzler, David Wills, and two anonymous reviewers for comments on an earlier version of this manuscript. Stuart Macdonald and David Remington provided assistance with the linkage disequilibrium analyses. This work was supported by grants to J.M.B. from the National Science Foundation (DBI-0332411) and the United States Department of Agriculture (03-35300-13104 and 03-39210-13958). EST sequence data were obtained from the Compositae Genome Project website, which was funded by the USDA IFAFS program.
- Received September 13, 2005.
- Accepted November 16, 2005.
- Copyright © 2006 by the Genetics Society of America