In human genetics a detailed knowledge of linkage disequilibrium (LD) is considered a prerequisite for effective population-based, high-resolution gene mapping and cloning. Similar opportunities exist for plants; however, differences in breeding system and population history need to be considered. Here we report a detailed study of localized LD in different populations of an inbreeding crop species. We measured LD between and within four gene loci within the region surrounding the hardness locus in three different gene pools of barley (Hordeum vulgare). We demonstrate that LD extends to at least 212 kb in elite barley cultivars but is rapidly eroded in related inbreeding ancestral populations. Our results indicate that haplotype-based sequence analysis in multiple populations will provide new opportunities to adjust the resolution of association studies in inbreeding crop species.
LARGE-SCALE investigations of sequence variation within genes and across genomes have only just begun for plant species. Such studies are required to determine the distribution and extent of linkage disequilibrium (LD), the nonrandom association of alleles, since this will determine the resolving power of association-based mapping strategies. In human and mammalian genetics, knowledge of LD is a topic of great current interest and has been successfully used to refine high-resolution mapping studies for complex disease genes and provide new insights into human evolution and the distribution of meiotic crossover events (Ardlie et al. 2002). Similar opportunities exist for plants (Gaut and Long 2003); however, the impact of biological and historical factors influencing the extent of LD needs to be assessed before informative mapping strategies can be implemented. Initial studies of LD in maize (an outcrosser) showed that the distance over which LD decays ranges from a few hundred to 2000 bp, depending on whether landrace or cultivated material is analyzed (Remington et al. 2001; Tenaillon et al. 2001; Palaisa et al. 2003). In contrast, small isolated populations of Arabidopsis (an inbreeder) exhibit drastic differences in the extent of LD, ranging from 1 to >50 cM (Nordborg et al. 2002). Garris et al. (2003) studied haplotype diversity and LD around the xa5 locus of rice (Oryza sativa L.), an inbreeder, and detected significant LD between sites 100 kb apart. These studies vividly illustrate the impact different breeding systems can have on LD. The natural decay of LD with distance occurs at a considerably slower rate in inbreeding systems because effective recombination is severely reduced and genetic polymorphisms remain correlated over longer physical distances (Nordborg et al. 2002; Morrell et al. 2005). This extended LD is a potential barrier to the localization of causative polymorphisms of phenotypic effects.
Some of the world's most important crops such as rice, soybean, barley, and wheat are inbreeders. The impact of inbreeding on the magnitude and pattern of LD in these plants will also be amplified by human intervention arising from the processes of selection and domestication (Tanksley and McCouch 1997). In addition, the haploid genome size of crop plants such as wheat and barley (5000 Mbp; Arumuganathan and Earle 1991) is roughly 40 times greater than that of Arabidopsis (125 Mbp), mainly due to the recent amplification of repetitive DNA (SanMiguel et al. 1998). These factors suggest that studies of LD in inbreeding crop plants are needed to complement information emerging from Arabidopsis to provide a comprehensive picture of the patterns and magnitude of LD in plant genomes. To characterize the extent of LD in an inbreeding crop species, we analyzed inter- and intragenic associations across four gene loci within 212 kb of the barley hardness locus (Ha). The objectives of this study were: (1) to determine the extent and magnitude of LD at a regional level and (2) to relate empirical estimates of LD to genome composition and the varying evolutionary histories of different gene pools. Our data indicate that LD varies dramatically between different inbreeding populations of barley, providing exciting opportunities to pursue a two-tiered LD mapping strategy based on whole-genome scans of cultivated barley germ plasm followed by high-resolution LD mapping in ancestral landrace and Hordeum spontaneum populations.
MATERIALS AND METHODS
Seventy-four cultivated, 23 landrace, and 34 wild barley accessions were used in this study (Table 1). Cultivated material consisted predominantly of European breeding lines. Landrace and wild lines were chosen for their global distribution.
SNP discovery and genotyping:
PCR primers were designed to four gene regions of 1.5–3.5 kb in size from the ∼113-kb genic space of the Ha locus (GenBank accession nos. AY643842–AY643844). Amplicons were sequenced using ABI BigDye Terminators Version 2 on an ABI 3700 automated sequencer and assembled with Sequencher version 4.1.4 (Gene Codes, Ann Arbor, MI) software. All products were sequenced in both directions and the absence of PCR and sequencing errors was confirmed by repeated sequencing of independent amplicons. Sequences were submitted to GenBank under accession nos. AY643845–AY644336.
Estimates of nucleotide polymorphism (Watterson's estimate, θW; nucleotide diversity, π), neutrality tests (Tajima's D, McDonald–Kreitman, HKA), recombination (Hudson and Kaplan), and linkage disequilibrium (r²) and their statistical significance were calculated using DnaSP Version 3.53 (Lewontin 1964; Lewontin and Kojima 1964; Watterson 1975; Tajima 1983, 1989, 1993; Hudson et al. 1987; Nei 1987; Nei and Miller 1990; McDonald and Kreitman 1991; Rozas and Rozas 1999). All sites with a frequency <0.10 for the rare allele were excluded from the LD analysis because r² has a large variance with rare alleles. Plots of all informative pairwise comparisons and the negative log P-values of significance among associations relative to physical distance (in kilobases), in addition to the expected and observed numbers of significant associations at specific distances, were generated in Microsoft Excel. The median association value for each set of pairwise comparisons between sites located within two different gene regions was calculated and plotted against the corresponding median distance for each set using GenStat for Windows (2002, Release 6.2, 6th Edition, VSN International). Recombination analysis was performed using LDhat (http://www.stats.ox.ac.uk/∼mcvean/LDhat/).
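The pairwise LD calculation described above can be illustrated with a minimal sketch. It is not the DnaSP implementation; it assumes that each inbred accession contributes a single haplotype, that sites are biallelic, and that alleles are stored as one character per accession. The function names and the example data are hypothetical.

```python
from itertools import combinations

def minor_allele_freq(site):
    """Frequency of the rarer allele; 0.0 for a monomorphic site."""
    alleles = set(site)
    if len(alleles) > 2:
        raise ValueError("expected at most two alleles per site")
    first = site[0]
    count = sum(1 for a in site if a == first)
    # Work with integer counts so that 1/10 is not lost to rounding.
    return min(count, len(site) - count) / len(site)

def r_squared(site_a, site_b):
    """r^2 between two polymorphic biallelic sites scored on the same haplotypes."""
    n = len(site_a)
    a1, b1 = site_a[0], site_b[0]  # arbitrary reference alleles; r^2 is unaffected
    p_a = sum(1 for x in site_a if x == a1) / n
    p_b = sum(1 for y in site_b if y == b1) / n
    p_ab = sum(1 for x, y in zip(site_a, site_b) if x == a1 and y == b1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

def pairwise_ld(sites, min_maf=0.10):
    """sites: dict of position (bp) -> alleles. Returns [(distance_bp, r2), ...]."""
    kept = {pos: s for pos, s in sites.items() if minor_allele_freq(s) >= min_maf}
    return [(p2 - p1, r_squared(kept[p1], kept[p2]))
            for p1, p2 in combinations(sorted(kept), 2)]

# Hypothetical example: 12 haplotypes, three SNPs; the third is a singleton
# (rare-allele frequency 1/12 < 0.10) and is therefore excluded.
example_sites = {120: "AAAAAAGGGGGG", 370: "CCCCCCTTTTTT", 900: "TTTTTTTTTTTC"}
example_ld = pairwise_ld(example_sites)
```

Filtering on integer allele counts before computing r² mirrors the exclusion of sites with a rare-allele frequency <0.10; the threshold is exposed as a parameter so that it can be varied.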
Patterns of LD within candidate gene regions:
Four gene regions from the previously described (Caldwell et al. 2004) fully sequenced genomic region surrounding the hardness locus in barley, namely hinb, hina, GSP, and a putative gene (PG2), were analyzed to determine the level of LD (r²) between informative sites (rare allele with f ≥ 0.1; Figure 1). High levels of association extended across the entire 3373-bp hinb gene region containing both hinb-1 and hinb-2 in the cultivated sample, with relatively few pairwise comparisons demonstrating low association. A scarcity of low association values was also observed at the hina region. However, in contrast to the hinb region, there appeared to be evidence of LD decay after 1000 bp. The plot of LD across the GSP gene revealed a high level of association across the entire 1805-bp region. However, this pattern also differed from that observed at the hinb region, as association values of varying magnitude were found between pairs of sites at intermediate distances. Too few informative sites existed within the PG2 region in the cultivated sample to assess LD accurately.
The overall extent of LD observed within gene regions in the landrace sample was similar to that observed within the cultivated material (Figure 1); high levels of association stretched across each gene region. However, a substantial number of pairwise comparisons within the hinb and hina gene regions gave moderate to low association values that were not detected in the cultivated material. Furthermore, there was a complete absence of intermediate association values in the GSP region. This bimodal distribution could be attributed to an elevated level of high association values as a result of small sample size. Although more informative SNPs were available for association analysis in the PG2 region relative to the cultivated sample, the paucity of these events still prevented accurate assessment of LD.
High levels of association extended across the entire hinb-1 and hinb-2 gene region in the wild sample (Figure 1). However, in contrast to the cultivated data, a substantial number of intermediate and low association values were observed at a range of distances within the gene region. Likewise, a more distinct pattern of LD decay within the hina region was observed in the wild sample relative to the cultivated material with complete decay to <0.2 by 1100 bp. An even greater rate of decay was observed in the PG2 region where association values dropped to <0.2 by 400 bp. A general trend of low association values is apparent within the GSP region despite the small number of informative SNPs.
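One way to make a statement such as "complete decay to <0.2 by 1100 bp" operational is sketched below. This is an assumed reading, not the procedure used to produce Figure 1: the decay distance is taken as the smallest observed distance beyond which no pairwise comparison reaches the threshold. The function name and the example values are hypothetical.

```python
def complete_decay_distance(pairs, threshold=0.2):
    """pairs: iterable of (distance_bp, r2).

    Returns the smallest observed distance beyond the last comparison with
    r2 >= threshold, or None if LD never falls below the threshold.
    """
    pairs = list(pairs)
    last_high = max((d for d, r2 in pairs if r2 >= threshold), default=None)
    if last_high is None:
        # Every comparison is already below the threshold.
        return min(d for d, _ in pairs)
    beyond = [d for d, _ in pairs if d > last_high]
    return min(beyond) if beyond else None

# Hypothetical within-gene comparisons (distance in bp, r^2).
example_pairs = [(100, 0.9), (400, 0.5), (800, 0.25), (1100, 0.15), (1500, 0.05)]
decay_bp = complete_decay_distance(example_pairs)
```

With few informative sites, as in the PG2 and GSP regions of some samples, such an estimate is sensitive to individual comparisons and should be read alongside the full scatter of values.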
Patterns of LD across a contiguous 212-kb region:
Estimates of association were also determined among the different gene regions, which are nonuniformly distributed across the 212 kb contiguous sequence (Figure 2). Estimates of LD between all pairs of informative sites are summarized in Figures 3 and 4; supplemental Figure 1 at http://www.genetics.org/supplemental/. High levels of association were found to stretch across the entire region in the cultivated sample. Although a gradual decay of LD was observed with distance, highly significant (P < 0.0001) LD values extended across the entire 212-kb region and the median level for each distance group never decayed below 0.2.
To determine if this pattern and the level of LD were maintained in ancestral gene pools, the landrace and wild samples were assessed across the same region. LD and its significance decreased as a function of increasing distance in both ancestral samples (Figures 3 and 4; supplemental Figure 1 at http://www.genetics.org/supplemental/). However, this pattern was particularly distinct in the wild material (Figures 3 and 4). Significant median LD values at a level >0.1 extended as far as 83 kb in the landrace sample and complete decay was seen by 98 kb (Figure 5). In contrast, complete equilibrium outside intragenic associations was observed in the wild sample with no median values at a level >0.1 (Figure 5). Although a few pairwise comparisons at distances >100 kb demonstrated perfect association (r² = 1) in the landrace sample, it is likely that these were spurious associations detected as a consequence of the small sample size and inability to recognize and remove rare polymorphisms (Figure 3).
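The median values referred to above summarize each between-gene comparison group by one median distance and one median association value. A minimal sketch of that summary follows; the original analysis was run in GenStat, so the function name, the tuple layout, and the example numbers here are illustrative assumptions only.

```python
from collections import defaultdict
from statistics import median

def median_ld_by_group(comparisons):
    """comparisons: iterable of (gene_a, gene_b, distance_kb, r2).

    Returns {(gene_x, gene_y): (median_distance_kb, median_r2)}, where the
    gene pair is stored in sorted order so that A-B and B-A share a group.
    """
    groups = defaultdict(list)
    for gene_a, gene_b, dist, r2 in comparisons:
        groups[tuple(sorted((gene_a, gene_b)))].append((dist, r2))
    return {
        pair: (median(d for d, _ in vals), median(r for _, r in vals))
        for pair, vals in groups.items()
    }

# Hypothetical pairwise comparisons between sites in different gene regions.
example_comparisons = [
    ("hinb", "hina", 77.0, 0.9),
    ("hina", "hinb", 79.0, 0.7),
    ("hinb", "hina", 78.0, 0.8),
    ("GSP", "PG2", 96.0, 0.1),
    ("GSP", "PG2", 98.0, 0.3),
]
example_medians = median_ld_by_group(example_comparisons)
```

Using the median rather than the mean limits the influence of the occasional comparison with r² = 1 that can arise by chance in a small sample.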
The stark contrast in LD between the cultivated and ancestral gene pools is further exemplified by the distribution of −log P-values for pairwise associations (Figure 6). Although all three sample sets demonstrated higher levels of significant association than would be expected in the absence of LD, the majority (>57%) of pairwise associations in the cultivated material demonstrated −log P-values >4, compared to ≤5% in the other two gene pools.
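The −log P summary can be sketched as follows. The text does not state which significance test or logarithm base underlies Figure 6, so this example assumes a chi-square test with 1 d.f. (χ² = n·r², with n the number of haplotypes) and base-10 logarithms; both choices, and the function names, are assumptions made for illustration.

```python
import math

def neg_log10_p(r2, n_haplotypes):
    """-log10 of the chi-square (1 d.f.) P-value for an r^2 estimate."""
    chi2 = n_haplotypes * r2
    # Survival function of a chi-square variate with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return -math.log10(p) if p > 0 else math.inf

def fraction_above(values, threshold=4.0):
    """Proportion of -log10 P-values exceeding the threshold."""
    values = list(values)
    return sum(1 for v in values if v > threshold) / len(values)

# Hypothetical r^2 values for a sample of 74 lines.
example_r2 = [1.0, 0.6, 0.05, 0.0]
example_scores = [neg_log10_p(r2, 74) for r2 in example_r2]
example_fraction = fraction_above(example_scores)
```

An exact test would be preferable for sparse haplotype tables; the chi-square approximation is used here only because it keeps the sketch short.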
LD and its relation to genome organization:
Although the overall trend reveals a gradual decrease of LD with physical distance, there is also an undulating pattern in the levels of association among the different between-gene comparison groups in all three sample sets studied (Figure 3). This pattern is even more striking when the median LD value plots are considered (Figure 5). In an effort to explain this pattern, focus was turned to the genome composition and organization of the region. The sequence spanning the candidate grain texture genes includes several different patterns of genome organization that are typical of small-grained cereals, including both solitary genes and gene clusters separated by stretches of nested repetitive element insertions. The presence of either singular or nested transposable elements has previously been implicated as a mechanism for recombination suppression in several eukaryotic genomes and, therefore, could have an impact on local levels of LD (Charlesworth et al. 1994; Arabidopsis Genome Initiative 2000; Fu et al. 2002; Yao et al. 2002).
Two groups of pairwise comparisons involved sites from genes separated by large expanses of nested repetitive sequence: hinb and hina, and GSP and PG2. The hinb and hina gene regions are separated by 77 kb, 93% of which is composed of transposable elements (Figure 2). This intergenic interval is 2.75-fold greater than that observed between hina and GSP, which contains only one small ∼5-kb retrotransposon (Figure 2). Nevertheless, association values between pairwise sites flanking the 77-kb transposable element cluster were higher and more significant than those flanking the 28-kb intergenic region composed primarily of low-copy genic sequence (Figures 4 and 5). This result suggests recombination suppression in the former region. Plots showing local pairwise deviations from the assumption of a constant rate of recombination across the region indicated that recombination is indeed reduced between hinb and hina within the cultivated sample (Figure 7). This localized region of suppression is even more evident in the wild sample.
The intergenic space between GSP and PG2 was also primarily composed of nested transposable elements (96%) and spanned a region even larger than that observed between hina and hinb (96 kb; Figure 2). However, in contrast to the hina and hinb region, the GSP and PG2 region demonstrated one of the lowest median association values of all groups considered (Figures 4 and 5). This suggests that the extensive regional expansion caused by element insertion has had a negligible suppressive effect on recombination between these two genes. On the contrary, GSP and PG2 appear to be recombination hotspots compared to the rest of the region analyzed (Figure 7). Therefore, the presence/absence of repetitive DNA cannot solely account for the punctuated pattern of LD observed across the entire sequenced region.
Impact of selection on local levels of LD:
Contrasting gene histories for the different regions could also contribute to the undulating pattern of LD. In all three sample sets, the hinb-1 gene demonstrated the strongest evidence for selection (McDonald–Kreitman test, P < 0.05; HKA test, P < 0.05 with Triticum tauschii sequence as an outgroup). It is striking, therefore, that both prominent peaks present on the plots of median LD values correspond to pairwise comparisons involving sites located within the hinb-1 gene region (Figure 5). Although median LD values are too low to observe this pattern in the wild germ plasm, plots of the 95th percentile are consistent with this observation (data not shown). In contrast, no evidence for selection was observed for the PG2 gene region regardless of the sample set analyzed or the test statistic employed. Median values of pairwise comparison groups involving sites located within the PG2 gene demonstrate the lowest LD values observed.
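For readers unfamiliar with the McDonald–Kreitman test, the sketch below shows its general form: a 2×2 table of fixed versus polymorphic and replacement versus synonymous differences, evaluated here with a two-sided Fisher's exact test. The counts are invented for illustration and are not the hinb-1 data; the choice of Fisher's exact test and the function name are assumptions rather than a description of the analysis reported here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical McDonald-Kreitman table:
#                replacement  synonymous
# fixed               7            2
# polymorphic         2           10
mk_p = fisher_exact_two_sided(7, 2, 2, 10)
```

An excess of fixed replacement differences relative to polymorphic ones, as in this invented table, is the pattern usually interpreted as evidence of past directional selection.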
Contrasting evolutionary histories of different germ-plasm samples as a tool for association mapping strategies:
Our observations in barley provide an example of significantly different patterns of LD detected across a relatively short physical distance among different samples of an autogamous species. These contrasting patterns exist despite similar levels of inbreeding and most likely reflect different population histories associated with the occurrence of bottlenecks and selection within the domesticated germ plasm. Similar observations have recently been reported for 25 accessions of H. spontaneum, with intralocus LD decaying rapidly at a rate similar to that observed in the outbreeding species, Zea mays (Morrell et al. 2005). The observations have crucial implications for the design and execution of association studies, suggesting a two-tiered strategy for LD mapping. To take advantage of extensive LD in elite germ plasm, low-resolution whole-genome scans could be deployed to identify candidate gene regions. This would be complemented by fine-scale, high-resolution LD mapping utilizing landrace and wild germ plasm to identify candidate genes. The possibility of using a two-tiered approach has previously been reported in human and Arabidopsis LD studies (Reich et al. 2001; Nordborg et al. 2002). Recently, a haplotype-based approach, rather than individual SNP associations, was used to identify and elucidate natural allelic variation at the CRY2 flowering-time gene in Arabidopsis (Olsen et al. 2004). This study, together with our own observations of barley, suggests that haplotype tagging coupled with LD mapping in different populations of barley will provide new opportunities to connect sequence diversity to complex phenotypes in crop plants.
Contrasting gene histories generate a punctuated pattern of LD:
A smooth progression of decreasing association with increasing distance was not observed among the different barley samples (Figures 3 and 5). Instead, an undulating pattern of LD was observed with several regions of notable increase in LD at intermediate distances (Figures 3 and 5). This observation is similar to that described for humans for which the “haplotype-block” model of LD has gained prominence (Jeffreys et al. 2001) and has also been described in Arabidopsis (Haubold et al. 2002). One plausible explanation for this punctuated pattern of LD is the presence of contrasting gene histories within the same local chromosomal region. The different patterns of nucleotide diversity observed among the genes analyzed suggested varying intensities of selection (Table 2). Furthermore, individual comparisons of association within genes demonstrated differences in the magnitude and extent of LD (Figure 1). In all three sample sets, the region harboring the two hinb gene copies demonstrated the highest levels of association with negligible signs of LD decay (Figure 1). This is consistent with evidence that suggests that the hinb-1 region was subjected to past directional selection (K. S. Caldwell, P. Langridge and W. Powell, unpublished results). In contrast, LD was found to decay within only a few hundred base pairs at the PG2 gene region in the wild germ plasm (Figure 1). Limited LD maintenance within this gene region is supported by the lack of evidence to suggest any past selection (K. S. Caldwell, P. Langridge and W. Powell, unpublished results). These local signatures of selection may explain the undulating pattern of LD across the entire region; peaks of high LD corresponded perfectly to associations involving the putatively selected hinb-1 gene region (Figures 3 and 4). Likewise, depressions of low LD corresponded to associations involving the assumed neutral PG2 gene region (Figures 3 and 4).
Evidence of changes in local levels of LD as a result of contrasting gene history has been previously reported at the different adh loci in H. spontaneum (Lin et al. 2001, 2002). However, such examples are not limited to Hordeum species. The FRI gene in Arabidopsis may have been subjected to local adaptation, which could account for the extended range of LD (250 kb) surrounding the region (Johanson et al. 2000; Nordborg et al. 2002). In comparison, regions in Arabidopsis demonstrating evidence of balancing selection, CLV2 and RPS5 loci, showed a reduction in the extent of LD by as much as a factor of 10 (Tian et al. 2002; Shepard and Purugganan 2003). Breeding selection can also leave a mark in the patterns of LD within a plant system. Despite the close relationship between Y1 and PSY2 in maize, the two loci demonstrate drastically different nucleotide diversity levels with Y1 displaying a >10-fold increase in the extent of LD relative to PSY2 (Remington et al. 2001). This is predicted to be a result of human selection (Palaisa et al. 2003). These results indicate that association analysis could be exploited as a valuable tool for understanding the genetic basis of adaptation to new environments and the phenotypic diversity arising from the crop domestication process (Kraakman et al. 2004).
The role of transposable elements in observed LD patterns:
Several studies in plant species indicate that recombination is predominantly active in gene-rich chromosomal regions (Gill et al. 1996a,b; Schnable et al. 1998; Faris et al. 2000; Kunzel et al. 2000). Furthermore, single transposable elements and clusters of nested repetitive elements within plant species are believed to be either recombinationally inert or suppressors of recombination (Fu et al. 2002; Yao et al. 2002). As a consequence, successful fine-scale mapping of causative mutations based on association mapping may be largely dependent upon the genome composition surrounding candidate genes.
Pairwise associations between two different sets of genes each separated by >77 kb of nested repetitive sequence (Figure 1) revealed substantially different levels of association. High levels of association extended across the intergenic region between hinb-1 and hina with a median value of 0.8 (Figure 1). The region between GSP and PG2, however, showed minimal evidence of strong association and a median value of 0.2 (Figure 1). This suggests that the presence of large segments of transposable sequence was not sufficient to suppress recombination between the genes characterized in this region of the barley genome. However, this assumes that all the lines represented in this study share the same, or a very similar, gene content and genome organization, which may not be the case. A comparison of the bronze locus in two maize inbred lines revealed that the location and extent of element insertion as well as gene content were highly variable (Fu and Dooner 2002). Furthermore, the results reported here cannot distinguish between recombination events occurring within repetitive or low-copy regions in the intergenic space. Only exon 3 of PG2 was included in the nucleotide diversity and association analysis. Therefore, the 4 kb of low-copy sequence, including the latter portion of the gene and the 3′ flanking region, could harbor a recombination hotspot accounting for the low levels of association involving the PG2 region. This would be consistent with the observation that exon 3 of PG2 contained the highest level of recombination observed within all regions analyzed (Figure 7). Similarly, although the transposable elements in the intergenic region between hinb-1 and hina could have an effect on the local recombination rate, it would be difficult to distinguish this effect from that of regional selection on the basis of the information available here.
This work represents a detailed study into the levels and patterns of local or short-range LD within an inbreeding crop species and suggests that LD-based approaches will be a powerful tool for identifying the allelic variants that contribute to complex traits in crop plants. The contrasting evolutionary histories of crop gene pools represent a unique biological resource that will allow the scale and resolution of association-based studies to be modulated in a highly flexible manner. Furthermore, the punctuated pattern of LD generated by different gene histories within the 212-kb region indicates that association analysis could also be a valuable tool for locating genes involved in local adaptation and the domestication process.
We thank J. McNicol for statistical support, D. Charlesworth for helpful discussions, and A. Rafalski for critical review of the manuscript. We also thank the Scottish Executive Environment and Rural Affairs Department for their financial support.
- Received November 11, 2004.
- Accepted October 3, 2005.
- Copyright © 2006 by the Genetics Society of America