A total of 568 new simple sequence repeat (SSR)-based markers for barley have been developed from a combination of database sequences and small insert genomic libraries enriched for a range of short simple sequence repeats. Analysis of the SSRs on 16 barley cultivars revealed variable levels of informativeness but no obvious correlation was found with SSR repeat length, motif type, or map position. Of the 568 SSRs developed, 242 were genetically mapped, 216 with 37 previously published SSRs in a single doubled-haploid population derived from the F1 of an interspecific cross between the cultivar Lina and Hordeum spontaneum Canada Park and 26 SSRs in two other mapping populations. A total of 27 SSRs amplified multiple loci. Centromeric clustering of markers was observed in the main mapping population; however, the clustering severity was reduced in intraspecific crosses, supporting the notion that the observed marker distribution was largely a genetical effect. The mapped SSRs provide a framework for rapidly assigning chromosomal designations and polarity in future mapping programs in barley and a convenient alternative to RFLP for aligning information derived from different populations. A list of the 242 primer pairs that amplify mapped SSRs from total barley genomic DNA is presented.
BARLEY (Hordeum vulgare L.) is one of the most important crop species in the world and has been subject to considerable genetic study. It is a diploid (2n = 2x = 14), largely self-fertilizing species with a large genome of 5.3 × 10^9 bp/1C (Bennett and Smith 1976). Detailed restriction fragment length polymorphism (RFLP) linkage maps have been developed (Graner et al. 1991; Kleinhofs et al. 1993) and these have allowed the close syntenic relationship between barley and the three genomes of wheat to be elucidated (Namuth et al. 1994; Dubcovsky et al. 1996). The genetic advantages of working with a self-compatible true diploid, together with the availability of a large number of genetic stocks and its considerable economic importance, have resulted in barley being proposed as a model for the entire Triticeae (Linde-Laursen et al. 1997). Although first-generation maps have proved extremely informative, particularly in comparative mapping within the Triticeae and with other Gramineae (Moore et al. 1995; Sherman et al. 1995), the hybridization-based markers deployed in these studies have genetical and practical disadvantages, particularly in the context of applied agricultural research. This has led to considerable interest in PCR-based markers, in particular those based on simple sequence repeats (SSR).
The development of SSR markers for barley has followed a common pattern with the first few derived from sequences held in public databases (Saghai Maroof et al. 1994; Becker and Heun 1995). This has been followed by the screening of small insert genomic libraries for SSR motifs (Liu et al. 1996; Struss and Plieske 1998). Sixty-four barley SSRs are in the public domain at present. This limited progress indicates that SSR isolation and characterization from plants is not trivial and that effective strategies that increase the efficiency of the SSR discovery and development phase need to be devised. While a number of approaches have been described (Ostrander et al. 1992; Edwards et al. 1996), one strategy that has not featured heavily in plant SSR discovery programs in the scientific literature is the use of libraries enriched for particular SSR motifs.
Here we report the discovery, characteristics, development, and linkage mapping of 568 new barley SSRs from enriched small insert libraries and from new sequences in the public databases. Using these markers, we have derived a second-generation linkage map of barley using only single-locus/low-copy PCR-based markers.
MATERIALS AND METHODS
Plant material: The variety Blenheim was used as the DNA source for the construction of all the libraries with the exception of two later libraries enriched for (AG)n, which were constructed using DNA from the varieties Blenheim, Livet, Igri, and Franka. For most SSRs, a panel of 16 barley lines was used for screening for polymorphism. These lines formed the parents of 10 mapping populations and included 8 spring barleys, Golf, Alexis, Vada, Lina, Blenheim, E224/3, Aura, and Regatta; 6 winter barleys, Puffin, NSL91-6319, Plaisant, Dicktoo, Igri, and Franka; and two Hordeum spontaneum accessions, 1-B-87 and Canada Park. In addition, 86 doubled-haploid (DH) lines, derived from the F1’s of crosses between Lina × H. spontaneum Canada Park (Waugh et al. 1996), were used for mapping. DNA from all the above lines was extracted from 3-wk-old leaf material using the CTAB method (Saghai Maroof et al. 1984).
Exceptions to the above were the SSRs tested at IPK. These were tested against a panel of nine barley lines: three H. spontaneum lines, HS 213, 277, and 584, provided by the Bundesforschungsanstalt für Züchtungsforschung (Aschersleben, Germany) and the varieties Brenda, Trasco, Steptoe, Morex, Igri, and Franka. Polymorphic SSRs were mapped in the two populations derived from these last four varieties (Graner et al. 1991; Kleinhofs et al. 1993).
SSR isolation and characterization: SSR-containing DNA sequences were derived from public access sequence databases and from enriched libraries. The database-derived SSRs were obtained by searching the Hordeum sections of EMBL and GenBank for sequences containing microsatellites, using the find-patterns program in the Wisconsin GCG package. The data file included all possible mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats: (N)15, (NN)7, (NNN)5, (NNNN)4, (NNNNN)4, and (NNNNNN)4. In addition, the sputnik program (C. Abajian, University of Washington, http://www.abajian.com/sputnik/) was used to double-check the above results and extract imperfect repeats. Seventeen database-derived SSRs, which had already been published (Saghai Maroof et al. 1994; Becker and Heun 1995; Liu et al. 1996; Petersen and Seberg 1998), were excluded from further analysis. The database-derived SSRs were named after the EMBL name for the sequence containing the repeat.
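The database search described above can be illustrated with a minimal Python sketch. It is not the GCG find-patterns or sputnik procedure used in the study; it only applies the minimum repeat counts listed in the text, (N)15, (NN)7, (NNN)5, (NNNN)4, (NNNNN)4, and (NNNNNN)4, to perfect repeats using regular expressions. The function name `find_perfect_ssrs` and the test sequence are hypothetical, and imperfect repeats are not detected.

```python
import re

# Minimum repeat counts per motif length, taken from the thresholds in the text:
# (N)15, (NN)7, (NNN)5, (NNNN)4, (NNNNN)4, (NNNNNN)4
MIN_REPEATS = {1: 15, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def find_perfect_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Hypothetical helper for illustration only; imperfect or compound
    repeats are not handled.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ACAC"); those are counted under the shorter motif.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Made-up sequence containing (AC)8, (A)16 and (AAG)5
example = "GGG" + "AC" * 8 + "TTG" + "A" * 16 + "C" + "AAG" * 5
for start, motif, count in find_perfect_ssrs(example):
    print(start, motif, count)
```

Filtering out motifs that reduce to a shorter unit keeps a run such as (A)16 from also being reported as (AA)8 or (AAAA)4.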
A total of nine enriched small insert DNA libraries were constructed using a method based on a single-stranded DNA hybridization selection (A. Rafalski and M. Morgante, personal communication). The libraries were based upon Tsp509E digestion of barley DNA and were enriched for (AC)n × 3, (AG)n × 3, (AAC)n, (ACC)n, and (ATC)n. Details of the enrichment procedure used for all the libraries and the cloning procedure using λ ZAP II, used for seven of the libraries, are given in Ramsay et al. (1999). Two later (AG)n libraries used the plasmid pUC19 vector for ease of handling. In addition to these libraries, 50 clones from an enriched microsatellite library, constructed according to Edwards et al. (1996), were examined. The enrichment procedure used was different from that detailed above, with the use of RsaI restriction digestion and multiple bound oligonucleotides enriching for DNA fragments hybridizing to (AT)15, (GA)15, (GC)15, (GT)15, (ATT)10, (CAA)10, (GCC)10, (ATAG)10, (CATA)10, and (GATA)10. SSRs, isolated from genomic libraries, were titled Bm (barley microsatellite) or EBm (E to represent those originating from EU FAIR CT95-0003), followed by the respective repeat motif and a consecutive four-digit number.
Sequence checking and primer design: The systematic checking of the sequences, against an in-house database of SSR-containing clones using BLASTN (Altschul et al. 1997), allowed the identification of duplicates at an early stage. In addition, searches against this and public databases, using FASTA (Pearson 1990), screened out sequences that showed homology to known repetitive elements prior to primer design (Ramsay et al. 1999). Primers were designed to sequences flanking SSRs using the computer program PRIMER (v 0.5 and v 3.0; E. Lander, Cambridge, MA).
Polymerase chain reaction and fragment analysis: PCR reactions were performed in a total volume of 10 μl and consisted of 20 ng genomic DNA, 1 × PCR buffer, 0.3 units Taq polymerase (both from Boehringer-Mannheim, Indianapolis), 0.3 μM of forward and reverse primers, 200 μM dNTPs, and 0.5 μCi of [α-32P]dCTP and were run on a PE 9600. The optimized PCR cycling conditions varied and the conditions that amplified the strongest product of the expected size on a 1.5% agarose gel were utilized in our mapping studies, which were carried out as described by Morgante et al. (1994). After PCR an equal volume of 95% formamide electrophoresis loading buffer was added to the samples, which were then denatured, snap cooled on ice, and electrophoresed in 6% Easigel (Scotlab) according to standard procedures. An M13 sequencing marker was run to estimate product sizes and gels were double loaded for mapping studies where possible. Visualization of results was achieved by exposure of fixed, dried gels to X-ray film. At IPK, fragment analysis was performed using an automated laser fluorescence sequencing machine, as described in Röder et al. (1998b). The quality of the PCR products was rated on a scale of 1 to 5, following Smulders et al. (1997). Details of the mapped primer pairs and the PCR amplification conditions used are given in Appendixes S1 and S2, respectively, available at the Genetics supplemental data site at http://www.genetics.org/supplemental/156/4/1997/DC1.
In addition, primers designed to SSRs already published were synthesized or obtained as designated “Barley map pairs” from Research Genomics (Huntsville, AL). Finally, two wheat SSRs, WMS6 (Röder et al. 1995) and WMC1E8 from the wheat microsatellite consortium library (P. Isaac, personal communication) were utilized, as was a single SSR (Bmatc0001) derived from a small insert λ library (courtesy of J. R. Russell).
Parental line screening and linkage analysis: The frequencies of the alleles within the barley lines screened were used to calculate diversity indices (DI), estimated as $\mathrm{DI} = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of the ith allele in the given population (Weir 1990).
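As a worked illustration of the diversity index defined above (one minus the sum of squared allele frequencies), the following Python sketch computes DI from the allele calls of a screening panel. The function name `diversity_index` and the allele sizes are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def diversity_index(alleles):
    """DI = 1 - sum(p_i^2), where p_i is the frequency of the ith allele."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical allele sizes (bp) scored at one SSR across a 16-line panel:
# frequencies are 0.5, 0.25 and 0.25.
panel = [150] * 8 + [152] * 4 + [156] * 4
print(diversity_index(panel))  # 0.625
```

A locus that is monomorphic across the panel gives DI = 0, and DI approaches 1 as the number of equally frequent alleles increases.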
Linkage analysis in the Lina × H. spontaneum Canada Park population was carried out by combining the SSR data with existing amplified fragment length polymorphism (AFLP) data and performed using Joinmap (v 2.0; P. Stam and J. W. Van Ooijen, CPRO-DLO, Wageningen, The Netherlands). The linkage groups were separated using a LOD score of 5.0.
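To make the LOD threshold concrete, the sketch below shows the standard two-point LOD score for a pair of markers scored in doubled-haploid lines, together with the Kosambi mapping function that converts a recombination fraction to centimorgans. This is a textbook calculation under the assumption that each DH line samples one meiotic product; it is not a reproduction of Joinmap's internal algorithm, and the function names are hypothetical.

```python
import math


def two_point_lod(n_recombinant, n_total):
    """LOD for linkage between two markers scored in n_total DH lines.

    The recombination fraction is estimated as r = k / n, and
    LOD = log10( L(r) / L(0.5) ) with L(r) = r^k * (1 - r)^(n - k).
    """
    k, n = n_recombinant, n_total
    r = k / n
    if r >= 0.5:
        return 0.0  # no evidence of linkage
    lod = n * math.log10(2)
    if k > 0:
        lod += k * math.log10(r)
    lod += (n - k) * math.log10(1 - r)
    return lod


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction r < 0.5."""
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# With the 86 DH lines used here, two completely cosegregating markers give
# the maximum attainable LOD of 86 * log10(2).
print(round(two_point_lod(0, 86), 2))
print(round(kosambi_cm(0.10), 2))
```

With a population of this size, the LOD score falls as the number of recombinant lines between two markers rises, so a threshold of 5.0 groups only markers with comparatively few recombinants between them.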
RESULTS
SSR isolation and characterization: The 10 enriched libraries used within this study showed some variation in the degree of enrichment and duplication. Table 1 shows the total number of clones sequenced and the final number of working primer pairs that amplified a product of the expected size; overall, 24.7% (488/1978) of the sequenced clones yielded a working primer pair. Typically, the number of sequences containing an SSR repeat represented ∼10% of the recombinant clones screened (e.g., for one library, 204 from ∼3000 screened), which is an ∼50-fold enrichment compared to the results of Liu et al. (1996). The level of duplication within the libraries generally became unacceptable after sequencing ∼180-200 clones. Most duplication was to clones produced within the same library, but subsequent libraries also contained a proportion (∼2.5%) of clones found in independently constructed libraries.
Overall, 1217 unique short DNA restriction fragments containing SSR motifs were isolated from a total of 1978 sequenced clones. Of these, 632 contained either too little flanking DNA sequence to design compatible primer pairs or were found in the later libraries to show homology to retroelements. This meant that 585 primer pairs were designed to the unique sequences, 488 of which gave products of the expected size, so that 24.7% (488/1978) of the sequenced clones yielded a working primer pair. These 488 primer pairs, together with the 79 new database-derived SSRs and the wheat SSR WMC1E8, bring the total to 568. Seven of these (Bmac0018, Bmac0030, Bmac0032, Bmac0040, Bmac0064, Bmac0090, and Bmac0399) have been published in an earlier study (Russell et al. 1997) under a slightly different nomenclature (as BMS18, BMS30, BMS32, BMS40, BMS64, BMS90, and BMS02, respectively).
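The counts in this paragraph can be checked with a few lines of Python. All numbers are taken directly from the text; the variable names are illustrative only.

```python
# Counts reported in the text
sequenced_clones = 1978
unique_ssr_fragments = 1217
unsuitable_fragments = 632   # too little flanking sequence or retroelement homology
working_library_pairs = 488  # primer pairs giving a product of the expected size
database_ssrs = 79
wheat_ssrs = 1               # WMC1E8

# Primer pairs designed = unique fragments minus unsuitable ones
primer_pairs_designed = unique_ssr_fragments - unsuitable_fragments

# Total new SSRs reported in the study
total_new_ssrs = working_library_pairs + database_ssrs + wheat_ssrs

# Fraction of sequenced clones that ended up as a working primer pair
# (the 24.7% figure quoted in the text)
clone_yield = working_library_pairs / sequenced_clones

# Fraction of designed primer pairs that worked
primer_success = working_library_pairs / primer_pairs_designed

print(primer_pairs_designed, total_new_ssrs)
print(f"{clone_yield:.1%} of clones, {primer_success:.1%} of designed primer pairs")
```

Separating the two ratios shows that most losses occurred before primer design (duplicates, short flanks, retroelement homology) rather than at the PCR stage.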
Working SSRs were derived from all libraries with the exception of the library enriched for (AAC)n repeats, where the repetitive structure of the sequences precluded successful primer design. All 568 SSRs, together with the 63 already published, were categorized according to repeat length, composition, and type. A comparison of the repeat lengths of dinucleotide SSRs in clones from the enriched libraries vs. dinucleotide SSRs derived from unenriched libraries or databases showed that the repeats found in the enriched libraries were significantly longer (22.7 ± 1.7 compared to 12.4 ± 1.8 repeats). Categorization of the SSRs, according to motif type, allowed a comparison of numbers of perfect/imperfect and simple/compound repeats in the various libraries. A significant difference was found between the proportions of simple/perfect repeats found in the libraries enriched for the two dinucleotide motifs. Only 40% of SSRs from the (AC)n-enriched libraries were simple/perfect repeats compared with 76% of SSRs from the (AG)n libraries.
Sequence comparisons with publicly held databases highlighted homologies between a significant proportion of the dinucleotide SSR-containing clones to several cereal retroelements. This work has been reported in Ramsay et al. (1999). Similar homologies were also found for the libraries enriched for (ACC)n and (ATC)n. Despite distinct PCR products of the correct size being amplified with primer pairs derived from some of these clones early on in this study, SSR-containing sequences, which showed homology to retroelements, were subsequently not used for primer design.
Quality of SSR markers: Following primer design and confirmation of product size on agarose gels, the SSRs were visualized using radioactive labeling and denaturing-PAGE electrophoresis. The quality of the PCR products was rated on a decreasing scale of 1-5 following Smulders et al. (1997; see Stephenson et al. 1998). Of 535 primer pairs for which data were generated, 364 (68%) were placed in the top two classes. SSR analysis of the 16 barley lines revealed variable levels of informativeness. The DI values derived from screening these lines and the quality scores are given in Appendix S1 at http://www.genetics.org/supplemental/156/4/1997/DC1. No significant differences were found between the DI values of SSRs derived from different libraries or different repeat types or SSRs mapping to different regions of the genome. Some correlation (r = 0.46) was found between DI value and repeat number but only when analysis was restricted to simple perfect dinucleotide repeats.
Linkage mapping: Segregation data derived from 253 SSR primer pairs (216 new and 37 previously published), which were polymorphic in the Lina × H. spontaneum Canada Park population, were entered alongside a collection of segregating AFLP alleles and the data set submitted for analysis using Joinmap (v 2.0). Seven linkage groups corresponding to the seven barley chromosomes were separated initially at a LOD score of 5.0 (Figure 1). The genetic map generated had a total map length of 1173 cM and showed strong segregation distortion around the centromeric region of chromosome 2H. Chromosome designation and polarity were inferred from a combination of AFLPs (Waugh et al. 1997) and previously mapped SSRs (Becker and Heun 1995; Liu et al. 1996). Subsequent confirmation of linkage group designations came from isozyme data for α- and β-amylase, and mapping of SSRs from known sequences, which had previously been mapped as RFLPs, such as HVLOX and HVOLE. A total of 299 SSR loci were incorporated into the Lina × H. spontaneum Canada Park map (261 from new and 38 from previously published primer pairs). This total included multiple loci amplified from 27 of the primer pairs (26 new and 1 previously published). In addition, 26 single-locus SSRs were mapped in other populations, 22 in the Steptoe × Morex, and 4 in Igri × Franka populations. Thus, in total, 39, 56, 45, 46, 37, 38, and 64 (i.e., 325) SSR loci have been mapped to chromosomes 1H-7H, respectively.
All previously mapped SSRs (Becker and Heun 1995; Liu et al. 1996) mapped to equivalent positions with the exception of HVM 63, which mapped to the centromeric region of chromosome 2H rather than chromosome 1H. In addition, in this population, HVM 11 detected two loci on chromosome 7H as well as 6H as found by Liu et al. (1996). HVM 64 and HVM 31, which were previously only assigned to chromosomes 1H and 6H by the use of addition lines, were mapped intrachromosomally to the same groups.
The most prominent feature of the map is the strong clustering of SSR loci around the centromeric regions of all seven linkage groups, a feature also observed by Liu et al. (1996). As a result of the clustering, genome coverage with SSRs remains incomplete with an obvious lack of markers on the long arms of chromosomes 1H and 5H and short arm of chromosome 6H. The gaps observed in the Lina × H. spontaneum Canada Park map on 5H and 6H correspond to gaps on other published maps (Kleinhofs et al. 1993).
The use of previously mapped SSRs also allowed the comparison of their segregation with the results presented in Liu et al. (1996) in a H. vulgare × H. vulgare cross (Steptoe × Morex). On chromosome 4H, HVM 14 and HVM 65 are linked without recombination in Lina × H. spontaneum Canada Park but this region is expanded to 9 cM in Steptoe × Morex. Similarly, on chromosome 6H, there is an expansion of a centromeric cluster of four markers in a 4-cM interval to 17 cM.
DISCUSSION
We have developed 568 new SSR primer pairs (microsatellites) for barley, 242 of which have been mapped to date (Appendix S1 at http://www.genetics.org/supplemental/156/4/1997/DC1). A list of new unmapped SSRs is available in Appendix S3 online at http://www.genetics.org/supplemental/156/4/1997/DC1. The 568 new SSRs, together with the 64 already published (Becker and Heun 1995; Liu et al. 1996; Petersen and Seberg 1998; Struss and Plieske 1998), mean that there are now 632 barley SSRs in the public domain.
The sequences obtained in this SSR discovery program suggest that the enrichment procedure biases the recovery of certain repeats given the level of duplication within and between libraries and the preferential selection of larger repeats. To minimize redundancy, either different enzymes or different cultivars should be used for additional libraries. The different proportions of simple/perfect repeats in the (AC)n and (AG)n libraries are unlikely to be artifacts of the enrichment procedure as these proportions are similar to those reported by Struss and Plieske (1998) in an unenriched library. Interestingly, such proportions have been reported in Eucalyptus (Brondani et al. 1998), implying that they reflect a fundamental difference in the character and evolution of these dinucleotide repeats in plant genomes.
An unexpected feature of the clones from the enriched libraries was sequence homology in a high proportion (∼40%) of them, in one or both flanking regions, to specific regions of known cereal transposable elements (Ramsay et al. 1999). This observed association may explain the amplification of multiple loci by some of the primer pairs. It corresponds well with the finding of Röder et al. (1998b) that the efficacy of working microsatellite isolation improved from 31 to 59% after switching to the use of PstI-derived libraries. Although these associations were demonstrated within enriched libraries, it is worth noting that multiple loci were found with SSRs derived from unenriched libraries and from the database (HVM11 and HvWAXY4).
The observed association may also be implicated in some of the low quality scores, according to the criteria of Smulders et al. (1997), found for some of the SSRs. However, sometimes the scores were improved dramatically by redesigning or end-labeling the primers or by running the gels under highly denaturing conditions (20% formamide). For genotyping applications, we would strongly recommend using only the top two classes and optimizing all of the parameters to ensure accurate and reproducible results. For linkage studies using doubled-haploid populations, we were able to reproducibly monitor the segregation of amplified alleles in all classes. However, it is possible that the association of repetitive elements with SSRs in barley may produce additional loci in nonhomologous positions in crosses using different germplasm. This complication may be avoided by the use of a core framework set of SSRs with a high quality score.
The linkage map of the Lina × H. spontaneum Canada Park population presented here represents a second-generation linkage map, derived using only low/single-copy PCR-based markers. The total map length of 1173 cM (Kosambi) is comparable with that observed in other DH populations (Graner et al. 1991; Kleinhofs et al. 1993; Thomas et al. 1995; Heun et al. 1996). Segregation distortion, as shown around the centromeric region of 2H, is common in doubled-haploid populations (Graner et al. 1991) and does not appear to be a unique feature of this cross (H. vulgare × H. spontaneum). One prominent feature of the genetic linkage map is the significant clustering of the SSRs around the centromeric regions of each linkage group. This could be due to a nonrandom physical distribution of SSRs due to an association with retroelements (Ramsay et al. 1999) or the preferential selection of longer SSRs (Areshchenkova and Ganal 1999). While this possibility cannot be discounted, the observed genetic distribution is most likely influenced by the distribution of recombination events in the mapping population. Indeed the strong clustering may have been exaggerated given the interspecific nature of the mapping population. Differences in the distribution and number of chiasmata in wide crosses affect the distribution of mapped marker loci (Messeguer et al. 1991) and it is possible that this phenomenon is found in the Lina × H. spontaneum Canada Park cross. This is supported by comparisons of the genetic distances found between common SSRs in the centromeric regions of chromosomes 4H and 6H in the map presented here and the Steptoe × Morex map presented by Liu et al. (1996). While the clustering is probably accentuated in the wide cross used, it is probable that the observed distribution reflects the basic uneven physical distribution of recombination known to occur in barley and other plants with large genomes.
Considerable restriction of crossing over in the centromeric regions has been observed in barley through the use of translocation stocks (Künzel et al. 2000), indicating that markers clustering around centromeric regions with little recombination in the genetic map may relate to a considerable proportion of the physical map. Support for this interpretation comes from the clustering of SSRs mapped in intraspecific crosses (Liu et al. 1996) and from other marker types also mapped in the Lina × H. spontaneum Canada Park population (data not shown). In barley, EcoRI- and PstI-derived AFLPs exhibit a disjunct distribution, with the distribution of the latter more akin to that found with PstI genomic DNA probes in RFLP analysis (Powell et al. 1997).
The use of low-copy genomic probes to construct the first-generation RFLP-based linkage maps in the Triticeae largely avoided the problem of the extreme variation in the ratio of genetic to physical distance, as such sequences are mostly present in the distal unmethylated gene and recombination-rich regions of the genome (Moore et al. 1993). Improved SSR coverage could therefore probably be achieved by the use of enriched libraries constructed using methylation-sensitive enzymes, although this would not completely eliminate the problem of centromeric clustering (Röder et al. 1998a,b). However, a more targeted approach based upon the screening or sample sequencing of selected clones from large insert libraries could provide better coverage with fewer SSRs (Cregan et al. 1999; Cardle et al. 2000).
This study revealed only a weak correlation between polymorphism and SSR repeat length (Innan et al. 1997; Smulders et al. 1997) and no obvious correlation between polymorphism and SSR map position relative to the centromere, unlike low-copy RFLP in Aegilops (Dvorák et al. 1998). Although this lack of correlation could be due to the small population sample used, Innan et al. (1997) similarly failed to find such a relationship between polymorphism and locus position using SSRs on a range of ecotypes of Arabidopsis. This implies that the high mutation rate of SSRs is sufficient to overcome the linkage disequilibrium maintained in recombination-poor regions of the genome and that the use of SSRs would thus increase the chances of finding polymorphisms in the pericentromeric regions as well as give the means to survey larger population sizes that would allow the identification of rare crossover events in these intractable genomic regions.
In conclusion, the SSRs described here, together with those described by Liu et al. (1996) and Becker and Heun (1995), provide a considerable technological resource, providing barley breeders and geneticists with an array of suitable tools for a range of target applications. Given the continuing rapid development of barley SSRs, we have set up a public access web site (http://www.scri.sari.ac.uk/ssr) to allow the rapid dissemination of new primers and mapping information. We envisage that their development will ultimately supersede RFLPs as a means of mapping, aligning maps, and integrating different genetic studies within Hordeum. Their application in both linkage and diversity studies will provide a common reference that will facilitate the rapid integration of mapping data from different populations with that from ecological and biodiversity studies in barley. Most importantly, their PCR-based nature will provide nonmolecular geneticists with a simple and informative assay for estimating the level of variation at specific regions of the barley genome.
The authors acknowledge the Biotechnology and Biological Sciences Research Council grant PAG04430, which provided funding for M. Macaulay; the European Union grant FAIR-CT95-0003 for funding for L. Ramsay, S. degli Ivanissivich, and A. Massari; and the Scottish Executive Rural Affairs Department, which supports R. Waugh, W. Powell, K. MacLean, and J. Fuller through “core”-funded research. K. J. Edwards is supported by the Biotechnology and Biological Sciences Research Council, and T. Sjakste and M. Ganal by the Deutsche Forschungsgemeinschaft (436 LET 17/4/98) and the Gatersleben Plant Genome Resource Center (PGRC).
Communicating editor: C. Haley
- Received January 20, 2000.
- Accepted August 24, 2000.
- Copyright © 2000 by the Genetics Society of America