A genetic linkage map was constructed in a backcross family of the red flour beetle, Tribolium castaneum, based largely on sequences from bacterial artificial chromosome (BAC) ends and untranslated regions from random cDNA's. In most cases, dimorphisms were detected using heteroduplex or single-strand conformational polymorphism analysis after specific PCR amplification. The map incorporates a total of 424 markers, including 190 BACs and 165 cDNA's, as well as 69 genes, transposon insertion sites, sequence-tagged sites, microsatellites, and amplified fragment-length polymorphisms. Mapped loci are distributed along 571 cM, spanning all 10 linkage groups at an average marker separation of 1.3 cM. This genetic map provides a framework for positional cloning and a scaffold for integration of the emerging physical map and genome sequence assembly. The map and corresponding sequences can be accessed through BeetleBase (http://www.bioinformatics.ksu.edu/BeetleBase/).
Tribolium castaneum is one of the model organisms among the higher eukaryotes with the most sophisticated genetics and is a member of the largest and most species-diverse of all eukaryotic orders, the Coleoptera. The genome sequence of this insect is being produced at the Human Genome Sequencing Center, Baylor College of Medicine, and a draft assembly is available (Tcas_1.0, http://www.hgsc.bcm.tmc.edu/projects/tribolium/). We are in the process of constructing a physical map of the genome of T. castaneum based on similarity of HindIII digest fingerprints of 17,000 bacterial artificial chromosome (BAC) clones. The availability of an integrated physical and genetic recombination map will contribute to an improved genome sequence assembly. Previously we published a whole-genome genetic recombination map of T. castaneum based on 133 markers, most of which were random amplified polymorphic DNA (RAPD) fragments (Beeman and Brown 1999). Zhong et al. (2004) also published a whole-genome recombination map for T. castaneum, based on 269 amplified fragment-length polymorphism (AFLP) markers. In each of these studies only a few markers were correlated with specific sequences, limiting their usefulness. We therefore decided to create an improved map of higher marker density and based almost entirely on known sequences.
MATERIALS AND METHODS
The genetic map of T. castaneum presented herein is based on pooled data from 12 single-pair backcross populations derived from two highly inbred but unrelated wild strains, GA-2 and ab-2. These near-homozygous lines were prepared from the parental strains GA-1 and ab, respectively, by 20 consecutive generations of single-pair, full-sib mating (S. Thomson, personal communication). The North American GA-1 strain was collected in a farmer's corn bin in Georgia in 1980 (Haliscak and Beeman 1983), while the South American ab strain was collected near Bogota, Colombia in 1980 (Vasquez and del Castillo 1985).
Twelve single-pair, virgin crosses were made between ab-2 males and GA-2 females (Figure 1A). For each cross, one single-pair recurrent backcross was made between a virgin, F1 female and her ab-2 father, yielding a total of 12 backcross families (Figure 1B). An average of 15 progeny from each backcross (range 7–25) were selected randomly to compose the mapping family. This family consisted of a total of 179 progeny, including 144 females and 35 males. Because mapping was done in backcross families, only the nonrecurrent parental alleles (derived from the GA-2 strain) were scored. To provide a quantity of DNA sufficient for mapping thousands of loci, the backcross progeny were not directly subjected to single-beetle DNA extraction. Rather, each of the 179 progeny was expanded into a subfamily by making additional single-beetle, virgin backcrosses to the ab-2 parental type (2–5 ab-2 mates per backcross progeny, Figure 1C). Fifty mixed-sex progeny from each of the 179 backcrosses were bulked for DNA extraction (Figure 1D). Since all markers were derived from the original 12 GA-2 females, each bulked DNA sample was informative about the genotype of the backcross parent: if the backcross parent had been heterozygous for any particular GA-2-derived allele, this allele would still be present in the subsequent backcross subfamily, albeit at a frequency of ∼0.25 rather than the exact frequency of 0.5 in the parent. If the backcross parent had been homozygous for the corresponding ab-2-derived allele, this allele would still be fixed after the subsequent backcross. Thus, the frequency of each GA-2-derived allele would be either exactly 0 or ∼0.25 in any given bulked DNA from the second backcross.
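The allele frequencies expected in the bulked DNA follow from simple Mendelian arithmetic. The sketch below is an illustration only, not part of the original protocol. It assumes Mendelian segregation and an equal DNA contribution from each bulked beetle, and the function name `bulk_allele_frequency` is hypothetical.

```python
from math import sqrt

def bulk_allele_frequency(parent_is_heterozygous, n_progeny=50):
    """Mean and SD of the GA-2 allele frequency in DNA bulked from n_progeny offspring.

    Illustrative assumptions (not stated in the text): Mendelian segregation
    and an equal DNA contribution from every bulked beetle.
    """
    if not parent_is_heterozygous:
        # ab-2 homozygote x ab-2 mate: no GA-2 allele can be transmitted.
        return 0.0, 0.0
    p_carrier = 0.5          # each offspring inherits the GA-2 allele with probability 1/2
    mean = p_carrier / 2     # carriers are heterozygous: one GA-2 allele out of two
    # The carrier count is Binomial(n, 1/2); allele frequency = carriers / (2 n).
    sd = sqrt(n_progeny * p_carrier * (1 - p_carrier)) / (2 * n_progeny)
    return mean, sd

mean, sd = bulk_allele_frequency(True)
print(f"heterozygous backcross parent: {mean:.2f} +/- {sd:.3f}")
print("homozygous backcross parent:", bulk_allele_frequency(False))
```

Under these assumptions the sampling spread around 0.25 is roughly ±0.035 for 50 bulked beetles, which is why the frequency is given as approximate for heterozygous parents but exact (0) for ab-2 homozygotes.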
DNA isolations were performed using the Wizard genomic DNA purification kit (Promega, Madison, WI) according to the manufacturer's protocol, with modifications. DNAs from individual backcross families were prepared from 50 beetles per family (two groups of 25). For each 25-beetle DNA extraction we used 600 μl of nuclei lysis solution and 200 μl of protein precipitation solution, followed by phenol/chloroform extraction and ethanol precipitation. For single-beetle DNA isolations we omitted the phenol/chloroform extraction and used 60 and 20 μl, respectively, of nuclei lysis and protein precipitation solutions. The two DNA preparations from each subfamily (50 total progeny) were pooled to constitute the mapping DNA representing each of the 179 beetles in the mapping family. DNA concentrations of the stock solutions ranged from 30 to 60 ng/μl (1–2 ml total volume) as determined fluorometrically using Hoechst 33342 as fluorophore and calf thymus DNA as standard. After preparing working dilutions of 10 ng/μl, the remainder of each stock solution was stored under mineral oil at −80°.
BAC-end fragment analysis:
End fragments were sequenced from 431 random BACs after inverse or universal PCR amplification. These included 393 sequenced from one (arbitrary) end and 38 from both ends after failure to detect a dimorphism at the first end. These 469 sequences were examined for dimorphisms (see below). Procedures for inverse PCR were adapted from Li et al. (1999) and Raponi et al. (2000). Procedures for universal PCR are given in Beeman and Stauth (1997). Approximately 550 additional BAC-end sequences, most of which were determined at the Institute for Genetics, University of Cologne (Cologne, Germany), were similarly examined for dimorphisms.
An oligo(dT)-primed cDNA library was prepared at the Institute for Genetics from mRNA covering every stage of Tribolium embryonic development (Wolff et al. 1995). Segments (including coding regions and 3′- and 5′-UTRs) from 622 expressed sequence tag (EST) contigs derived from this library were screened for dimorphisms via single-strand conformation polymorphism (SSCP) or heteroduplex analysis as described below. In addition, 111 transcription factor EST sequences derived from adult and larval cDNA libraries were obtained from Exelixis Pharmaceutical and similarly subjected to SSCP/heteroduplex analysis.
PCR products from BAC ends were either cloned (pCR4-TOPO, Invitrogen, San Diego) or purified for direct sequencing (QIAquick PCR purification kit, QIAGEN, Valencia, CA). These products were sequenced using an ABI 3700 DNA sequencer (Sequencing and Genotyping Facility, Plant Pathology, Kansas State University). cDNA clones were sequenced using the DYEnamic ET Terminator cycle sequencing kit (Amersham Biosciences) and run on a MegaBACE 1000 instrument (Institute for Genetics). Approximately 100 BACs were end sequenced at the Purdue University Agricultural Genomics Core Facility.
BAC names adhere to the Exelixis system of nomenclature.
PCR and SSCP:
From each sequence, specific primers 17–25 nt in length were designed to amplify fragments ∼150–400 nt in length. Primers were designed using Lasergene sequence analysis software (DNASTAR, Madison, WI). PCR typically consisted of 40 cycles of 94° (30 sec), appropriate annealing temperature (30 sec), and 72° (30 sec). PCR products from the two parental strains as well as from F1 hybrids were examined on agarose gels for length dimorphisms and on 4–20% gradient precast TBE PAGE gels (Invitrogen/Novex) for SSCPs and for resolution between heteroduplexes and homoduplexes. For SSCP/heteroduplex analysis, 4 μl of loading buffer (95% formamide, 20 mM EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol) was added to 7 μl of PCR product, the mixture was denatured at 94° for 3 min and cooled immediately in ice water, and 4–10 μl was loaded onto PAGE gels. Electrophoresis was run at 250 V for 1.5–2.5 hr at 5° or 18°. Gels were poststained with ethidium bromide (0.1 μg/ml) and digitally recorded with a Nucleovision (NucleoTech, San Mateo, CA) image capture system.
AFLP-PCR was performed using the AFLP (small genome kit) or Analysis System (I or II) (Invitrogen) and 33P-end-labeled selective primers. Reactions were performed as recommended by the manufacturer except that all components in the reactions were scaled to one-half the recommended volume and quantity. Reaction products were separated by electrophoresis through 6% denaturing LongRanger (Cambrex) polyacrylamide gels. After drying, gels were autoradiographed using a Packard Bioscience Cyclone Storage Phosphor System (0.5- to 4-hr exposures to screens).
Linkage group assignment and ordering of loci were done with the aid of Map Manager QTXb15 (hereafter referred to as QTX) (Manly et al. 2001; http://www.mapmanager.org/mmQTX.html), followed by manual adjustment to give the most parsimonious result (fewest double crossovers). We found that QTX gave accurate linkage group assignments, but often failed to correctly order those loci with incomplete genotype scores, i.e., those for which not all progeny genotypes could be ascertained. This is because QTX places markers to achieve the greatest increase in the sum of LOD scores for linkage. Since a poor marker (i.e., one with many unscorable genotypes) will have a lower LOD score when paired with any other marker, QTX avoids placing the poor marker among a group of closely linked, good markers, even if doing so would avoid creating a suspicious double crossover. The critical value for linkage detection was set at P = 0.000001 (0.0001% probability that the loci would appear linked if they were not). Recombination frequencies were converted to map distances using QTX's Morgan function, which assumes complete interference, i.e., no double-crossover events between markers.
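To make the effect of unscorable genotypes concrete, the following sketch computes a textbook two-point LOD score for a backcross and applies the Morgan map function. It is not QTX's internal algorithm, which may differ in detail. The genotype encoding ('H' = GA-2 allele present, 'A' = absent, '-' = unscorable) and all function names are assumptions made for illustration.

```python
from math import log10

def two_point(locus_a, locus_b):
    """Return (n, r): jointly scored progeny and recombinants between two loci."""
    pairs = [(a, b) for a, b in zip(locus_a, locus_b) if a != '-' and b != '-']
    r = sum(a != b for a, b in pairs)
    return len(pairs), r

def lod_backcross(n, r):
    """Two-point LOD for linkage in a backcross, evaluated at theta = r / n."""
    if n == 0:
        return 0.0
    theta = r / n
    if theta >= 0.5:
        return 0.0                      # no evidence for linkage
    lod = n * log10(2)                  # likelihood under theta = 0.5 in the denominator
    if r > 0:
        lod += r * log10(theta)
    lod += (n - r) * log10(1 - theta)
    return lod

def morgan_cM(theta):
    """Morgan map function: map distance is proportional to recombination fraction."""
    return 100.0 * theta

good = "HHAAHAHHAAHHAAHAHHAA"            # fully scored marker (toy data)
poor = "HH--HA-HA-HH--HA-HA-"            # same genotypes, 8 of 20 progeny unscorable
for name, marker in (("good", good), ("poor", poor)):
    n, r = two_point(good, marker)
    print(name, n, r, round(lod_backcross(n, r), 2), morgan_cM(r / n))
```

In this toy case both markers show zero recombinants with the reference locus, yet the marker with missing scores yields a lower LOD simply because fewer progeny are informative, which is the behavior described above.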
Sequence analysis and database comparisons:
DNA sequences were subjected to similarity searches using the BLASTX program (Altschul et al. 1997) with the BLOSUM62 matrix in conjunction with the following settings: DNA vs. nonredundant database (National Center for Biotechnology Information), no repeat masking, low-complexity filtering. The cutoff threshold for detection of sequence similarity was set at e = 10⁻⁶.
RESULTS AND DISCUSSION
Of ∼2000 specific-sequence and 100 AFLP loci screened, a total of 424 (400 specific sequence and 24 AFLP) showed clear dimorphisms between the parental strains and segregated unambiguously in the backcross. The allele frequency of the nonrecurrent allele (0.58 ± 0.05 for all 424 loci) (range 0.41–0.72) showed a significant deviation from the predicted 1:1 segregation ratio, being biased toward overrepresentation of the GA-2 alleles compared to those derived from the ab-2 parent. This could reflect a general reduction in vigor expected for any highly inbred strain. Since the backcross was to the ab-2 parent, heterozygotes (inheriting the GA-2-derived allele) would be more fit than ab-2 homozygotes for many loci. Linkage group assignments, primer sequences, and accession numbers of all 424 loci are given in supplemental Table S1 at http://www.genetics.org/supplemental/.
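A deviation from 1:1 segregation at a single locus can be checked with a chi-square goodness-of-fit test. The text does not state which test was applied, so the sketch below is only one conventional option. The counts are hypothetical, chosen so that the GA-2 allele frequency is about 0.58 among 162 scorable progeny.

```python
from math import erfc, sqrt

def chi_square_1to1(n_with_allele, n_without_allele):
    """Chi-square statistic (1 d.f.) and P-value for a 1:1 segregation ratio."""
    n = n_with_allele + n_without_allele
    expected = n / 2
    chi2 = ((n_with_allele - expected) ** 2
            + (n_without_allele - expected) ** 2) / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical locus: 94 of 162 scorable progeny carry the GA-2 allele (94/162 ~ 0.58).
chi2, p = chi_square_1to1(94, 68)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")
```

A balanced locus (for example 81:81) gives a statistic of 0 and P = 1, so the same function can be run over every locus to see how widespread the distortion is.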
With the few exceptions noted below, all 12 single-pair backcrosses segregated a single pair of alleles at each of the 424 loci analyzed, confirming the near-homozygous inbred condition of the parental strains. Only 9 (2%) of the 424 loci showing interstrain dimorphisms were dimorphic or polymorphic within one or the other of the inbred strains. In each case the unusual alleles were confined to particular mapping families, and mapping could be accomplished in the remaining families. For 2 of the 9 loci, the atypical alleles were derived from the ab-2 parent. In both cases one of the ab-2 parents carried the GA-2 allele, and the F1 beetle proved to be homozygous. Thus, the affected family had to be disregarded. For 6 other loci the atypical alleles were derived from the GA-2 parent. In 3 of these 6 cases the atypical alleles were indistinguishable from the corresponding ab-2 allele, and the affected families were disregarded. For the other 3 loci the GA-2-derived atypical alleles were resolvable from the corresponding ab-2 alleles. Thus, for these loci data representing all families were pooled prior to importation into QTX and construction of the linkage map (Figure 2), and no information was lost. For the ninth locus, there was a heteroduplex in hybrids derived from all but one of the single-pair crosses, but the parental DNAs were not tested further to indicate which parent carried the atypical allele.
The 2000 loci screened included ∼600 random embryonic cDNA's as well as 100 transcription factor EST sequences from a mixture of life stages. Of these, 146 of the former and 19 of the latter had dimorphisms of sufficient quality for successful mapping. Of ∼1000 BAC-end sequences screened, 250 had detectable dimorphisms and 190 were mapped. The remaining 300 loci screened included genes, microsatellites, sequence-tagged site (STS) markers, transposon insertion junctions, and AFLPs, of which 69 were mapped.
In almost all cases, map positions could be assigned with precision and confidence by the two-step approach (preliminary mapping by QTX, followed by manual fine tuning). The “hide locus” function was used to test whether any markers caused map expansion of >3.0 cM. Only three such markers were found, but upon rechecking, the scoring appeared to be accurate. Two of these (2D10 and 25.A01s) are isolated at one far end of a linkage group (LG10 and LG3, respectively) >3 cM distant from the nearest marker. The third locus (12F01 on the X chromosome) is the sole marker in a 21-cM interval, and exclusion of the locus would mask an apparently legitimate double-crossover event in this interval. Double crossovers within short intervals can be suggestive of genotyping errors. A few loci could not be positioned without introducing one or more suspicious double recombinants, defined as two crossover events flanking a single locus. However, upon reexamination, each of these proved to have been mis-scored, with the exception of 12F01.
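The definition of a suspicious double recombinant used here (two crossovers flanking a single locus) lends itself to a simple automated check. The sketch below is an assumed illustration, not the procedure actually used, which relied on QTX and manual inspection. Genotypes along one chromosome are encoded as 'H', 'A', or '-' (unscorable), and the function name is hypothetical.

```python
def suspicious_singletons(genotypes):
    """Indices of scored loci that differ from both nearest scored neighbors.

    `genotypes` lists one progeny's genotypes in map order. A locus is flagged
    when its scored neighbors agree with each other but not with it, i.e. when
    two crossovers flank that single locus.
    """
    scored = [(i, g) for i, g in enumerate(genotypes) if g != '-']
    flagged = []
    for (_, left), (i, mid), (_, right) in zip(scored, scored[1:], scored[2:]):
        if left == right and mid != left:
            flagged.append(i)
    return flagged

print(suspicious_singletons("AAHAA"))   # single locus flanked by two crossovers
print(suspicious_singletons("AAHHA"))   # double crossover supported by two loci
print(suspicious_singletons("A-H-A"))   # unscored neighbors are skipped
```

Flagged positions are candidates for rescoring rather than automatic corrections, since an isolated marker in a sparse interval (such as 12F01) can reflect a genuine double crossover.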
A total of 1783 chromosomes and 68,688 individual genotype assignments were analyzed. On average, 162 individuals were scorable per locus across the 424 mapped loci. After omitting double crossovers near the origin of LG4 (position 0.0), we identified a total of only 10 double crossovers in intervals of ≤15 cM. A few of these could represent genotyping errors, but most are undoubtedly authentic, as evidenced by the presence of multiple corroborating loci in each of the 3 segments defined by each pair of crossovers. In addition to these 10 widely scattered segments, there were 10 double crossovers in intervals of ≤15 cM near the origin of LG4. Nine of these involved recombination between a pair of tightly linked markers at position 0.0 on LG4 and the rest of the linkage group. This could represent a recombination hotspot.
Nonexchange chromosomes accounted for fully 48% of the 1783 examined, while 46% had a single crossover. Altogether, only 98 (5.5%) of the 1783 chromosomes were derived from double-crossover events, the large majority in regions >20 cM. This percentage drops to 1.5% if the most recombinogenic chromosomes (3, 4, and 9) are excluded. Only 6 of the 1783 chromosomes had more than two crossovers, and none had more than four. Chromosome 3, which is approximately twice as recombinogenic as the others, is also approximately twice as long as the others when viewed in metaphase (Mocelin and Stuart 1995). The mean number of crossovers per chromosome was 0.57, ranging from 0.45 for LGX to 1.02 for LG3. This analysis does not address the actual rate of chiasma formation during meiosis, which is expected to be higher than the observed crossover frequency because of nonrecombinogenic sister chromatid exchange.
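Crossover classes of this kind can be tallied directly from ordered genotype data, and the mean crossover count ties back to map length: with an additive (Morgan) map function, one crossover per chromosome corresponds to 100 cM. The sketch below uses toy data and hypothetical function names, with the same assumed 'H'/'A'/'-' encoding of genotypes along a chromosome.

```python
from collections import Counter

def count_crossovers(genotypes):
    """Number of genotype switches along one chromosome, skipping unscored loci."""
    scored = [g for g in genotypes if g != '-']
    return sum(a != b for a, b in zip(scored, scored[1:]))

def crossover_classes(chromosomes):
    """Tally chromosomes by number of observed crossovers (0, 1, 2, ...)."""
    return Counter(count_crossovers(c) for c in chromosomes)

def map_length_cM(mean_crossovers, n_linkage_groups):
    """Total map length implied by the mean crossover count under the Morgan function."""
    return 100.0 * mean_crossovers * n_linkage_groups

toy = ["AAAAAA", "AAAHHH", "AAHHAA", "HHHHHH", "AA-HHH"]   # toy data, not from the study
print(sorted(crossover_classes(toy).items()))
print(round(map_length_cM(0.57, 10)))   # values taken from the text: 0.57 crossovers, 10 groups
```

Applied to the reported mean of 0.57 crossovers over 10 linkage groups, this gives about 570 cM, in line with the 571 cM total map length.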
The mean LOD score for all adjacent pairs of loci on all linkage groups was 40.8 (42.6 if AFLPs are excluded). If the X linkage group and all 24 AFLP loci are excluded, no gap had a LOD score <19.5 or a genetic distance >10.1 cM. The largest gap on the sparsely marked X had a LOD score of 17.5 and a genetic distance of 16.4 cM. The AFLP markers were difficult to score (many unscorable progeny) and thus gave the lowest LOD scores (mean of 15.2, range of 8.3–29.0 for the 24 loci).
Linkage group correlation:
Seven of the 10 linkage groups could be correlated with those defined in previous studies (Sokoloff 1977; Beeman et al. 1996; Beeman and Brown 1999). Linkage groups 1 = X, 2–5, 7, and 9 correspond to similarly numbered groups in those works. Of these 7 linkage groups, 3 have been oriented with respect to corresponding groups in the previous studies, on the basis of common markers near one terminus of each (X63, eve, and hairy, for linkage groups 3, 7, and 9, respectively). Linkage groups 6, 8, and 10 in the current work could not be individually correlated with similarly numbered groups in earlier works. The numbering system presented here supersedes previous versions.
BLASTX analysis resulted in significant matches (≤e-06) for 127 of the 165 mapped ESTs and 22 of the 190 mapped BAC-end sequences (see supplemental Table S2 at http://www.genetics.org/supplemental/). Matches include ribosomal proteins; transcription factors; protein kinases; signal transducers; a chemosensory protein; enzymes involved in electron transport, glycolysis, or other pathways; and others. These will be described in more detail in a subsequent report.
Genetic maps for the Coleoptera:
The total genetic distance spanned in this map agrees very well with the previously determined values of 570 cM (Beeman and Brown 1999) and 573 cM (Zhong et al. 2004). More than 100 T. castaneum genes are newly identified and mapped, and genomic DNA and cloned RAPD fragments were also sequenced to generate STSs. Our map is the first significant gene expression map in a beetle. It is also the densest genetic map available for any beetle and the first to be backed up comprehensively by clone sequence data.
We are aware of three other published efforts to create genome-scale recombination maps in beetles. The first involved the Colorado potato beetle, Leptinotarsa decemlineata, and encompassed 172 AFLPs and 10 other markers scattered over 18 linkage groups at a mean intermarker distance of 11.1 cM (Hawthorne 2001). The second involved the lesser grain borer, Rhyzopertha dominica, and included 94 random amplified DNA fingerprint markers defining 9 linkage groups at a mean intermarker distance of 4.6 cM (Schlipalius et al. 2002). The third involved the confused flour beetle, T. confusum, and included 137 RAPD markers spanning 8 linkage groups at a mean intermarker distance of 7.1 cM (Yezerski et al. 2003).
The present map differs from previous maps of beetles in several important respects. In the present investigation, we wanted to secure sufficient DNA representing each of the 179 progeny to enable us to potentially map many thousands of loci using the same set of mapping DNAs. This was made possible by the extreme inbreeding T. castaneum can tolerate and the inbred strains that were available. Thus, it was possible to expand the backcross progeny into subfamilies by making additional, single-beetle, virgin backcrosses to the ab-2 parental type as described above. Although this method did result in a much larger amount of DNA representing each individual in the mapping population, it came at the cost of an ∼50% reduction in the frequency of the allele to be scored and the introduction of a statistical uncertainty about the exact value of this frequency. Nonetheless, this modification did not interfere with the success of the mapping procedure. An alternative method for increasing the amounts of available backcross DNA has recently been developed. It involves whole-genome amplification, based on use of the strand-displacement polymerase φ29 (Dean et al. 2001). Since copy-number ratios are well preserved after whole-genome amplification, the method is suitable for high-resolution genetic polymorphism analysis (Gorrochotegui-Escalante and Black 2003).
Significance of the Tribolium map:
An important application of the genetic map described here will be its integration with the emerging physical map and genome sequence. The physical map will be based on comparison of restriction digest fingerprints of BAC clones and is currently under development in our laboratories. Integration of these three components will be critically important for linking, orienting, and ordering the genome sequence scaffolds, eventually leading to a fully integrated sequence-genetic-physical map of each of the 11 chromosomes (9 autosomes, X, and Y).
We thank M. S. Thomson for providing the inbred lines and Exelixis Pharmaceutical for providing transcription factor EST sequences and the Tribolium BAC library. We also thank Kathy Leonard, Barb van Slyke, and Christine Rotunno for technical assistance. This work was supported in part by the Human Frontier Science Program, the National Institutes of Health, the Kansas State Agricultural Experiment Station, and the Agricultural Research Service. L.R.C. was supported by the National Institutes of Health and funding from the Merck-Merial Veterinary Research Grants Program. This article is contribution no. 05-156-J from the Kansas Agricultural Experiment Station.
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. CZ012504, CZ012684, CB334789, CB337233, CO049327, CO049328, CO049329, CO049330, CO049331, CO049332, CO049333, CO049334, CO049335, CO049336, CO049337, CO049338, CO049339, CO049340, CO049341, CO049342, CO049343, CO049344, CO049345, and AY920532, AY920533, AY920534, AY920535, AY920536, AY920537, AY920538, AY920539, AY920540, AY920541, AY920542, AY920543, AY920544.
Communicating editor: M. Justice
- Received June 29, 2004.
- Accepted March 9, 2005.
- Genetics Society of America