A new genetic map of maize, ISU–IBM Map4, that integrates 2029 existing markers with 1329 new indel polymorphism (IDP) markers has been developed using intermated recombinant inbred lines (IRILs) from the intermated B73 × Mo17 (IBM) population. The website http://magi.plantgenomics.iastate.edu provides access to IDP primer sequences, sequences from which IDP primers were designed, optimized marker-specific PCR conditions, and polymorphism data for all IDP markers. This new gene-based genetic map will facilitate a wide variety of genetic and genomic research projects, including map-based genome sequencing and gene cloning. The mosaic structures of the genomes of 91 IRILs, an important resource for identifying and mapping QTL and eQTL, were defined. Analyses of segregation data associated with markers genotyped in three B73/Mo17-derived mapping populations (F2, Syn5, and IBM) demonstrate that allele frequencies were significantly altered during the development of the IBM IRILs. The observations that two segregation distortion regions overlap with maize flowering-time QTL suggest that the altered allele frequencies were a consequence of inadvertent selection. Detection of two-locus gamete disequilibrium provides another means to extract functional genomic data from well-characterized plant RILs.
GENETIC maps facilitate both basic and applied research. To help build a high-resolution maize genetic map, Lee et al. (2002) developed the intermated B73 × Mo17 (IBM) population by randomly intermating an F2 population derived from the single cross of the inbreds B73 and Mo17 for several generations prior to extraction of intermated recombinant inbred lines (IRILs). The resolution in the resulting mapping population was greatly enhanced because additional opportunities for recombination were provided during the multiple generations of the intermating process (Lee et al. 2002; Winkler et al. 2003). After genotyping these IRILs with 2046 markers, the Maize Mapping Project (MMP) constructed a genetic map (IBM2) that contains 2026 markers (Coe et al. 2002; Cone et al. 2002). Fewer than 60% (1161) of these markers are sequence defined.
Additional genetic markers based on gene sequences would (1) provide additional links to the sequenced rice genome and thereby facilitate comparative cereal genome studies, candidate gene cloning efforts, and the assignment of functions to genes via mapping quantitative trait loci (QTL); (2) better integrate the maize genetic and physical maps for use in the genome sequencing project; (3) enhance our understanding of genetic recombination, genome structure, and evolution; and (4) provide additional markers for marker-assisted selection (MAS) during conventional breeding projects.
The number of maize expressed sequence tags (ESTs) has grown to more than half a million, and the number of gene-enriched maize genome survey sequences (GSSs) deposited in GenBank has grown to >1 million. We used these genic sequences to develop >1300 indel polymorphism (IDP) markers that detect polymorphisms in 3′-UTRs and introns. These markers were then mapped using a panel of 91 IRILs from the IBM population.
Various types of DNA polymorphisms that occur in or near genic sequences can be used as genetic markers. These include RFLPs, single strand conformational polymorphisms (SSCPs), simple sequence repeats (SSRs), and single nucleotide polymorphisms (SNPs) (Liu and Corder 2004). The nature of the detection methods (RFLPs and SSCPs), the low frequency of SSRs (1.5%) in ESTs (Kantety et al. 2002), and the requirement for sequence information (SNPs) can limit the high-throughput application of these marker systems.
In contrast, IDPs can be detected using high-throughput technologies, they occur at high frequencies, and the actual sequences of the polymorphisms underlying the IDPs need not be identified prior to mapping. To map IDPs, PCR primer pairs designed on the basis of genic sequences are used to survey the parents of a mapping population and those primer pairs that yield PCR products with size or presence/absence polymorphisms can be efficiently mapped. Several earlier studies detected small IDPs using sequencing gel electrophoresis (Cato et al. 2001; Bhattramakki et al. 2002; Choi et al. 2004). This study, like that of Choi et al. (2004), focused on polymorphisms that could be detected via agarose gel electrophoresis. Hence, the resulting IDPs are appropriate for a variety of research settings and are suitable for routine and high-throughput use in both basic research and applied breeding.
In addition to their use in the development of genetic maps, recombinant inbred lines (RILs) can also be used to detect genomewide two-locus allele associations (Williams et al. 2001). We report the genomic structures of IRILs from the IBM population and illustrate how these data can be used to develop functional genomic hypotheses.
MATERIALS AND METHODS
Seeds of B73, Mo17, and IRILs extracted from the IBM Syn4 population were provided by Mike Lee (Lee et al. 2002). The IRILs used in this study were from the F7:9 generation. Of the 94 core IRILs selected by the Missouri MMP, 91 were used for mapping. At the beginning of this project, DNA was not available for IRIL M044, and the corresponding well in the microtiter plates was used as a negative control for PCR. Our stocks of 2 of the 94 core IRILs, M0062 and M0383, were found to have high rates of heterozygosity (data not shown) and were therefore excluded from subsequent analyses. The remaining 91 IRILs exhibited high independence (supplemental Figure 1 at http://www.genetics.org/supplemental/). A total of 297 IBM IRILs were available to us, 242 of which are in common with the 302 IRILs used by the Missouri MMP. To evaluate the effects of using a larger panel of IRILs, we genotyped this subset of 242 IRILs with the markers mapped on chromosome 3. Seeds of 22 inbred lines (B109, Pa91, Va35, B84, N194, H84, N801w, N209, B113, B77, NC314, Mo44, B104, NC264, N7A, Oh43, R177, N28Ht, N215, C123, SD46, and Va85) provided by Ken Russell (University of Nebraska) were used for a polymorphism survey reported on the project website. For all maize lines, DNA was isolated from the leaves of 2-week-old seedlings using our high-throughput protocol (Dietrich et al. 2002).
Primer design, PCR, and genotyping IRILs:
Primer3 (Rozen and Skaletsky 2000) was adapted to design primers in batch mode and to generate amplicons of 250–800 bp in size. Approximately 80% of the IDP marker primer pairs were derived from ESTs and the remainder from genic genomic sequences (Figure 1A).
Maize 3′ ESTs generated by us (Fu et al. 2005) or downloaded from GenBank, and ESTs containing poly(T) prefixes, were used to design primers that would amplify 3′-UTRs. On the basis of a survey of the average lengths of maize 3′-UTRs (data not shown), primers were designed to amplify a region ∼300 bp upstream of the poly(A) site (Figure 1A). Intron-spanning primers (Figure 1A) were designed on the basis of genes of known structure, GSSs downloaded from GenBank, and the sequences of maize assembled genomic islands [MAGIs; MAGI ver. 2.3 (Emrich et al. 2004) and ver. 3.1 (Fu et al. 2005)]. Gene models were determined on the basis of GeneSeqer-facilitated (Brendel et al. 2004) alignments between genomic and EST sequences or predicted using FGENESH (http://www.softberry.com) as described (Yao et al. 2005). After a few redundant primer pairs were removed, the sequences of the remaining 13,924 primer pairs, the source sequences from which they were designed, and the polymorphism data are presented in supplemental Table 1 at http://www.genetics.org/supplemental/. Because the expected sizes of genomic PCR products cannot be determined accurately for 3′-UTR primers, calculations that rely on expected PCR product sizes are confined to intron-spanning primers.
To compare the structures of our IRILs to those used by the Missouri MMP, we genotyped our IRILs with 18 SSRs developed by the Missouri MMP. Primer pairs designed by the Missouri MMP to amplify the 18 SSR markers (umc1252, umc1604, umc1404, umc1608, umc1943, umc1999, umc2035, umc2036, umc1143, umc1350, umc1672, umc1708, umc1268, umc1592, umc1120, umc1505, umc1995, and umc2069) were used for genotyping (supplemental Table 2 at http://www.genetics.org/supplemental/). Primer sequences for these SSR markers are available at MaizeGDB (http://www.maizegdb.org).
The initial 20-μl PCR reactions included ∼20 ng of B73 or Mo17 genomic DNA, 2 μl of 10× PCR buffer, 2 μl of 2 mM dNTP, 0.8 μl of 50 mM MgCl2, 2 μl of 5 μM forward primer, 2 μl of 5 μM reverse primer, and 1 unit of Taq polymerase. These reactions were incubated for 3 min at 94°, followed by 30 cycles of 94° for 30 sec, 60° for 45 sec, and 72° for 90 sec, with a final 10 min at 72°. The products were analyzed via 1% agarose gel electrophoresis. Primer pairs that detected polymorphisms between B73 and Mo17 in this survey were subjected to temperature-gradient PCR to confirm the polymorphisms and to identify improved annealing temperatures (supplemental Table 3 at http://www.genetics.org/supplemental/). These temperatures were then used to genotype the IBM IRILs.
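The survey recipe above gives only stock concentrations and volumes. The in-reaction concentrations follow from the dilution rule C_final = C_stock × V_added / V_total. The short Python sketch below does that arithmetic for the 20-μl reaction. It makes two assumptions:

- The MgCl2 figure refers only to the separately added MgCl2. Any Mg2+ supplied by the 10× buffer is not stated in the text and is ignored.
- The remaining volume is made up with template DNA, Taq polymerase, and water.

```python
# Final (in-reaction) concentrations for the 20-ul survey PCR, using the
# dilution rule C_final = C_stock * V_added / V_total.
TOTAL_UL = 20.0

# component -> (stock concentration, unit, volume added in ul), from the recipe
COMPONENTS = {
    "PCR buffer": (10.0, "x", 2.0),
    "dNTP": (2.0, "mM", 2.0),
    "MgCl2 (added)": (50.0, "mM", 0.8),
    "forward primer": (5.0, "uM", 2.0),
    "reverse primer": (5.0, "uM", 2.0),
}


def final_concentration(stock: float, volume_ul: float, total_ul: float = TOTAL_UL) -> float:
    """Solve C1 * V1 = C2 * V2 for C2 (returned in the same unit as the stock)."""
    return stock * volume_ul / total_ul


FINAL = {name: final_concentration(stock, vol) for name, (stock, _unit, vol) in COMPONENTS.items()}
# Volume left for template DNA, Taq polymerase, and water.
REMAINDER_UL = TOTAL_UL - sum(vol for _stock, _unit, vol in COMPONENTS.values())

if __name__ == "__main__":
    for name, (_stock, unit, _vol) in COMPONENTS.items():
        print(f"{name:15s} {FINAL[name]:.2f} {unit}")
    print(f"remaining volume {REMAINDER_UL:.1f} ul")
```

Run as written, it reports 1× buffer, 0.2 mM dNTP, 2 mM added MgCl2, and 0.5 μM of each primer. That leaves 11.2 μl for template, enzyme, and water.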
Estimating quality of genotyping scores and error correction:
The maximum error rate in genotyping scores of the IRILs was estimated by designing two primer pairs for the same gene and comparing the resulting genotyping scores for each IRIL. This was performed for 52 loci. The overall rate of disagreement was <1%.
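The duplicate-primer check amounts to counting, across the 52 loci, the IRILs whose calls from the two primer pairs disagree. Below is a minimal Python sketch. It assumes calls are coded "B" or "M" and that IRILs lacking a call from either primer pair are simply skipped; the text does not say how missing data were handled. The toy scores are invented for illustration.

```python
from typing import Optional, Sequence, Tuple

Call = Optional[str]  # "B" (B73 allele), "M" (Mo17 allele), or None if missing


def disagreement_rate(loci: Sequence[Tuple[Sequence[Call], Sequence[Call]]]) -> float:
    """Fraction of IRIL calls that differ between two primer pairs for the same gene.

    Each element of `loci` holds the per-IRIL calls from primer pair 1 and from
    primer pair 2 for one locus, in the same IRIL order. IRILs missing a call
    from either primer pair are skipped.
    """
    compared = differ = 0
    for calls_1, calls_2 in loci:
        if len(calls_1) != len(calls_2):
            raise ValueError("both primer pairs must be scored on the same IRILs")
        for a, b in zip(calls_1, calls_2):
            if a is None or b is None:
                continue
            compared += 1
            differ += a != b
    return differ / compared if compared else 0.0


# Toy data: two loci, four IRILs each (invented values).
TOY_LOCI = [
    (["B", "M", "M", "B"], ["B", "M", "M", "B"]),
    (["B", "M", None, "B"], ["B", "B", "M", "B"]),
]

if __name__ == "__main__":
    print(f"disagreement rate: {disagreement_rate(TOY_LOCI):.3f}")
```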
The quality of the IDP genotyping scores was further increased via analyses of apparent “double crossovers” (DCOs, i.e., BxMxB or MxBxM genotypes) in a preliminary build of the genetic map. Apparent DCOs can arise via actual DCOs, genetically linked crossovers that arose in different generations, or via genotyping or coding errors. The raw mapping scores (photographs of gels) for all marker/IRIL combinations associated with apparent DCOs were reexamined and errors were corrected prior to generating ISU–IBM Map4.
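Screening a preliminary map for apparent DCOs means scanning each IRIL's calls in map order for a marker whose allele differs from identical calls on both sides. The Python sketch below shows one way to do that. It assumes missing calls are skipped, so the nearest scored markers act as flanks. It only nominates marker/IRIL combinations for re-examination of the gel images; it does not decide whether a flag is an error.

```python
from typing import List, Optional, Sequence


def apparent_dcos(calls: Sequence[Optional[str]]) -> List[int]:
    """Return indices of markers showing a B-M-B or M-B-M pattern in one IRIL.

    `calls` are one IRIL's genotypes ("B" or "M", None if missing) for the
    markers of one chromosome, listed in map order.
    """
    scored = [(i, c) for i, c in enumerate(calls) if c is not None]
    flagged = []
    for (_, left), (i, middle), (_, right) in zip(scored, scored[1:], scored[2:]):
        if left == right != middle:  # flanks agree, middle marker disagrees
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Invented calls for nine markers; index 4 is missing.
    print(apparent_dcos(["B", "M", "B", "B", None, "M", "M", "B", "M"]))
```

On the toy input, markers 1 and 7 are flagged. The missing call at index 4 does not break the scan.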
Sequencing indel polymorphisms:
To investigate the nature of IDPs, the PCR products amplified by 96 IDP primer pairs that detected size polymorphisms between B73 and Mo17 genomic DNA were purified using QIAquick spin columns and then sequenced from both directions. The sequences were deposited in GenBank.
Construction of genetic maps:
Mapping scores for the 91 IRILs are available at the project website (http://magi.plantgenomics.iastate.edu) and MaizeGDB for the 1299 IDP markers and for another 46 PCR-based markers developed by the Schnable lab. Mapping scores for the MMP markers were downloaded from the Missouri MMP website (http://www.maizemap.org). The genotyping scores of the 91 IRILs were analyzed using the MultiPoint mapping software package (http://www.multiqtl.com). The multilocus ordering approach implemented in MultiPoint employs evolutionary algorithms of discrete optimization that use minimization of the total map length as the ordering criterion (Mester et al. 2003, 2004). With “RIL-selfing” as the population type, markers were initially clustered into linkage groups on the basis of a preset threshold recombination rate (rt = 0.15). The stability of the marker order obtained for each linkage group was tested using 100 resampling (jackknife) runs. This allowed markers that caused local neighborhood instability in the map to be detected and removed (Mester et al. 2003, 2004). The procedure was repeated iteratively, with final verification based on 1000 jackknife runs, until a stable ordering of markers (termed “skeleton” markers) was obtained. By relaxing rt, the initial linkage groups were further merged into 10 linkage groups (chromosomes). Within each group, markers were reordered by repeating the steps above until the final skeleton map was obtained. Markers with unstable local ordering were termed “muscle” markers. Although muscle markers do not have exact locations on the map, their approximate positions are correct with a high degree of certainty. Muscle markers are displayed relative to their closest skeleton markers using CMap (http://www.gmod.org/CMap) (Figure 2). The centromere position on each of the 10 chromosome maps was estimated on the basis of the IBM2 2004 Neighbors map from MaizeGDB.
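The ordering criterion can be made concrete. Given pairwise distances between markers, the preferred order is the one whose summed adjacent distances (the total map length) is smallest. MultiPoint searches that space with evolutionary algorithms. The Python sketch below substitutes an exhaustive search, which is only feasible for a handful of markers, and uses an invented distance matrix. It illustrates the criterion, not MultiPoint's implementation.

```python
from itertools import permutations
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def map_length(order: Sequence[int], dist: Matrix) -> float:
    """Total map length of an order: sum of distances between adjacent markers."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def shortest_order(dist: Matrix) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively find the marker order that minimizes total map length.

    An order and its reverse have the same length, so only orders whose first
    index does not exceed their last index are scored.
    """
    n = len(dist)
    if n == 0:
        raise ValueError("need at least one marker")
    best = tuple(range(n))
    best_len = map_length(best, dist)
    for order in permutations(range(n)):
        if order[0] > order[-1]:
            continue  # mirror image of an order that is also scored
        length = map_length(order, dist)
        if length < best_len:
            best, best_len = order, length
    return best, best_len


if __name__ == "__main__":
    # Invented positions (cM) for four markers; distances are their differences.
    positions = [10.0, 40.0, 0.0, 25.0]
    dist = [[abs(p - q) for q in positions] for p in positions]
    print(shortest_order(dist))
```

With these invented distances, the shortest order is (1, 3, 0, 2), equivalent to its mirror (2, 0, 3, 1), with a length of 40 cM. Real pairwise estimates are noisy, which is why stability under jackknife resampling is checked as well.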
Centimorgan distances were calculated using Kosambi's function (Kosambi 1944) and then corrected using IRILmap software (Falque 2005), which is based on the formula specified for IRILs (Winkler et al. 2003).
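Kosambi's function converts a meiotic recombination fraction r into a map distance d that allows for interference:

- d = 25 ln[(1 + 2r)/(1 − 2r)] cM
- inverse: r = ½ tanh(2d/100)

A Python sketch of both directions follows. It covers only the Kosambi step. The subsequent IRIL-specific correction (Winkler et al. 2003), which IRILmap applies to account for the additional meioses during intermating, is not reproduced here.

```python
import math


def kosambi_cm(r: float) -> float:
    """Kosambi map distance (cM) for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse of kosambi_cm: recombination fraction for a distance in cM."""
    if d_cm < 0:
        raise ValueError("distance must be non-negative")
    return 0.5 * math.tanh(2 * d_cm / 100.0)


if __name__ == "__main__":
    for r in (0.01, 0.1, 0.3):
        print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For example, r = 0.1 corresponds to about 10.14 cM and r = 0.3 to about 34.66 cM.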
Analyses using genetic map and mapping scores:
The genetic map positions of the 857 landmarks and genotyping data for each IRIL were analyzed using (1) CheckMatrix software (http://www.atgc.org) to visualize the mosaic structures of the 91 IRIL genomes and (2) HAPLOVIEW software (Barrett et al. 2005) to measure two-locus gametic disequilibrium (GD), i.e., significant deviations from the expected Hardy–Weinberg two-locus equilibrium (Gupta et al. 2005) using standardized disequilibrium coefficients (D′) (Hedrick 1987) and squared allele-frequency correlations (r2) (Gupta et al. 2005). A PERL script was written to calculate the similarity of each pair of IRILs using mapping scores excluding missing data points. The q-values for segregation distortion data were calculated according to Storey and Tibshirani (2003).
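Because each IRIL is nearly homozygous, every line contributes one two-locus haplotype. The disequilibrium statistics can therefore be computed directly from allele and haplotype frequencies:

- D = p_AB − p_A p_B
- D′ = D/D_max (Lewontin's standardization)
- r² = D²/[p_A(1 − p_A) p_B(1 − p_B)]

The Python sketch below implements these textbook formulas, assuming heterozygous or missing calls are dropped. It is not HAPLOVIEW's code and does not compute the LOD scores that HAPLOVIEW reports.

```python
from typing import Optional, Sequence, Tuple


def two_locus_gd(locus1: Sequence[Optional[str]], locus2: Sequence[Optional[str]],
                 allele: str = "B") -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic loci scored on the same inbred lines.

    Calls are "B" or "M" (None if missing); `allele` is the allele whose
    frequency is tracked at both loci. Lines missing either call are dropped.
    """
    if len(locus1) != len(locus2):
        raise ValueError("both loci must be scored on the same lines")
    haplotypes = [(a == allele, b == allele)
                  for a, b in zip(locus1, locus2) if a is not None and b is not None]
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no lines scored at both loci")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(a and b for a, b in haplotypes) / n
    d = p_ab - p_a * p_b
    variance_product = p_a * (1 - p_a) * p_b * (1 - p_b)
    if variance_product == 0:
        return 0.0, 0.0  # a monomorphic locus carries no disequilibrium signal
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, d * d / variance_product


if __name__ == "__main__":
    print(two_locus_gd(["B", "B", "B", "M"], ["B", "B", "M", "M"]))
```

The toy call returns |D′| = 1 but r² ≈ 0.33. The Mo17 allele at the first locus always co-occurs with Mo17 at the second, yet the two loci are far from perfectly correlated. This is why thresholds on both statistics are useful.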
RESULTS
Identification of PCR-based IDPs between B73 and Mo17:
Primers that amplify 3′-UTRs and intron regions were designed to identify codominant (i.e., size) and dominant (i.e., presence/absence) polymorphisms that could be detected via agarose gel electrophoresis (materials and methods). The 1287 primer pairs that detected 1299 polymorphisms were assigned IDP numbers and are presented in supplemental Table 3 at http://www.genetics.org/supplemental/. Polymorphisms were classified as illustrated in Figure 1. Primers designed to amplify 3′-UTRs and introns exhibit similar rates of polymorphisms (10.8% vs. 10.4%, Figure 1A), but the intron-spanning primers yield a significantly higher rate (1.4-fold) of codominant polymorphisms than primers derived from 3′-UTRs (4.4% vs. 3.2%; P < 0.01). Approximately 10% (130/1275) of primer pairs that amplify a polymorphic band also amplify at least one additional nonpolymorphic band (types II and III).
IDPs ≥100 bp are associated with miniature inverted-repeat transposable elements:
To investigate the nature of the polymorphisms detected by the IDP primers, 96 randomly sampled pairs of PCR products that exhibited codominant size polymorphisms between B73 and Mo17 were sequenced. The lengths and sequences of 42 IDPs were determined unambiguously by comparing the sequences of the PCR products from the two inbreds. More than half (15/27) of the IDPs that were <100 bp consisted of SSRs (Table 1). Most (80%, 12/15) of the IDPs of at least 100 bp reflected the presence/absence of annotated or predicted miniature inverted-repeat transposable elements (MITEs) in The Institute for Genomic Research (TIGR) maize repeat database 4.0 (http://maize.tigr.org). The rates at which MITEs were detected in 3′-UTRs (7/9) and introns (5/6) were similar.
A genetic map of maize transcripts:
A total of 1345 PCR-based markers (1299 IDP markers plus another 46 PCR-based markers developed by the Schnable lab) were used to genotype 91 IRILs (materials and methods). On the basis of conservative quality checks (materials and methods), the resulting genotyping scores contain few errors (<1%). These mapping scores, along with mapping scores from the 2046 existing MMP markers, were analyzed using MultiPoint software to construct a genetic map (materials and methods). Almost all of the IDP (98.8%, 1329/1345) and MMP (99.2%, 2029/2046) markers were successfully placed on the resulting genetic map. This new map, ISU–IBM Map4, spans a total of 1788 cM (see Table 2 for summary statistics and Figure 2 for a summary of the map) and is available at http://magi.plantgenomics.iastate.edu. This website also provides other relevant data: the gene sequences from which the IDP primers were designed, the sequences of the IDP primers, the PCR conditions used for mapping, and a photograph of the gel used to select optimal PCR conditions (see Figure 3 for an example and supplemental Table 4 at http://www.genetics.org/supplemental/ for all marker information). Because all of the new IDP markers were derived from either ESTs or predicted genes, this map provides the positions of >1000 newly mapped genes. Markers that exhibited stable ordering in the jackknifing experiments (materials and methods) were termed skeleton markers. Of the 3358 markers on ISU–IBM Map4, 1274 (38%) are skeleton markers. Of these, ∼51% (651) are MMP markers and ∼49% (623) are IDP markers. The 1274 skeleton markers define 857 unique positions (landmarks) used to accumulate and calculate map length (Table 2). Of these landmarks, 393 (46%) consist exclusively of MMP markers, 355 (41%) exclusively of IDP markers, and 109 (13%) of colocalized IDP and MMP markers. These results indicate that the MMP markers and our gene-based IDP markers are distributed in a complementary fashion along the maize chromosomes.
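The landmark bookkeeping above can be sketched in Python. Skeleton markers at the same position collapse into one landmark, which is then classed as MMP-only, IDP-only, or mixed. Two assumptions are made for illustration: markers share a position when their coordinates agree to 0.1 cM, and the marker names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SkeletonRow = Tuple[str, str, int, float]  # (marker, "MMP" or "IDP", chromosome, cM)


def classify_landmarks(skeleton: List[SkeletonRow], precision: int = 1) -> Dict[str, int]:
    """Group co-located skeleton markers into landmarks and classify each landmark."""
    sources: Dict[Tuple[int, float], Set[str]] = defaultdict(set)
    for _marker, source, chrom, pos in skeleton:
        sources[(chrom, round(pos, precision))].add(source)
    counts = {"MMP only": 0, "IDP only": 0, "mixed": 0}
    for found in sources.values():
        if found == {"MMP"}:
            counts["MMP only"] += 1
        elif found == {"IDP"}:
            counts["IDP only"] += 1
        else:
            counts["mixed"] += 1
    return counts


if __name__ == "__main__":
    toy = [
        ("mmp_a", "MMP", 1, 10.0), ("idp_a", "IDP", 1, 10.0),
        ("mmp_b", "MMP", 1, 12.5), ("idp_b", "IDP", 1, 14.0),
        ("idp_c", "IDP", 2, 14.0),
    ]
    print(classify_landmarks(toy))
```

The toy input yields four landmarks: one MMP-only, two IDP-only, and one mixed. The markers at 14.0 cM on different chromosomes remain separate landmarks.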
Because the IRILs are not completely inbred and the maize genome is highly dynamic, different seed sources of the “same” IRIL can differ. The genotyping scores of 18 SSRs from our IRILs were 96.5% identical to those obtained by the Missouri MMP (supplemental Table 2 at http://www.genetics.org/supplemental/). Nevertheless, the remaining discordance suggests that the IRILs used by the two projects exhibit at least minor differences in genotype.
Markers that exhibited unstable ordering in the jackknifing experiments (materials and methods) were termed muscle markers. Although the approximate positions of muscle markers can be determined with confidence, their exact positions relative to nearby markers cannot. Of the 3358 markers on the map, 2084 (62%) are muscle markers. To test whether the instability of markers in the jackknifing experiments might be a consequence of combining mapping scores from highly similar but not identical IRILs, we generated an MMP-only map (MMP–IBM Map4) and an IDP-only map (IDP–IBM Map4) and compared both to the combined map (ISU–IBM Map4) using CMap. Although no major conflicts of marker ordering were observed among the maps, 379 (28%) MMP muscle markers and 274 (39%) IDP muscle markers from the combined map became skeleton markers in the MMP–IBM Map4 or IDP–IBM Map4, respectively.
The average length of all 847 intervals between two adjacent landmarks (857 landmarks, 10 chromosomes) is ∼2 cM (1788/847). Approximately 95% (803/847) of the intervals are ≤5 cM (supplemental Figure 2 at http://www.genetics.org/supplemental/). As shown in Table 2, the largest intervals for individual chromosomes range from 6 cM (chromosomes 3 and 6) to 13 cM (chromosome 5).
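The interval count follows from a simple fact: a chromosome with k landmarks has k − 1 gaps. Thus 857 landmarks on 10 chromosomes yield 847 intervals. The mean gap equals the summed chromosome lengths divided by that count (1788/847 ≈ 2.1 cM). Below is a small Python sketch of these summaries, using invented landmark positions.

```python
from typing import Dict, List


def interval_summary(landmarks: Dict[int, List[float]], cutoff_cm: float = 5.0) -> dict:
    """Summarize the gaps between adjacent landmarks on each chromosome.

    `landmarks` maps chromosome -> landmark positions (cM); a chromosome with k
    landmarks contributes k - 1 intervals.
    """
    gaps_by_chrom = {}
    for chrom, positions in landmarks.items():
        pos = sorted(positions)
        gaps_by_chrom[chrom] = [b - a for a, b in zip(pos, pos[1:])]
    gaps = [g for chrom_gaps in gaps_by_chrom.values() for g in chrom_gaps]
    if not gaps:
        raise ValueError("need at least one chromosome with two landmarks")
    return {
        "n_intervals": len(gaps),
        "mean_cm": sum(gaps) / len(gaps),
        "fraction_le_cutoff": sum(g <= cutoff_cm for g in gaps) / len(gaps),
        "largest_by_chrom": {c: max(g) for c, g in gaps_by_chrom.items() if g},
    }


if __name__ == "__main__":
    toy = {1: [0.0, 2.0, 5.0, 12.0], 2: [0.0, 1.5, 3.0]}  # invented positions
    print(interval_summary(toy))
```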
As shown in Figure 2, 56 muscle markers cluster around the centromere of chromosome 7 on the ISU–IBM Map4. The finding that many IDP and MMP markers cluster around the centromeric regions of chromosomes 2, 3, 4, 5, 6, 7, 9, and 10, similar to the observation by Falque et al. (2005), is consistent with the hypothesis that maize exhibits a suppression of recombination in centromeric regions, as is true in rice (Wu et al. 2003). In contrast, chromosomes 1 and 8 do not have many markers clustered around the presumed positions of the centromeres. This could be due to low rates of polymorphisms between B73 and Mo17 in the regions surrounding these centromeres or to inaccurate positioning of the centromeres on these chromosomes.
Impacts of using a larger panel of IRILs:
The ISU–IBM Map4 was prepared using a panel of 91 IRILs. To determine the effects of using a larger panel of IRILs, 242 IRILs (materials and methods) were genotyped using all the IDP markers located on chromosome 3. A new genetic map (Chr.3-IDP+MMP; supplemental Figure 3 at http://www.genetics.org/supplemental/) was constructed using these data and existing genotyping data for MMP markers for these 242 IRILs. In the ISU–IBM Map4, chromosome 3 contains 153 skeleton and 247 muscle markers (Table 2). In contrast, the Chr.3-IDP+MMP map contains somewhat more (180) skeleton and somewhat fewer (196) muscle markers. Only a single skeleton marker (umc2118) changed order between the two maps and this marker exhibited a simple reversal with a nearby (0.7 cM for ISU–IBM Map4) skeleton marker, IDP125 (supplemental Figure 3 at http://www.genetics.org/supplemental/). Hence, as expected, using a larger panel of IRILs did not have a significant effect on the ordering of skeleton markers, but did increase somewhat the proportion of markers that exhibited stable ordering and that were therefore designated skeleton markers.
Structures of IRIL genomes:
A total of 5210 crossovers were identified using the mapping scores of the 857 landmarks. Individual IRILs therefore contain an average of 57 crossovers (5210/91). These data define the detailed structures of the chromosomes carried by each of the IRILs. Surprisingly, in the genomes of IRILs M0337 and M0054, all of the landmarks on chromosome 6 and chromosome 8, respectively, are derived from Mo17 (supplemental Figure 4 at http://www.genetics.org/supplemental/). Considering all markers with nonmissing scores on chromosome 6 of IRIL M0337, 100% (165/165) of MMP markers and 99% (126/127) of IDP markers are derived from Mo17. For chromosome 8 of IRIL M0054, 97% (146/150) of MMP markers and 100% (93/93) of IDP markers are derived from Mo17. These unusual chromosome structures presumably arose during the random-mating phase of IRIL development.
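Crossover counts of this kind can be obtained by counting switches between B73- and Mo17-derived calls at adjacent scored landmarks, chromosome by chromosome. The Python sketch below assumes missing calls are skipped and that switches are never counted across chromosome ends. Two crossovers between the same pair of adjacent landmarks leave no trace, so such counts are lower bounds. The example calls are invented.

```python
from typing import Optional, Sequence


def count_crossovers(chromosomes: Sequence[Sequence[Optional[str]]]) -> int:
    """Count B73/Mo17 switches between adjacent scored landmarks in one IRIL.

    `chromosomes` holds, for each chromosome, the IRIL's calls ("B" or "M",
    None if missing) at landmarks in map order.
    """
    total = 0
    for calls in chromosomes:
        scored = [c for c in calls if c is not None]  # skip missing calls
        total += sum(a != b for a, b in zip(scored, scored[1:]))
    return total


if __name__ == "__main__":
    toy_iril = [
        ["B", "B", "M", "M", None, "B"],  # two switches
        ["M", "M", "M"],                  # no switches
        ["B", None, "M", "B", "M"],       # three switches
    ]
    print(count_crossovers(toy_iril))
```

Summing such counts over all IRILs and dividing by the number of lines gives the per-IRIL average quoted above.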
The numbers of crossovers on individual chromosomes range from 360 (chromosome 10) to 825 (chromosome 1). When the numbers of crossovers were normalized by the cytological lengths of the chromosomes (Table 3), chromosome 6 had the smallest number of crossovers per unit of cytological length. This is consistent with the observation that chromosome 6 has, on average, the fewest synaptonemal complexes and recombination nodules (Anderson et al. 2003). This may be due to the presence of the nucleolus organizer on the short arm of this chromosome (Phillips et al. 1971).
In the absence of gametophyte competition and selection during random mating and inbreeding, the ratio of alleles from the two inbred parents (B73:Mo17) at a given marker (locus) across the 91 IRILs would not be expected to deviate significantly from 1:1. A χ2 goodness-of-fit test was used to test this hypothesis for each locus. Because multiple tests (857 landmarks) were performed, q-values (Storey and Tibshirani 2003) were used to control the false-discovery rate. Approximately 32% (277/857) of the landmarks exhibit significant allele segregation distortion (q < 0.05) among all IRILs (Table 4). Nearly all distorted markers on chromosomes 4, 7, and 10 are skewed toward B73, whereas all distorted markers on chromosome 8 are skewed toward Mo17. In addition, on chromosomes 1, 2, and 5, more distorted markers are skewed toward B73 than toward Mo17. In contrast, on chromosomes 3 and 9, more distorted markers are skewed toward Mo17 than toward B73 (Table 4). The three chromosome 3 segregation distortion regions (SDRs) detected by analyzing the 91 IRILs (Table 4) are still skewed toward Mo17 when the larger panel of 242 IRILs is analyzed (data not shown).
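The per-landmark test and the false-discovery control can be sketched in Python as follows. For one degree of freedom, the χ2 upper-tail probability equals erfc(√(χ2/2)), so no statistics library is needed. The q-value function is a simplification: it estimates π0 (the proportion of true nulls) at a single tuning value, λ = 0.5, whereas Storey and Tibshirani (2003) smooth that estimate over a range of λ values. The counts in the example are invented.

```python
import math
from typing import List, Sequence


def chi2_1to1_pvalue(n_b73: int, n_mo17: int) -> float:
    """P-value of a chi-square goodness-of-fit test of a 1:1 B73:Mo17 ratio (1 df)."""
    n = n_b73 + n_mo17
    if n == 0:
        raise ValueError("no scored lines")
    expected = n / 2
    chi2 = ((n_b73 - expected) ** 2 + (n_mo17 - expected) ** 2) / expected
    return math.erfc(math.sqrt(chi2 / 2))


def q_values(pvalues: Sequence[float], lam: float = 0.5) -> List[float]:
    """Storey-style q-values with pi0 estimated at a single lambda (simplified)."""
    m = len(pvalues)
    if m == 0:
        return []
    # Proportion of true nulls; an estimate of 0 (no p > lam) would make every
    # q-value 0, a known weakness of a single-lambda estimate.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    counts = [(46, 45), (60, 31), (30, 61), (50, 41)]  # invented B73:Mo17 counts
    p = [chi2_1to1_pvalue(nb, nm) for nb, nm in counts]
    for (nb, nm), pv, qv in zip(counts, p, q_values(p)):
        print(f"{nb}:{nm}  p = {pv:.4f}  q = {qv:.4f}")
```

For production analyses, a dedicated implementation of Storey's method is preferable to this single-λ shortcut.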
Allele frequencies changed during random mating or inbreeding:
Allele segregation data were compared for 139 markers used to genotype the B73 × Mo17 F2 population, the Syn5 population (Lee et al. 2002), and the 91 IBM IRILs (Table 5). Only 2 of the 11 markers (18%) that exhibited highly significant deviations (P < 0.01) in the F2 population also exhibited significant deviations in the Syn5 population. In contrast, 25 of the 128 markers (20%) that did not exhibit significant deviations in the F2 population did so in the Syn5 population (Table 5A). Moreover, ∼30% (8/27) of the markers that exhibited highly significant deviations in the Syn5 population also exhibited significant deviations in the IBM population. Likewise, ∼15% (17/112) of the markers that did not exhibit significant deviations in the Syn5 population did so in the IBM population (Table 5B). Most markers that showed segregation distortion in the F2 population did not show it in the Syn5 population, which suggests that the distortions in the F2 could have arisen by chance. The more extensive segregation distortion in the Syn5 population and the IBM IRILs than in the F2 population is consistent with a previous report (Lu et al. 2002). These findings demonstrate that allele frequencies changed significantly during the development of the IRILs.
Two SDRs overlap with maize flowering-time QTL:
An SDR was declared where more than two flanking skeleton markers exhibited statistically significant segregation distortion (Lu et al. 2002). According to this definition, 16 SDRs skewing toward B73 and 10 SDRs skewing toward Mo17 were identified (q < 0.05; Table 4). At least 2 of these SDRs (Figure 4) overlap with two of five maize flowering-time QTL detected via a meta-analysis (Chardon et al. 2004). One SDR located between IDP1407 and IDP4052 (117.3–128.3 cM) on chromosome 1 overlaps with a QTL (between umc67a and umc1590) that has been reported to be associated with flowering-time traits such as silking date and days to pollen shed, as well as with other traits such as plant height and leaf number (Chardon et al. 2004). Similarly, the other SDR located between phi100175 and IDP2146 (60.6–93 cM) on chromosome 8 overlaps with the well-known vgt1 major QTL near umc1316, which is involved in floral transition (Vladutu et al. 1999; Salvi et al. 2002; Chardon et al. 2004). The observation that two SDRs that exhibit segregation distortion in the Syn5 population, but not in the F2 population, colocalize with maize flowering-time QTL suggests that inadvertent selection for flowering time occurred during the generation of the IBM IRILs.
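The SDR rule can be expressed as a scan for runs of adjacent distorted skeleton markers. The Python sketch below reads "more than two" as at least three consecutive markers that are all significant at the chosen q cutoff and all skewed toward the same parent. The same-direction requirement is an assumption here, and the example is invented.

```python
from typing import List, Sequence, Tuple


def find_sdrs(skews: Sequence[str], min_run: int = 3) -> List[Tuple[int, int, str]]:
    """Find runs of at least `min_run` adjacent markers distorted in the same direction.

    `skews` lists, for skeleton markers in map order on one chromosome, "B"
    (significantly skewed toward B73), "M" (toward Mo17), or "" (not significant).
    Returns (first index, last index, direction) for each qualifying run.
    """
    regions = []
    start = 0
    for i in range(1, len(skews) + 1):
        if i == len(skews) or skews[i] != skews[start]:
            if skews[start] and i - start >= min_run:
                regions.append((start, i - 1, skews[start]))
            start = i
    return regions


if __name__ == "__main__":
    toy = ["", "B", "B", "B", "", "M", "M", "M", "M", "B", "B"]
    print(find_sdrs(toy))
```

The toy scan returns one B73-skewed region (markers 1 to 3) and one Mo17-skewed region (markers 5 to 8). The final two B73-skewed markers fall short of the run length.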
Genotyping data and the high-density genetic map can be used to measure genomewide two-locus GDs in the IBM population represented by the 91 IRILs (materials and methods). To establish appropriate cutoff values, two-locus GDs for all marker pairs with distances ≤50 cM (intrachromosome) were measured (supplemental Figure 5 at http://www.genetics.org/supplemental/). When the genetic distance between two markers is >30 cM, all values of r2 are <0.2 and all but one of the D′'s are <0.6 (supplemental Figure 5 at http://www.genetics.org/supplemental/). Using these values (D′ ≥ 0.6 and r2 ≥ 0.2) as cutoffs, four instances of two-locus interchromosomal GDs were identified (Table 6), each of which had a LOD value of >7. Even using the highly stringent Bonferroni correction for multiple tests, all four of these two-locus GDs are significant (P < 0.001). Genes closely linked to these pairs of markers may exhibit epistatic interactions responsible for the observed GD. Hence, this type of analysis potentially provides another means to extract functional genomic data from well-characterized plant RILs. On the other hand, interchromosome two-locus GD can result from either epistatic selection or random allele fixation during IRIL development (Williams et al. 2001; Gupta et al. 2005). The ongoing maize genome sequencing project will provide data needed to distinguish between these possibilities.
DISCUSSION
An enhanced genetic map of maize:
A total of 1329 new gene-based IDP markers and 2029 previously developed markers were placed on a new maize genetic map (ISU–IBM Map4). The PCR-based IDP markers, which detect indel polymorphisms in maize genes, are suitable for high-throughput analyses. Unlike SNP detection, IDP detection requires access only to inexpensive and widely available PCR and agarose gel electrophoresis technologies. Hence, IDPs are suitable for routine use in most maize genetics laboratories. These markers can be used to conduct MAS programs, construct the specific genotypes required for quantitative genetic studies, facilitate double-mutant analyses and suppressor/enhancer screens, and map QTL. In species with fully sequenced genomes, it is often possible to identify candidate genes associated with mapped mutants and QTL. A full maize genome sequence is not yet available. However, the maize genes genetically mapped during this study provide sequence-based crosslinks that will facilitate alignment of the rice physical map with the maize genetic map and thereby increase the efficiency of candidate gene cloning projects in maize. Subsequent to the release of ISU–IBM Map4 in March 2005, another group mapped an additional 954 cDNA-based markers using 94 IRILs (Falque et al. 2005).
PCR-based polymorphisms between B73 and Mo17 genomes:
A total of 130 primer pairs that detected a polymorphism between B73 and Mo17 also amplified at least one nonpolymorphic fragment from both inbreds (type II and III primers, Figure 1, B–D). These additional PCR products probably arise via the amplification of paralogous sequences. This occurs at a threefold higher rate among intron-spanning primers (∼23%; 63/271) than among 3′-UTR primers (7%; 67/1004). The difference probably reflects the higher degree of sequence conservation among paralogs in exons than in 3′-UTRs. The amplification of paralogs potentially complicates the interpretation of mapping results. Specifically, we can be reasonably confident that the gene used for primer design is the one that was mapped only if the size of one of the polymorphic bands matches the product size predicted from the positions of the primers on that gene. The polymorphic fragments amplified by ∼75% of the type II and III intron-spanning primers do not match the sizes expected on the basis of the genomic sequence used for primer design. Hence, the map positions of these PCR products most likely reflect the positions of paralogs of the genes used for primer design. Fortunately, this affects only a small percentage of the genes mapped in this study (supplemental Table 3 at http://www.genetics.org/supplemental/).
On the basis of an analysis of eight inbreds, it has been reported that, in maize, IDPs are frequently associated with 3′-UTRs. More than 80% of the IDPs between B73 and Mo17 were ≤3 bases (Bhattramakki et al. 2002). In this study we used PCR primers to identify and detect, in both 3′-UTRs and introns, larger IDPs that can be easily detected via agarose gel electrophoresis. Detectable indel polymorphisms were observed between B73 and Mo17 at a rate of 1/440 bp, which is somewhat lower than the 1/309 bp reported by Vroh Bi et al. (2006). Our calculation is, however, somewhat more restrictive in that it considers only indels detected using intron-spanning primers that yielded PCR bands that had the sizes expected on the basis of sequences of the genes used for primer design.
The 3′-UTRs and introns exhibit similar rates of polymorphism (10.8% vs. 10.4%, Figure 1A). However, introns yield a significantly higher (1.4-fold) rate of codominant (i.e., size) polymorphisms, which are typically more useful than dominant markers. Sequence analyses of a sample of the codominant size IDPs revealed that those ≥100 bp are often associated with MITEs (12/15). This is consistent with the observation that the MITE family Heartbreaker inserts preferentially into genic regions (Zhang et al. 2000). Hence, we suggest an alternative to the dominant markers developed using MITE display (Casa et al. 2000, 2004): primers that flank MITEs in genomic sequences could be used to develop codominant PCR-based markers detectable via standard agarose gel electrophoresis.
Approximately 67% (860/1275) of the polymorphisms detected and mapped in this study were dominant presence/absence markers. One possible explanation for these presence/absence polymorphisms is that one inbred contains polymorphisms in at least one of the primer-binding sites that reduce or eliminate PCR amplification. Alternatively, one inbred could have large insertions or secondary structures between the two primer-binding sites that reduce or eliminate PCR amplification.
Because primer pairs were designed on the basis of genes or predicted genes, some of the presence/absence polymorphisms could reflect the absence of the corresponding genes in one inbred (Fu and Dooner 2002; Brunner et al. 2005; Morgante et al. 2005). This type of haplotype variability is usually associated with small gene families (Fu and Dooner 2002). The explanation therefore works best for the 12% (101/860) of primers (types II and III) that detected a presence/absence polymorphism but also amplified at least one additional fragment from both B73 and Mo17 that was not polymorphic at the resolution afforded by agarose gel electrophoresis. Consistent with this hypothesis, the map locations of dominant IDP markers that yield a type II or type III polymorphism are noncolinear with the rice genome at approximately twice the frequency of type I dominant IDP markers (41% vs. 20%, Table 7). Hence, at least a fraction of the type II or III polymorphisms may reflect Helitron-induced duplications (Lai et al. 2003, 2005; Morgante et al. 2005) or other gene duplication processes (Jiang et al. 2004). “B73-plus/Mo17-minus” markers outnumber “B73-minus/Mo17-plus” markers by 5:1. This probably reflects the fact that more of the sequences used for primer design were obtained from B73 than from Mo17.
Applications of polymorphism data:
Although the IDP markers mapped in this study were selected because they exhibit polymorphisms between B73 and Mo17, many are also informative in other genotypes. In addition to B73 and Mo17 lines, 22 other inbred lines were genotyped using the mapped IDP primers and the resulting data (http://maize-mapping.plantgenomics.iastate.edu/actdata.html) can be used to plan genetic mapping experiments. For example, using these data, it is possible to preselect informative markers to genotype a population developed from any 2 of these 24 inbreds (or closely related inbreds). Even if only one of the parents in a mapping population was drawn from the 24 genotyped inbreds, it is possible to select IDP markers that are most likely to be informative by focusing genotyping efforts on IDPs for which that parent carries an allele that is “rare” within the set of 24 inbreds. Finally, if one wants to map a mutant in a defined genetic background (e.g., isolated via EMS mutagenesis of the B73 inbred), one can use these polymorphism data to select an appropriate parent to develop a mapping population.
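The two marker-selection strategies described above can be sketched in Python, assuming the survey data have been loaded into a table of marker → {inbred: allele code}:

- Preselect markers for a cross between two surveyed inbreds by keeping markers at which the parents differ.
- When only one parent was surveyed, rank markers by how rare that parent's allele is among the surveyed lines.

The table layout, allele codes, and marker names below are hypothetical. Only the inbred names come from the survey panel.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# marker -> {inbred: allele code}; codes are arbitrary labels for the band
# pattern an inbred produced, and None means the inbred was not scored.
Table = Dict[str, Dict[str, Optional[str]]]


def informative_markers(table: Table, parent1: str, parent2: str) -> List[str]:
    """Markers whose scored alleles differ between the two parents of a cross."""
    hits = []
    for marker, alleles in table.items():
        a, b = alleles.get(parent1), alleles.get(parent2)
        if a is not None and b is not None and a != b:
            hits.append(marker)
    return hits


def rank_by_rarity(table: Table, parent: str) -> List[Tuple[str, float]]:
    """Rank markers by the frequency of `parent`'s allele among all scored inbreds.

    Markers at which the known parent carries a rare allele come first, since they
    are the most likely to be polymorphic against an unsurveyed second parent.
    """
    ranked = []
    for marker, alleles in table.items():
        own = alleles.get(parent)
        if own is None:
            continue
        scored = [x for x in alleles.values() if x is not None]
        ranked.append((marker, Counter(scored)[own] / len(scored)))
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical marker names and allele codes for four of the surveyed inbreds.
TOY: Table = {
    "IDP_a": {"B73": "1", "Mo17": "2", "Oh43": "1", "B84": "1"},
    "IDP_b": {"B73": "1", "Mo17": "1", "Oh43": "2", "B84": "1"},
    "IDP_c": {"B73": "2", "Mo17": "1", "Oh43": "1", "B84": None},
}

if __name__ == "__main__":
    print(informative_markers(TOY, "B73", "Oh43"))
    print(rank_by_rarity(TOY, "Mo17"))
```

For a B73 × Oh43 cross, the toy table keeps IDP_b and IDP_c. For a cross involving Mo17 and an unsurveyed line, IDP_a ranks first because Mo17's allele is the rarest there.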
We thank An-Ping Hsia for critical comments, Josh Shendelman for collecting mapping data, and Mu Zhang and Karthik Viswanathan for web page and database design and implementation. We are also grateful to two anonymous reviewers for valuable suggestions. Access to the Missouri MMP data is gratefully acknowledged. This project was supported by competitive grants from the National Science Foundation Plant Genome Program (DBI-9975868 and DBI-0321711) and by Hatch Act and State of Iowa funds.
- Received July 6, 2006.
- Accepted August 21, 2006.
- Copyright © 2006 by the Genetics Society of America