The Glu-1 locus, encoding the high-molecular-weight glutenin protein subunits, controls bread-making quality in hexaploid wheat (Triticum aestivum) and represents a recently evolved region unique to Triticeae genomes. To understand the molecular evolution of this locus region, three orthologous Glu-1 regions from the three subgenomes of a single hexaploid wheat species were sequenced, totaling 729 kb of sequence. Comparing each Glu-1 region with its corresponding homologous region from the D genome of diploid wheat, Aegilops tauschii, and the A and B genomes of tetraploid wheat, Triticum turgidum, revealed that, in addition to the conservation of microsynteny in the genic regions, sequences in the intergenic regions, composed of blocks of nested retroelements, are also generally conserved, although a few nonshared retroelements that differentiate the homologous Glu-1 regions were detected in each pair of the A and D genomes. Analysis of the indel frequency and the rate of nucleotide substitution, which represent the most frequent types of sequence changes in the Glu-1 regions, demonstrated that the two A genomes are significantly more divergent than the two B genomes, further supporting the hypothesis that hexaploid wheat may have more than one tetraploid ancestor.
POLYPLOIDIZATION, an evolutionary process resulting in more than one genome per cell, has played a significant role in the evolutionary history of plants, particularly in agriculturally important crops (Masterson 1994; Soltis and Soltis 1999; Wendel 2000). Bread wheat is an allohexaploid species (Triticum aestivum L. 2n = 6x = 42), consisting of three sets of highly related genomes (A, B, and D). Hexaploid wheat originated from two independent polyploidization events. The first event involved the hybridization of two diploid progenitors: an ancestor of Triticum urartu (2n = 2x = 14, genome AA) and an unconfirmed species (BB genome) related to Aegilops speltoides (2n = 2x = 14, genome SS), which resulted in cultivated allotetraploid emmer wheat (T. turgidum ssp. dicoccum, 2n = 4x = 28, genomes AABB) (Dvorak et al. 1992; Blake et al. 1999). In the second event, which occurred 8000–10,000 years ago, an ancestor of the diploid Aegilops tauschii (DD genome) hybridized with the allotetraploid to form a hexaploid wheat (2n = 6x = 42) (Feldman et al. 1995).
Because of its relatively recent speciation, wheat represents an excellent system for studying evolutionary events that occur in genomes shortly after polyploidization. Different types of genetic and epigenetic changes, including DNA removal, changes in gene expression, reactivation of transposable elements, and functional diversification of duplicate genes, are all known to be significant to the evolutionary process in polyploid species (Ozkan et al. 2001; Shaked et al. 2001; Comai et al. 2002; He et al. 2003; Kashkush et al. 2003; Blanc and Wolfe 2004). However, the particular molecular mechanisms underlying the rapid genome evolution observed in polyploid genomes are not well understood. Sequence comparisons of polyploid genomes with their ancestral genomes may represent the strategy of choice to identify sequence changes caused by recent genome evolution. Recently, this strategy has been successfully used to elucidate the molecular basis of the polyploidy-related deletion of the Hardness locus from the tetraploid wheat (T. turgidum) (Chantret et al. 2005).
One of the great challenges for wheat genome research is the size of Triticeae genomes. Compared with model plant species, such as Arabidopsis (130 Mb) and rice (430 Mb), the bread wheat genome (∼16,000 Mb) is extremely large. We now know that although the sudden increase in chromosome numbers caused by polyploidization contributed to the overall increase in the wheat genome size, the replication and insertion of repetitive DNA, particularly long terminal repeat (LTR) retrotransposons (SanMiguel et al. 1996), is another major cause for genome expansion (Bennetzen et al. 2005). In wheat, repetitive DNA accounts for ∼90% of the genome, of which retrotransposons constitute 60–80% (SanMiguel et al. 2002; Wicker et al. 2003; Gu et al. 2004).
In addition to being responsible for genome size variation, transposable elements often stimulate other types of genomic rearrangements, including unequal homologous and illegitimate recombination that remove nuclear DNA (Devos et al. 2002; Ma et al. 2004; Chantret et al. 2005). Their transposition can cause gene inactivation or change the expression of adjacent genes (Kashkush et al. 2003; Gu et al. 2004). Repetitive DNA elements also make useful markers for revealing recent genomic changes because they are abundant and believed to be neutral under natural selection. Repetitive DNA, including retrotransposons, has a relatively short turnover period (Ma et al. 2004), so colinear retrotransposons are usually not found in distantly related genomes. Our previous comparative analyses of orthologous high-molecular-weight (HMW) glutenin regions indicated that only a few remnants of elements are colinear in the closely related wheat A, B, and D genomes, which diverged from each other within the last 4–5 million years (MY) (Huang et al. 2002; Gu et al. 2004; Kong et al. 2004). Because of this, the intergenic regions are largely not conserved between the A, B, and D genomes of wheat, although gene colinearity is retained (Gu et al. 2004). Furthermore, several recent studies suggested that retrotransposable elements may be one of the major causes of intraspecific sequence variation in the intergenic regions (Brunner et al. 2005; Scherrer et al. 2005). A comparison of homologous regions from different inbred lines in maize indicated that >59% of the compared sequence is noncolinear, largely due to the insertion of novel or allele-specific LTR retrotransposons (Brunner et al. 2005).
In addition to retroelements, other genomic rearrangements, particularly sequence insertions and deletions (indels), play significant roles in genome evolution (Petrov et al. 2000; Gregory 2004). In rice, an estimated minimum of 119 Mb of sequence has been removed from the genome in the last 5 MY (Ma and Bennetzen 2004). A comparison of the Hardness locus in diploid and polyploid wheat species has indicated that multiple genomic deletions occurred independently in different genomes and that indels are one of the primary evolutionary mechanisms involved in reconstructing this domestication gene region (Chantret et al. 2005). Gene disruption caused by deletions and large inversions has resulted in various haplotypes in the wheat A genome (Isidore et al. 2005). Despite rapid DNA rearrangements and intraspecific violation of genetic colinearity observed in recent studies (Brunner et al. 2005; Chantret et al. 2005; Isidore et al. 2005; Scherrer et al. 2005), it remains to be shown whether such rapid sequence diversification has occurred throughout the whole genome or only in certain genetic loci. Future sequence comparisons of regions from homologous genomes, such as wheat genomes at different ploidy levels, will help answer this question.
In wheat, the Glu-1 locus encodes HMW glutenin protein subunits, which are the major determinants of bread-making quality for wheat flour, making Glu-1 one of the most important genetic loci in wheat and frequently the target of genetic engineering efforts for the improvement of grain quality (Blechl and Anderson 1996; Rooke et al. 1999). The wheat HMW glutenin locus and the orthologous loci of barley and rye are unique to Triticeae species (Shewry and Tatham 1990), which suggests that these loci evolved relatively recently within the Triticeae tribe. Studies focusing on these specific orthologous HMW glutenin regions from diploid (Anderson et al. 2003) and tetraploid wheat genomes (Kong et al. 2004), as well as from barley (Gu et al. 2003), have provided the first view of genome evolution in three homeologous regions from the wheat A, B, and D genomes (Gu et al. 2004). In this article, we report sequencing the three HMW glutenin locus regions located on the long arms of homeologous group 1 chromosomes from a single hexaploid wheat species, which has allowed us to compare the homologous Glu-1 regions from the hexaploid wheat with its diploid and tetraploid ancestors. The sequence conservation and divergence detected could shed light on the evolutionary history of the wheat genomes.
MATERIALS AND METHODS
Isolation and sequencing of hexaploid wheat BACs:
Hexaploid wheat BAC clones were obtained from the T. aestivum cv. Renan BAC library by screening with PCR primers specific to HMW glutenin genes. Assignment of BAC clones to the A, B, and D genomes was based on their characterization by restriction fragment length polymorphisms and contig assembly by BAC fingerprinting on agarose gels as described previously (Kong et al. 2004; Chantret et al. 2005). Selection of BACs for sequencing was based on contig maps constructed for each orthologous Glu-1 locus and Southern hybridization data using various probes flanking the x-type and y-type HMW glutenin genes. BAC clones that covered the largest regions of each Glu-1 locus were selected. The shotgun-sequencing libraries for hexaploid wheat BAC clones were constructed by the method described by either Gu et al. (2003) or Chantret et al. (2005). Plasmid DNAs from single colonies were purified and inserts were sequenced from both directions with T7 and T3 primers using BigDye terminator chemistry (Applied Biosystems, Foster City, CA) on ABI3730 capillary sequencers. Gaps between sequence contigs were filled and sequenced by primer-walking and transposition reactions (Finnzymes, Espoo, Finland). Gaps caused by GC-rich regions were usually filled by resequencing using the dGTP BigDye terminator chemistry (Applied Biosystems).
For sequence assembly, a target of 10-fold coverage was chosen. Base calling and quality of the shotgun sequences were processed using Phred (Ewing and Green 1998). The sequence data generated for each BAC clone was used to assemble continuous contigs using both the Lasergene SeqMan module (DNAStar) (http://www.DNAStar.com) and Phrap assembly engine (http://www.phrap.org). In some cases, assemblies with two different programs helped resolve gap regions and the order of contigs. The consensus sequences were obtained by analyzing at least three sequence reads (on both strands) or using sequencing methods based on two different labeling procedures applied on one strand. To validate the accuracy of the sequence assembly, digestion patterns of BAC DNAs with HindIII, EcoRI, and NotI were compared with the predicted restriction patterns of the computer-assembled sequences.
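The restriction-pattern check described above can be illustrated with a small in silico digest. The following is a minimal sketch, not the procedure the authors used: the function names are hypothetical, the recognition sites and cut offsets are the standard ones for HindIII, EcoRI, and NotI, and the toy sequence stands in for an assembled BAC sequence.

```python
import re

# Recognition site and top-strand cut offset for each enzyme named in the text.
ENZYMES = {
    "HindIII": ("AAGCTT", 1),    # A^AGCTT
    "EcoRI": ("GAATTC", 1),      # G^AATTC
    "NotI": ("GCGGCCGC", 2),     # GC^GGCCGC
}


def cut_positions(seq, site, offset):
    """Return 0-based cut coordinates for every occurrence of `site` in `seq`."""
    seq = seq.upper()
    # A lookahead pattern also reports overlapping occurrences of the site.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def digest_fragments(seq, enzyme, circular=False):
    """Predicted fragment lengths for one enzyme, largest first.

    `circular=True` treats the sequence as a closed molecule (hypothetical
    option for a complete BAC); the default treats it as a linear assembly.
    """
    site, offset = ENZYMES[enzyme]
    cuts = cut_positions(seq, site, offset)
    if not cuts:
        return [len(seq)]
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        # The last fragment wraps around the origin of the circle.
        sizes.append(len(seq) - cuts[-1] + cuts[0])
    else:
        bounds = [0] + cuts + [len(seq)]
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)


# Toy example: one HindIII site and one EcoRI site in a 23-bp sequence.
toy = "CCCC" + "AAGCTT" + "GGGGG" + "GAATTC" + "TT"
hind_fragments = digest_fragments(toy, "HindIII")
```

The predicted fragment sizes would then be compared by eye or by script with the band sizes observed on the gel; how closely they must agree is a judgment the text does not specify.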
For annotation, the assembled sequences of the A, B, and D genome BACs from the hexaploid wheat were compared with the previous annotations for the Glu-1 regions from the barley BAC (AY268139), diploid D-genome BAC (AF497474), tetraploid A-genome BAC (AY494981), and B genome BAC (AY368673). In addition, a homology search was performed against NCBI nonredundant and dbEST databases using BLASTN, BLASTX, and TBLASTX algorithms. FGENESH (http://www.softberry.com/nucleo.html) and GENESCAN (http://genemark.mit.edu/GENESCAN.htm) were used for gene prediction. DNA repetitive elements were identified with NCBI BLAST searches, with DNAStar MegAlign dot-plot analysis, and by comparison with the Triticeae Repeat Sequence Database at the GrainGenes website at http://wheat.pw.usda.gov/ITMI/Repeats/. The definition and naming of new retrotransposons were done according to the method described by SanMiguel et al. (1998).
The rates of nonsynonymous (Ka) and synonymous (Ks) substitutions were calculated for four genes (the x-type and y-type HMW glutenin genes and the globulin and protein kinase genes) with MEGA3 (Kumar et al. 2004). Dating of retrotransposon insertions was performed on the basis of the method described in SanMiguel et al. (1998). The numbers of transition and transversion mutations were calculated using the MEGA3 software (Kumar et al. 2004). The average substitution rates of colinear retrotransposons in the two homologous genomes were calculated in the same way and used to estimate the divergence times on the basis of the method described by Wicker et al. (2003).
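The dating step can be sketched in a few lines. This is an illustration under stated assumptions, not the MEGA3 workflow itself: it assumes the Kimura two-parameter distance, which is built from transition and transversion counts but is not named in this section, and it uses the rate of 6.5 × 10−9 substitutions per site per year quoted later in this article. The function names are hypothetical.

```python
import math

# Substitutions per site per year; the value quoted later in this article.
MUTATION_RATE = 6.5e-9


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Alignment columns containing a gap or an ambiguous base are skipped.
    """
    purines = {"A", "G"}
    valid = set("ACGT")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        sites += 1
        if a == b:
            continue
        if (a in purines) == (b in purines):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_time_years(distance, rate=MUTATION_RATE):
    """Years since insertion, T = K / (2r), for a distance K between two LTRs."""
    return distance / (2 * rate)


# Toy alignment: two 100-bp "LTRs" that differ by a single transition.
ltr5 = "A" * 100
ltr3 = "G" + "A" * 99
k = k2p_distance(ltr5, ltr3)
age = insertion_time_years(k)
```

The factor of 2 reflects that both LTRs accumulate substitutions independently after insertion.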
The statistical significance regarding differences in rates of nucleotide substitution and indel frequency among each pair of the homologous A, B, and D genomes was measured by randomly sampling ∼10-kb intervals in the Glu-1 regions. The Student's t-test was employed for statistical analyses.
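As a rough illustration of this comparison, the sketch below computes a pooled-variance Student's t statistic for two sets of per-window values. The window values are invented for the example, the function name is hypothetical, and the text does not state whether the test was paired or unpaired; an unpaired, equal-variance test is assumed here.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(sample_a), len(sample_b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two windows")
    df = n1 + n2 - 2
    # Pooled estimate of the common variance of the two samples.
    pooled = ((n1 - 1) * variance(sample_a) + (n2 - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical indel counts in three ~10-kb windows from each of two genome pairs.
windows_pair_1 = [1, 2, 3]
windows_pair_2 = [4, 5, 6]
t_stat, dof = student_t(windows_pair_1, windows_pair_2)
```

Converting the statistic to a P value requires the t distribution; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True)` returns the same statistic together with a two-sided P value.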
RESULTS
Isolation and sequencing of Glu-1 regions from hexaploid wheat:
Previously, we reported sequences of the orthologous HMW glutenin regions from the D genome of the diploid Ae. tauschii and the A and B genomes of the tetraploid T. turgidum, representing the ancestral genomes of hexaploid wheat. To further examine the evolution of these genomic regions in hexaploid wheat, we screened a large-insert BAC library constructed from the hexaploid wheat cv. Renan with PCR primers specific to the different copies of HMW glutenin genes. The screening of this BAC library identified 12 HMW glutenin BAC clones from the A genome, 10 from the B genome, and 18 from the D genome. Further characterization of these BAC clones indicated that none of the BAC clones belonging to the A and B genomes contains both the y-type and x-type HMW glutenin genes (data not shown). BAC DNA was fingerprinted using the HindIII restriction enzyme and, using FPC software, contigs were built that span the Glu-1 loci from each wheat genome (Soderlund et al. 2000). The assembled contig maps and the BAC Southern data were used to guide the selection of BAC clones for sequencing. BAC clone 706G08 represents the Glu-1 region from the D genome. The Glu-1 region from the B genome was covered by the sequences from two BAC clones, 1289J04 and 2001P20. Three BACs were sequenced for the A-genome HMW glutenin region (BAC clones 1344C16, 754K10, and 1031P08). A 152,010-bp region derived from the D-genome BAC, a 285,506-bp region derived from the combined sequences of two B-genome BAC clones, and a 292,044-bp region from three A-genome BAC clones were annotated using a combination of bioinformatics tools and BLAST searches against publicly available databases.
Gene colinearity between HMW glutenin genomic sequences across the three wheat ploidy levels:
Sequence comparison of orthologous Glu-1 regions between three homeologous wheat genomes revealed microcolinearity in the genic regions, but large-scale sequence divergence in the intergenic regions (Gu et al. 2004). Sequencing the corresponding Glu-1 regions from the three subgenomes of hexaploid wheat (A, B, and D) enables us to analyze sequence changes compared to homologous genomes from the diploid and tetraploid wheats. In all the sequenced Glu-1 regions, there are a total of six genes (Gu et al. 2004). The order of these genes is as follows: a leucine-rich-repeat receptor-like protein kinase, a globulin, a y-type HMW glutenin, a duplicate globulin, an x-type HMW glutenin, and a serine/threonine protein kinase. In the three genomes of hexaploid wheat, these six genes exist in the same order and orientation as those in the ancestor's genomes (Figure 1). Previously, a tandem duplication of an ancestral region containing the globulin and HMW glutenin genes was characterized at the Glu-1 locus regions (Kong et al. 2004). This duplication, which led to the presence of the x-type and y-type copies of HMW glutenin genes in the wheat genome, occurred after the divergence of wheat and barley (Gu et al. 2003; Figure 1). Extension of sequences in the 5′ HMW glutenin regions identified a second tandem duplication, two copies of the receptor kinase genes present near the Glu-1 locus. This duplication exists in at least the A and D genomes of the hexaploid wheat, suggesting the occurrence of the event prior to the divergence of the A and D genomes. The paralogous receptor kinase 1a and 1b in the hexaploid D genome share 90% sequence identity, whereas the orthologous receptor kinase 1a from the hexaploid D genome and the hexaploid A genome share 93% sequence identity. The receptor kinase 1b in both the hexaploid A genome and the tetraploid A genome is disrupted by the same transposable elements with identical insertion patterns (data not shown). 
Whether or not the duplication of the receptor kinase gene is present in the B genome is not determinable from the current data; the 5′-end of the B-genome BAC is downstream of where the receptor kinase 1a would be putatively located (Figure 1).
In an earlier study, we reported that although gene colinearity was not violated between homeologous wheat genomes, several genes were subjected to differential disruptions by various mechanisms, including repetitive DNA insertion, sequence deletion, and nucleotide substitutions causing an in-frame stop codon (Gu et al. 2004). Five genes found to be disrupted at the Glu-1 locus regions in the hexaploid wheat appear to have the same patterns of gene disruption as those identified in their ancestor genomes (Table 1). For example, the second globulin gene in the hexaploid B genome (2b) has the same four retrotransposon insertion events as those identified in the tetraploid B genome (Kong et al. 2004). Previously, we reported that the second globulin genes in the diploid D and tetraploid A genomes were disrupted by an identical deletion event, suggesting that this gene disruption occurred before the divergence of the wheat A and D genomes (Anderson et al. 2003; Gu et al. 2004). In the hexaploid wheat, the second globulin gene (2b) in both the A and D genomes shares this same deletion. The receptor kinase 1b in the hexaploid A genome contains the same multiple insertions of retroelements and miniature inverted repeat sequences (MITEs) as the receptor kinase gene in the durum A genome (Gu et al. 2004). On the basis of this evidence, it appears that the disruption of these genes in the hexaploid wheat had already occurred in its diploid or tetraploid progenitors.
Genes disrupted in only one of the homologous genomes were also identified. The Ax HMW glutenin gene falls within this category. A point mutation in the tetraploid Ax HMW glutenin gene has resulted in a premature stop codon, while the Ax HMW glutenin gene in the hexaploid wheat is intact.
Sequence comparison of the Glu-1 regions across the three wheat ploidy levels:
To further analyze the sequence variation present in homologous wheat genomes, we performed dot matrix analyses between pairs of corresponding genomes from different ploidy wheats. Sequence divergences are seen as disruptions in the main matrix diagonal line. Figure 2 suggests that the sequences between the homologous Glu-1 regions are generally conserved. The gaps along the diagonal lines represent types of sequence rearrangements, such as deletions/insertions, duplications, and inversions, that differentiate the two homologous genomes. LTR retrotransposable elements are usually 6–8 kb and insertions of such elements that occur in only one of two compared regions will result in large gaps in the dot matrix analysis. The HMW glutenin region of the A genome of T. turgidum contains three nonshared retroelement insertions when compared with the corresponding region from the A genome of T. aestivum. In the region between the receptor kinase 1b and globulin 2a, there is a WIS-type element (Wis-3) insertion present in the coding region of a gypsy-class element Boba-1 (Gap1). Gap2 was caused by the insertion of Wis-4 in the coding region of the Ay HMW glutenin gene in tetraploid wheat. In the region between the Ax HMW glutenin and the serine/threonine protein kinase genes, there is a gypsy-class element, Erika-2, that is not present in the corresponding area of our T. aestivum sequence (Gap4). These results provide evidence that these elements have inserted since the two genomes last shared a common ancestor. In addition to these retroelement insertions, there are also two significant indel events in the T. aestivum sequence. The first of these removed the entire coding region and a fragment of each LTR from the element Madil-1, located in the intergenic region between the receptor kinase and globulin genes, leaving behind a sequence similar to a solo LTR in size and structure (Gap3). 
The second indel, which occurred in a relatively old CACTA element, Jorge-1, removed a portion of the coding sequence and possibly a MITE, Eos-1, as well.
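To make the reading of the dot matrix concrete, the sketch below builds a word-match dot plot and tallies dots by diagonal. It is a toy illustration with hypothetical function names and sequences, not the dot-plot software used for Figure 2; an insertion present in only one sequence shows up as a shift of the dots onto a different diagonal, which is what appears as a gap in the main diagonal line.

```python
def dot_matrix(seq_x, seq_y, word=4):
    """Return the set of (i, j) positions where `word`-length substrings match."""
    # Index every word of seq_y by its start position.
    index = {}
    for j in range(len(seq_y) - word + 1):
        index.setdefault(seq_y[j:j + word], []).append(j)
    dots = set()
    for i in range(len(seq_x) - word + 1):
        for j in index.get(seq_x[i:i + word], ()):
            dots.add((i, j))
    return dots


def diagonal_counts(dots):
    """Count dots on each diagonal, keyed by the offset j - i."""
    counts = {}
    for i, j in dots:
        counts[j - i] = counts.get(j - i, 0) + 1
    return counts


# Toy pair: seq_b is seq_a with an 8-bp insertion after position 9.
seq_a = "ACGTTGCAAGCTAGGCTA"
seq_b = seq_a[:9] + "TTTTTTTT" + seq_a[9:]
counts = diagonal_counts(dot_matrix(seq_a, seq_b))
```

In this toy pair the matches upstream of the insertion lie on offset 0 and those downstream lie on offset 8, the length of the inserted block. Short words also produce a few chance matches on other diagonals, which is why real comparisons use longer words or a mismatch threshold.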
By contrast, the HMW glutenin regions of the two B genomes revealed only a few major sequence differences. A 6-kb indel (Gap5) is present in the intergenic region between the receptor protein kinase 1b and globulin 2a coding regions. While the sequence of this 6-kb indel showed no sequence similarity to known transposable element sequences or structures, it is flanked by an 8-bp (TAAGAATT) perfect repeat, suggesting that illegitimate recombination resulted in this indel (Wicker et al. 2003; Chantret et al. 2005). In addition to the 6-kb indel, there is a smaller indel, ∼1.7 kb (Gap6), present within the class I transposable element Sogi-1. Sequence comparisons revealed no novel retroelement insertions that could serve to differentiate the two B genomes.
The HMW glutenin regions of the two D genomes are the most divergent at the large-scale sequence rearrangement and insertion/deletion level. Since the time when the D-genome donors of sequenced accessions of Ae. tauschii and T. aestivum shared a common ancestor, three novel retroelement insertions have occurred in this area. In Ae. tauschii, two copia-class retroelements, Angela-5 (Gap9) and Wis-1s (Gap10), have inserted into a large block of retroelements in the region downstream of the serine/threonine protein kinase. In T. aestivum, a complete copy of the element Wis-1 (Gap8) is present in the nested retroelement structure between the two HMW glutenin genes (Figure 3). In addition to these differential insertions, there is also a deletion of the coding region and portions of both LTRs from Angela-4 (Gap7), resulting in a partial LTR retroelement, Angela-4p, in Ae. tauschii (Figure 3). A series of sequence rearrangements/insertions in the 5′ LTR sequence of Angela-7p in T. aestivum generated Gap11 (Figure 2).
Shared and nonshared transposable elements between Glu-1 homologous regions:
Despite the indel events shown in Figure 2, the dot matrix analyses revealed general sequence conservation of the Glu-1 regions between two homologous genomes, suggesting that many transposable elements in the intergenic regions are shared or colinear. To test this, we calculated the number of shared retroelements vs. nonshared retroelements in the Glu-1 regions from pairs of homologous genomes. On the basis of the activity of retroelement insertions, the two B genomes are the most conserved, with all 20 of the retroelements identified present in both genomes (Figure 4A). In the ∼300-kb Glu-1 regions from the two A genomes, we identified 35 retroelements that are present in both genomes. The two nonshared retroelements are both present in the A genome of durum wheat. In the ∼100-kb homologous Glu-1 regions of the two D genomes, 14 retroelements are found to be shared, while 3 are nonshared. Among these 3 new retroelements, there are 1 intact retroelement, 1 solo LTR, and 1 partial retroelement missing one LTR and a portion of the coding region. The solo LTR and the partial LTR retroelement that differentiate the two D genomes are somewhat surprising since newly inserted retroelements would be expected to be intact (Ma et al. 2004; Brunner et al. 2005). However, this may suggest that insertions of new transposable elements can be followed by DNA recombination.
Although nonshared retroelements were observed in the homologous Glu-1 regions between the two A genomes and between the two D genomes, their frequency is much lower than that found in homologous regions from different maize inbred lines, the Rph7 locus regions from two different barley accessions, the Lr10 locus from homologous A genomes in different ploidy levels, and the Ha locus from homologous A, B, and D genomes in different ploidy levels (Brunner et al. 2005; Chantret et al. 2005; Isidore et al. 2005; Scherrer et al. 2005). In this study, only 5.7% of the retroelements are nonshared in the Glu-1 regions from the two A genomes, 0% from the two B genomes, and 21% from the two D genomes. The nonshared retroelements account for 18.1% of the sequence divergence between the two D genomes and 8.2% between the two A genomes.
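The percentages can be checked against the retroelement counts given in the preceding paragraph (35 shared and 2 nonshared for the A genomes, 20 and 0 for the B genomes, 14 and 3 for the D genomes). The short sketch below, with hypothetical names, shows that the quoted 5.7% and 21% are reproduced when the nonshared count is divided by the shared count; that this is the denominator the authors used is an inference from the arithmetic, not something the text states.

```python
# (shared, nonshared) retroelement counts reported for each genome pair.
COUNTS = {"A": (35, 2), "B": (20, 0), "D": (14, 3)}


def nonshared_percent(shared, nonshared, relative_to="shared"):
    """Percentage of nonshared elements relative to the shared count or the total."""
    denominator = shared if relative_to == "shared" else shared + nonshared
    return 100.0 * nonshared / denominator


# Percentages under both readings of the denominator.
by_shared = {g: nonshared_percent(s, n) for g, (s, n) in COUNTS.items()}
by_total = {g: nonshared_percent(s, n, relative_to="total") for g, (s, n) in COUNTS.items()}
```

Under the alternative reading, with all retroelements in the denominator, the values would be about 5.4% for the A genomes and 17.6% for the D genomes.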
Sequence insertions and deletions in the homologous Glu-1 regions:
Indels play a fundamental role in genome evolution through the loss or gain of DNA in specific regions (Gregory 2004; Taylor et al. 2004). There is still little known about the distribution and frequency of indels in plant genomes. In addition to the large indels that cause visible gaps in the dot matrix analysis (Figure 2), we also analyzed smaller indels present in the Glu-1 regions of each pair of the homologous wheat genomes (Figure 4B). Clearly, smaller indels (≤10 bp) greatly outnumber large indels in terms of frequency in all three genomes. They account for 88.2% of the indel frequency for the A genome, 83.3% for the B genome, and 85.0% for the D genome.
When the indel frequencies were compared in the A, B, and D genomes, the frequency of indels differed significantly between any two genomes in the Glu-1 regions (P < 0.05, Student's t-test). The D genome had the highest frequency of indels (16.9 indels/10 kb). The indel frequencies for the A and B genomes were 8.0 and 6.2 indels/10 kb, respectively. Indels are a type of sequence rearrangement event, and their frequency reflects the evolutionary distance between the two related genomes being compared (Ogurtsov et al. 2004). Therefore, our data suggest that the two D genomes are more distantly related to each other than are the two A genomes or the two B genomes.
Indels directly contribute to the sequence divergence between two compared sequences. We calculated the total length of sequences represented by the indels. The total length of the indel sequence is 10,738 bp for the A genome, 11,382 bp for the B genome, and 13,612 bp for the D genome. They account for 4.0, 6.2, and 13.3% of sequence divergence in the compared regions for the A, B, and D genomes, respectively. When the length of indels represented by different indel sizes was calculated, indels >100 bp contributed much more to the total indel length (Figure 4C). They account for 93.5, 96.5, and 95.8% of the total indel length for the A, B, and D genomes, respectively. These results suggest that large indels are primarily responsible for the indel-based sequence divergence between the two homologous wheat genomes.
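The contrast between indel counts and indel lengths can be illustrated with a small summary function. This is a sketch on invented data, with hypothetical names; it only shows how a set of indels dominated numerically by small events can still owe most of its total length to a few large ones, using the size classes mentioned in the text (≤10 bp and >100 bp).

```python
def indel_summary(indel_lengths, aligned_bp, small_max=10, large_min=100):
    """Summarise indels by frequency per 10 kb and by size class."""
    if not indel_lengths:
        raise ValueError("no indels supplied")
    n = len(indel_lengths)
    total_len = sum(indel_lengths)
    small = [x for x in indel_lengths if x <= small_max]   # indels of <=10 bp
    large = [x for x in indel_lengths if x > large_min]    # indels of >100 bp
    return {
        "per_10kb": n * 10_000 / aligned_bp,
        "small_count_pct": 100.0 * len(small) / n,
        "large_length_pct": 100.0 * sum(large) / total_len,
    }


# Hypothetical indel lengths (bp) found in 20 kb of aligned sequence.
example = indel_summary([1, 2, 3, 5, 8, 1, 2, 4, 50, 900], aligned_bp=20_000)
```

In this invented example, 80% of the indels are 10 bp or shorter, yet the single 900-bp event contributes more than 90% of the total indel length, the same qualitative pattern as in Figure 4, B and C.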
Nucleotide substitutions in the Glu-1 region of the homologous wheat genomes:
Another type of common sequence variation, occurring in even the most conserved regions, is the single nucleotide polymorphism, caused by nucleotide substitutions (Brumfield et al. 2003). The high level of sequence conservation observed in the Glu-1 regions of the homologous genomes allows us to examine nucleotide substitutions in large contiguous segments spanning both the genic and intergenic regions. We calculated that the average number of nucleotide substitutions per site is 0.0093, 0.0056, and 0.0137 for the A, B, and D genomes, respectively. The variations in average nucleotide substitutions are all statistically significant (P < 0.001). The data indicate that the nucleotide substitution rates for the A and B genomes are different despite the fact that the two genomes have been co-evolving in the nuclei of the same species.
The rate of nucleotide substitution in intergenic regions is most often considerably higher than that found in genic regions and protein-coding sequences (Ma and Bennetzen 2004). We also compared the number of nucleotide substitutions per site in the HMW glutenin genes with that in the intact globulin and protein kinase genes that flank the Glu-1 locus in all three wheat genomes. The numbers of synonymous substitutions (Ks) and nonsynonymous substitutions (Ka) per site between genes from the homologous genomes are given in Table 2. Considerable variation in nucleotide substitutions is observed among these genes, suggesting that different genes evolve at different rates. The globulin 2a in the two A genomes and the protein kinase genes in both the A and B genomes appear to be more conserved. The number of nucleotide substitutions per site in the HMW glutenin regions is at least similar to or greater than that found in the neighboring intergenic regions. The high nucleotide substitution rate coincides with the previous observation that HMW glutenin genes are highly polymorphic in wild wheat and display significant variation in modern cultivars (Allaby et al. 1999; Shewry et al. 2003).
Time of divergence of the homologous genomes:
In the Glu-1 regions, the identification of a number of LTR retrotransposons that are intact and shared by two homologous genomes allows us to further examine the sequence changes and divergence times of the homologous genomes in wheat. Because of the particular mechanism of reverse transposition, the two LTR sequences from a single retrotransposon are identical at the time of its insertion (Boeke and Corces 1989). On the basis of this property, the nucleotide substitutions identified in two LTRs of a retrotransposon can be used to estimate the date of its insertion (SanMiguel et al. 1998). Previously, we estimated insertion times for 22 LTR retrotransposons in the wheat Glu-1 region using a mutation rate of 6.5 × 10−9 substitutions/synonymous site/year (Gaut et al. 1996; SanMiguel et al. 1998). It appears that these datable LTR retrotransposons have all inserted into their current positions within the last 4–5 MY (Gu et al. 2004). A similar methodology can be employed to estimate divergence times for two homologous sequences on the basis of the number of nucleotide substitutions accumulated after the ancestor genome started to split into two distinct descendants. However, considerable variation in estimated divergence time has been noted when single gene sequences are used (Huang et al. 2002; Devos et al. 2005). One possible explanation is that rates of nucleotide substitution are different for genes that are under different selective forces. A better representation of divergence time for two closely related sequences may be obtained by comparing multiple regions, ideally intergenic sequences that have few selective forces acting on them (Wakeley and Hey 1997). In this study, we used the same colinear LTR retrotransposons previously identified and dated in the Glu-1 regions to estimate the divergence time of the two homologous genomes.
A total of 8–9 intact colinear LTR retrotransposons for each B and each A genome pair, representing ∼40–50 kb of sequence, were used to estimate the divergence times. For the two homologous D genomes, only two full-length, datable LTR retrotransposons are colinear; we identified additional partial yet colinear retroelements to obtain ∼40 kb of total sequence for comparison. Using the same molecular clock for dating the LTR retrotransposon insertions, we estimated the divergence time by calculating rates of nucleotide substitution between each pair of colinear retroelements to examine the variation in different sequences (Table 3). The divergence times estimated, using common retroelements, for the two homologous A genomes range from 0.38 to 1.2 MY, with an average of 0.81 MY. The two B genomes are estimated to have diverged in the last 0.24–0.78 MY, with an average of 0.48 million years ago (MYA). The two D genomes are the most divergent, ranging from 1.12 to 1.86 MYA with an average of 1.39 MYA. The longer divergence time between the two D genomes shows that the diploid D-genome sequence is divergent from the D-genome donor(s) of the hexaploid wheat by ∼1.39 MY (Table 3). It has been noted that LTR retroelements evolve at least two times faster than genes and UTR regions (SanMiguel et al. 1998; Wicker et al. 2003; Ma and Bennetzen 2004). If this is the case, the divergence times estimated for each pair of the homologous genomes should be divided by a factor of 2.
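The effect of the suggested correction can be worked through numerically. The sketch below is an illustration under assumptions: it takes the relation T = K/(2r) with r = 6.5 × 10−9 substitutions per site per year as the molecular clock, treats the "factor of 2" as a doubling of r, and uses hypothetical function names. The distances are back-calculated from the reported average times and are not values given in Table 3.

```python
RATE = 6.5e-9  # substitutions per site per year, as quoted above

# Average divergence times (MY) reported in the text for each genome pair.
REPORTED_AVERAGE_MY = {"A": 0.81, "B": 0.48, "D": 1.39}


def divergence_time_my(distance, rate=RATE, rate_multiplier=1.0):
    """Divergence time in MY from a pairwise distance K, T = K / (2 * r * multiplier)."""
    return distance / (2 * rate * rate_multiplier) / 1e6


def implied_distance(time_my, rate=RATE):
    """Pairwise distance K that corresponds to a divergence time under the same clock."""
    return 2 * rate * time_my * 1e6


# Recompute each average with a clock running twice as fast for LTR retroelements.
corrected = {
    genome: divergence_time_my(implied_distance(t), rate_multiplier=2.0)
    for genome, t in REPORTED_AVERAGE_MY.items()
}
```

With the doubled rate, the averages become roughly 0.41 MY for the A genomes, 0.24 MY for the B genomes, and 0.70 MY for the D genomes; the ranking of the three genome pairs is unchanged.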
In this study, we performed a detailed sequence comparison of the hexaploid wheat genome with its ancestral diploid and tetraploid wheat genomes to study its organization and evolution. On the basis of the sequence analysis of Glu-1 regions, our results reveal that although sequence rearrangements differentiating each pair of homologous genomes are easily visible, a considerable portion of the sequence is highly conserved between the two homologous genomes, including large segments of intergenic regions composed of nested retroelements. Although retroelements account for the majority of sequence divergence between two homologous genomes, indels and nucleotide substitutions are the most frequent events in the compared Glu-1 regions.
Sequence conservation in the homologous Glu-1 region:
Our previous study indicated that microcolinearity is maintained in the orthologous Glu-1 regions from homeologous wheat genomes, but intergenic regions were not conserved due to rapid amplification/deletion of retroelements (Gu et al. 2004). This study revealed general sequence conservation between two homologous genomes from the different wheat ploidy levels. In addition to the conservation of the genic regions, sequences in the intergenic regions are also highly colinear (Figure 4A). This suggests that a vast majority of retroelements in the hexaploid wheat genome were inherited from their diploid and tetraploid ancestral genomes. In the Glu-1 region of the tetraploid A genome, an intergenic region between the y-type and x-type HMW glutenin genes contains large blocks of nested retrotransposon insertions, with as many as 19 members spanning a region of 140 kb (Gu et al. 2004). The same nested retroelement structure is also present in the hexaploid A genome, suggesting that the intergenic region has not been drastically changed by major sequence rearrangements. The presence of a high number of colinear retroelements in the Glu-1 regions from wheat genomes of different ploidy levels is surprising, considering the results from similar comparative sequence studies on other homologous locus regions. In the genomes of different maize inbred lines, it was found that 70% of LTR retrotransposons are allele specific (Brunner et al. 2005). When homologous regions containing the barley Rph7 locus from resistant and susceptible lines were compared, the number and type of repetitive elements were completely different in 65% of the sequenced regions (Scherrer et al. 2005). In the wheat Lr10 region, transposon insertions resulted in >70% sequence divergence among three A genomes from diploid, tetraploid, and hexaploid wheats (Isidore et al. 2005).
In the Glu-1 regions, by contrast, retroelements account for only 8.2% of nonshared sequence between the two A genomes, 18.1% between the two D genomes, and 0% between the two B genomes.
In the Glu-1 regions, violations of gene colinearity were not detected between two homologous genomes. This is also significantly different from the other genetic loci studied in maize (Brunner et al. 2005), barley (Scherrer et al. 2005), and wheat (Wicker et al. 2003; Chantret et al. 2005; Isidore et al. 2005). One possible explanation for the sequence conservation observed in the Glu-1 region could be that the hexaploid line (Renan) selected in this study has the same or a very similar haplotype as the tetraploid wheat cv. Langdon. However, our haplotype analyses suggest that the two homologous A and B genomes belong to different haplotypes (see supplemental data at http://www.genetics.org/supplemental). Therefore, our results suggest that the wheat Glu-1 locus is a conserved genetic region, likely with a low recombination rate, since recombination rates are positively correlated with the frequency of locus duplications and deletions that often cause synteny perturbation between wheat homeologous chromosomes (Akhunov et al. 2003). The conservation of the Glu-1 locus regions is further supported by the lack of recombination events observed between x- and y-type HMW glutenin genes, despite the wide interest in breaking the linkage for wheat quality improvement in breeding (Shewry et al. 2003). In addition, our data provide no evidence at the HMW glutenin loci for polyploidization-induced genomic changes that resulted in significant sequence rearrangements.
Sequence rearrangements by indels in the wheat Glu-1 regions:
Despite the general conservation in the Glu-1 regions, indels, particularly small indels, are frequently observed between the homologous genomes. In the Glu-1 regions, small indels (<10 bp, ∼85%) greatly outnumber larger indels (>10 bp, ∼15%) in all three pairs of homologous genomes. Small indels often occur in coding regions, making them a major player in gene evolution (Taylor et al. 2004). In addition, small indel events might be associated with nucleotide substitutions, resulting in an increased rate of single nucleotide polymorphisms (Ma and Bennetzen 2004). Indel occurrences have been used to estimate evolutionary distance between related genomes (Ogurtsov et al. 2004). In this study, we also detected a positive correlation between the frequency of indels and the rate of nucleotide substitution. The two homologous D genomes, which have the longest divergence times, show the highest indel frequency and the greatest rate of nucleotide substitution.
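The size classes and frequency measure discussed above can be illustrated with a minimal Python sketch that extracts gap runs from a pairwise alignment and sorts them into small and large indels. The function names are hypothetical, the input is assumed to be two gapped, equal-length aligned strings, and because the text defines the classes as <10 bp and >10 bp, counting an indel of exactly 10 bp as large is an assumption of this sketch.

```python
from itertools import groupby

# Indels shorter than this many bp count as small (threshold taken from the text).
SMALL_INDEL_MAX_BP = 10


def indel_lengths(seq_a, seq_b):
    """Return the length of every gap run in a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    states = []
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b != "-":
            states.append("gap_in_a")
        elif b == "-" and a != "-":
            states.append("gap_in_b")
        else:
            states.append("aligned")  # includes matches and substitutions
    # Consecutive columns with the same gap state form one indel event.
    return [len(list(run)) for state, run in groupby(states) if state != "aligned"]


def classify_indels(lengths, threshold=SMALL_INDEL_MAX_BP):
    """Count (small, large) indels; lengths >= threshold are treated as large."""
    small = sum(1 for n in lengths if n < threshold)
    return small, len(lengths) - small


def indels_per_kb(lengths, aligned_columns):
    """Indel events per kilobase of alignment."""
    return 1000.0 * len(lengths) / aligned_columns
```

Whether a given gap run reflects an insertion in one genome or a deletion in the other cannot be decided from the alignment alone, so the sketch reports indel events without assigning a direction.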
In the homologous Glu-1 regions, although small indels are more numerous, they contribute less to the genome size difference, whereas large indels contribute more heavily to overall sequence divergence in closely related genomes (Figure 4C). In addition, it is often difficult to determine whether an indel was caused by an insertion or a deletion. Deletions generally outnumber insertions (Blumenstiel et al. 2002; Gregory 2004; Ma and Bennetzen 2004). In the Glu-1 regions, most large indels were identified as deletions since they occur in the sequences of well-characterized retroelements. It is likely that nonintact or repetitive DNA fragments observed in the intergenic regions are caused mainly by deletion events.
In general, small indels are caused by replication slippage (Gregory 2004), whereas large indels seem to involve different mechanisms. One mechanism is unequal homologous recombination, which often acts on LTR transposable elements, resulting in solo LTRs (Devos et al. 2002; Ma et al. 2004). In contrast, indels caused by illegitimate recombination appear to occur in any region of nucleotide sequence with as little as a few base pairs of sequence identity (Wicker et al. 2003; Chantret et al. 2005). It has been reported that the deletion events attributable to illegitimate recombination are much more frequent than deletion events caused by unequal homologous recombination (Wicker et al. 2003). Illegitimate recombination is primarily responsible for the removal of nonessential DNA in Arabidopsis (Devos et al. 2002). In wheat, illegitimate DNA recombination is one of the major molecular mechanisms responsible for reshaping the Ha locus during wheat genome evolution (Chantret et al. 2005). Our results indicate that indel formation is a dynamic evolutionary process that contributes to sequence divergence between homologous wheat genomes.
Wheat genome evolution:
Hexaploid wheat contains three homeologous genomes. The origin and evolution of these evolutionarily closely related genomes have been analyzed phylogenetically using sequences from HMW glutenin genes (Allaby et al. 1999; Blatter et al. 2004). The results indicated that wheat homeologous genomes diverged ∼5.0–6.9 MYA (Allaby et al. 1999).
In this study, we compared sequence changes among the homologous wheat genomes. Analysis of the Glu-1 regions from tetraploid and hexaploid wheat revealed that the A genome exhibits higher sequence variation than the B genome. This is supported by at least four lines of evidence. First, the rate of nucleotide substitution in the A genome is significantly higher than that detected in the B genome (P < 0.01). Second, the B genomes are devoid of nonshared retroelements, whereas two nonshared retrotransposons were identified in the A genome (Figure 4A). Third, the indel frequency in the A genome is also significantly higher than that found in the B genome (P < 0.01) (Figure 4). Fourth, the divergence times estimated for the A and B genomes are significantly different (Table 3). Greater sequence conservation was also reported for the same two B genomes at the Ha locus on chromosome 5, where there are no nonshared retroelements and the two sequences have 99% identity (Chantret et al. 2005), suggesting that such sequence conservation is present at multiple genetic regions and is unlikely to be caused by introgression recombination. Although the possibility of introgression recombination cannot be completely excluded, the variation of sequence divergence between the A and B homologous pairs could be explained by the hypothesis that the A and B genomes in the tetraploid and hexaploid wheats evolve at different rates. This explanation would have to be based on the assumption that significant sequence changes have occurred since the tetraploidization event, which was estimated to have occurred in the last 0.36–0.5 MY (Huang et al. 2002; Dvorak and Akhunov 2005). Different rates of genome evolution have been noted for japonica and indica rice, which have experienced independent variation for ∼0.44 MY (Ma and Bennetzen 2004). However, in this study, both the A and B genomes are subgenomes in a single polyploid wheat.
As far as we know, no reports have shown that different subgenomes in a single species are subject to different evolutionary rates.
Another likely explanation for the observed difference in sequence variation is that hexaploid wheats have more than one tetraploid ancestor, resulting from independent tetraploidization events, in which two divergent A genomes hybridized with two less divergent B genomes, resulting in the different rates of sequence variation observed in our study. Our estimates of the divergence times for the two A genomes and the two B genomes, calculated by comparing colinear retroelements, are in accordance with this hypothesis. Our results indicated that the two B genomes have diverged for ∼0.48 MY, while the two A genomes have been separated for the last ∼0.81 MY. Another piece of supporting evidence is the finding from another study that the A genome from the hexaploid wheat Renan has a different haplotype than the A genome from durum wheat at the Lr10 locus and that this haplotype originated from ancient DNA rearrangements at the diploid level (Isidore et al. 2005). Furthermore, we previously reported that the sequence in the Glu-1 locus from the hexaploid wheat cv. Chinese Spring is more closely related to that of the A genome from durum wheat, primarily because of shared mechanisms of gene disruption in the two orthologous Ay HMW glutenin genes and because of higher sequence identity compared to allelic HMW glutenin genes in other bread wheats, such as Cheyenne (Gu et al. 2004). It is likely that the Chinese Spring A genome has the same lineage as the A genome from the durum wheat and that the A genomes in Renan and Cheyenne represent a different A genome lineage. Wheat exists at three ploidy levels, and multiple origins of hexaploid wheat have been suggested (Dvorak et al. 1998; Talbert et al. 1998; Blatter et al. 2004). Such multiple independent polyploidizations would serve to greatly increase the genetic diversity in the wheat gene pool.
Our results provide further evidence for multiple origins of hexaploid wheat, based on the analysis of sequences of large wheat genomic regions spanning the Glu-1 locus across three ploidy levels. This study allows us to examine the various sequence changes that have occurred in the evolutionary history of wheat genomes. A genomewide analysis using various tools such as haplotype genotyping in wheats of different ploidy levels will help us elucidate the occurrence of wheat speciation and the molecular evolution of the wheat genome.
We thank Mingcheng Luo for providing wheat materials for the haplotype analysis, Frank You for assistance in statistical and bioinformatics analysis, and Roger Thilmony for critical reading of the manuscript. We sincerely thank the Genoplante consortium (http://www.genoplate.com) for making available the BAC library from the T. aestivum cultivar Renan and the Glu-1 BAC clones. Sequencing of the BAC clones 1289J4 and 1001P20 covering the Glu-1 regions from the B genome was supported by the Genoplante consortium. Sequencing of the other BAC clones was supported by U.S. Department of Agriculture-Agricultural Research Service grant CRIS 5325022100-011. This work was also supported in part by National Science Foundation Plant Genome grant no. DBI-0321757.
- Received May 15, 2006.
- Accepted August 29, 2006.
- Copyright © 2006 by the Genetics Society of America