Genetics, Vol. 170, 1209-1220, July 2005, Copyright © 2005
doi:10.1534/genetics.105.040915




,1
* Department of Genetics, University of Georgia, Athens, Georgia 30602
Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907
Genomics Core Facility, Purdue University, West Lafayette, Indiana 47907
Waksman Institute, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854
1 Corresponding author: Department of Genetics, University of Georgia, Athens, GA 30602.
E-mail: maize{at}uga.edu
| ABSTRACT |
|---|
With the near completion of the rice genome sequence (FENG et al. 2002; SASAKI et al. 2002; RICE CHROMOSOME 10 SEQUENCING CONSORTIUM 2003), cross-species sequence comparisons in the grasses become increasingly feasible. So far, several orthologous grass genome segments containing more than one gene have been compared at the level of DNA sequence, including the sh2/a1-homologous regions of maize, sorghum, and rice (CHEN et al. 1997); the adh1-homologous regions of maize, sorghum, and rice (TIKHONOV et al. 1999; ILIC et al. 2003); the LrK-homologous regions of barley, maize, rice, and wheat (FEUILLET and KELLER 1999); the genomic regions near Vrn1 and its orthologs in wheat, barley, sorghum, and rice (RAMAKRISHNA et al. 2002a); the Zein gene cluster of maize and its orthologs in sorghum and rice (SONG et al. 2002); the Rp1-homologous regions of maize and sorghum (RAMAKRISHNA et al. 2002b); the Rph7-homologous regions of barley and rice (BRUNNER et al. 2003); and the lg2/lrs1-homeologous regions of maize and its ortholog in rice (LANGHAM et al. 2004). These studies uncovered little or no retention of sequence homology in intergenic spaces but indicate general conservation of gene content and gene order between orthologous genomic segments of grass genomes. In addition, many exceptions to genome microcolinearity such as gene deletion, insertion, duplication, inversion, and translocation were observed (reviewed in BENNETZEN and MA 2003).
Traditional cytological analyses suggested that maize originated from a tetraploid (MCCLINTOCK 1930), while other genetic and molecular data also indicate that the maize genome contains many duplicated genes and duplicated segments with colinear gene arrangements (RHOADES 1951; HELENTJARIS et al. 1988; AHN and TANKSLEY 1993; DAVIS et al. 1999). Some duplicated genes in maize have been isolated and sequenced (GAUT and DOEBLEY 1997; ILIC et al. 2003; LANGHAM et al. 2004; SWIGOŇOVÁ et al. 2004). By examining the patterns of sequence divergence among 14 pairs of duplicated genes in maize, GAUT and DOEBLEY (1997) proposed that the modern maize genome originated from an ancient segmental allotetraploid event that occurred between 16.5 and 11.4 million years ago (MYA) after the divergence of sorghum from one of the two maize diploid progenitor lineages that themselves diverged ~20 MYA. However, by analyzing 11 genes from clearly orthologous segments of maize, sorghum, and rice, SWIGOŇOVÁ et al. (2004) determined that the two maize progenitors and sorghum diverged contemporaneously from a common ancestor ~11.9 MYA.
In a recent article, ILIC et al. (2003) presented a detailed genomic sequence comparison of an orthologous segment of the rice, sorghum, and two maize subgenomes. This first comparative sequence analysis involving homeologous segments of maize and corresponding colinear regions in sorghum and rice provides numerous insights into the nature and timing of local genomic rearrangements that occurred in these three important grass lineages. ILIC et al. (2003) identified extensive gene loss by an accumulation of small deletions in the two homeologous segments of maize analyzed, and these two segments seem to be equally unstable compared to the orthologous regions of rice and sorghum. The progressive accumulation of small deletions, most caused by illegitimate recombination, also are responsible for rapid loss of retrotransposons, other intergenic space, and some portions of genes (e.g., introns) in the Arabidopsis, wheat, and rice genomes (DEVOS et al. 2002; WICKER et al. 2003; MA et al. 2004; MA and BENNETZEN 2004).
Additional studies are needed to help identify the full spectrum of local genome rearrangement in plants and to determine their frequencies and relative contributions. Here we use comparative sequence analysis to investigate genome structure and change in orthologous Orp regions of maize, sorghum, and rice, thereby uncovering rapid gene movement without gene loss, a hotspot for transposon accumulation, and a propensity for genic rearrangement within a transposon-rich region.
| MATERIALS AND METHODS |
|---|
The largest sorghum BAC, SB18C08, that contained an Orp homolog was sequenced and analyzed. The predicted genes in SB18C08 were used as probes to hybridize with the positive maize BAC clones that we previously identified. Two maize BACs, ZM573L14 and ZM573F08, sharing the greatest genic homology with each other and with sorghum BAC SB18C08, were finally chosen for sequencing.
BAC sequencing:
Shotgun libraries for BACs SB18C08, ZM573L14, and ZM573F08 were constructed as described previously (DUBCOVSKY et al. 2001; SONG et al. 2001). Subclones were sequenced from both directions using ABI PRISM BigDye Terminator Chemistry (Applied Biosystems, Foster City, CA) and run on an ABI3700 capillary sequencer. Base calling and quality assessment were done using PHRED (EWING and GREEN 1998). Reads were assembled with PHRAP and edited with CONSED (GORDON et al. 1998). Sorghum clone SB18C08 was sequenced at ~8-fold redundancy, while maize clones ZM573L14 and ZM573F08 were each sequenced at ~12-fold redundancy. Gaps were filled by a combination of several approaches, as described earlier (RAMAKRISHNA et al. 2002a). The final error frequency estimated by CONSED was less than one base/10 kb. The finished assemblies of BAC sequences were found to agree completely with their restriction maps.
The Orp-orthologous regions in two rice subspecies, japonica (cv. Nipponbare) (http://rgp.dna.affrc.go.jp/IRGSP/) and indica (cv. 93-11) (YU et al. 2002; ZHAO et al. 2004), were identified by homology comparisons with the genomic sequences deposited in GenBank by May 2004. Sequence alignments were conducted using BLASTN (NCBI), BLAST 2.0, BLAST2 (TATUSOVA and MADDEN 1999), and CROSS_MATCH (http://www.phrap.org). We considered a pair of japonica and indica sequences/contigs to be orthologous when the match between them was unique in both the japonica genomic sequence and the assembled indica shotgun sequences.
Sequence analysis and annotation:
Gene-finding programs FGENESH (http://www.softberry.com/berry.phtml?topic=gfind&prg=FGENES) with the monocot training set, GeneMark.hmm (http://opal.biology.gatech.edu/GeneMark/eukhmm.cgi) with maize and/or rice training sets, and GENSCAN (http://genes.mit.edu/GENSCAN.html) with the maize training set were used to predict potential genes in rice, sorghum, and maize Orp BAC sequences. The genes predicted by these programs and the remaining regions (excluding the identified transposable elements) of these BAC sequences were investigated by BLASTX searches against the GenBank protein database (http://www.ncbi.nlm.nih.gov/BLAST/). Sequences identified as candidate genes by the gene-finding programs were further investigated to determine whether they were actually genes. In our earlier rice genome annotation studies, for instance, we found that >30% of the candidate genes identified by these programs were actually transposons or transposon fragments (BENNETZEN et al. 2004). We therefore used conservation in a distantly related species as an additional criterion for gene certification. Hence, candidate rice genes were used as queries in BLASTX searches against the full GenBank database, but were considered likely genes only if they detected homology at an expect value of <e-05 in some species other than rice. The recent release of the genome sequence for maize (WHITELAW et al. 2003) provided a particularly useful data set for this analysis.
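The certification rule described above can be illustrated with a short sketch. It assumes that BLASTX hits have already been parsed into simple records; the `Hit` type, its field names, and the example identifiers are hypothetical and are not part of the original analysis pipeline.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-5  # expect-value threshold stated in the text


@dataclass(frozen=True)
class Hit:
    """One parsed BLASTX hit (hypothetical record layout)."""
    query: str     # candidate gene identifier
    species: str   # species of the matched protein
    evalue: float  # BLAST expect value


def certified_genes(hits, source_species):
    """Return candidates that have at least one hit below the cutoff
    in a species other than the one the candidate came from."""
    kept = set()
    for hit in hits:
        if hit.species != source_species and hit.evalue < E_VALUE_CUTOFF:
            kept.add(hit.query)
    return kept


# Illustrative data only: cand1 has a strong maize hit, cand2 matches
# only rice, and cand3 has a weak hit that fails the cutoff.
example_hits = [
    Hit("cand1", "Oryza sativa", 1e-50),
    Hit("cand1", "Zea mays", 1e-12),
    Hit("cand2", "Oryza sativa", 1e-30),
    Hit("cand3", "Arabidopsis thaliana", 1e-3),
]
likely = certified_genes(example_hits, "Oryza sativa")
```

In this sketch only `cand1` passes. A candidate that matches only rice sequences is rejected, which reflects the concern that rice-specific matches may be transposons or transposon fragments.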
Shared genes were detected by orthologous sequence comparisons and multiple sequence alignments using CROSS_MATCH, BLAST2, and ClustalX (THOMPSON et al. 1997). Genes that were not shared in the orthologous regions were further investigated by BLASTX searches against the Arabidopsis protein database at The Arabidopsis Information Resources (TAIR) (http://www.arabidopsis.org) and against the rice predicted protein database at The Institute for Genomic Research (TIGR) (http://www.tigr.org/tdb/e2k1/osa1/) to determine the copy numbers and distribution of corresponding homologous genes in the Arabidopsis and rice genomes. The genes in sorghum were named in numerical order by their position on the sequenced BAC, while the genes in rice and maize were numbered according to their homology to the shared genes in sorghum. Three unshared genes in rice and one unshared gene in maize were given alphabetical designations.
Transposable elements (transposons and retrotransposons) were identified using a combination of structural analysis of repetitive DNA and homology-based searches against GenBank nucleotide and protein databases and the TIGR cereal repeat database (http://www.tigr.org/tdb/rice/blastsearch.shtml). The programs Repeat and Gap from the Wisconsin Package Version 10.1 (Genetics Computer Group) were used to identify long-terminal-repeat (LTR) retrotransposons as described earlier (DEVOS et al. 2002; MA et al. 2004). Newly identified retrotransposons were named according to the retrotransposon nomenclature previously described by SANMIGUEL et al. (2002). The approximate dates of LTR-retrotransposon insertion and gene duplication in rice were estimated in a manner similar to SANMIGUEL et al. (1998) and RAMAKRISHNA et al. (2002a), respectively. For dating LTR-retrotransposon insertion times, the molecular clock was set at an average substitution rate of 1.3 × 10⁻⁸ mutations/site/year, which we estimated for intergenic regions in rice (MA and BENNETZEN 2004).
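The dating approach can be sketched as follows. The two LTRs of an element are identical at insertion and then diverge independently, so the insertion time is T = K/(2r), where K is the divergence between the LTRs and r is the substitution rate. The sketch below assumes a Kimura two-parameter correction for K and pre-aligned LTR sequences; the text does not state which distance measure was used, and the function names are illustrative.

```python
import math

SUBSTITUTION_RATE = 1.3e-8  # substitutions/site/year, the rate given in the text


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    Alignment columns containing gaps or ambiguous bases are skipped.
    """
    purines = {"A", "G"}
    valid = set("ACGT")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue
        sites += 1
        if a == b:
            continue
        if (a in purines) == (b in purines):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_time_years(ltr5, ltr3, rate=SUBSTITUTION_RATE):
    """Estimate insertion time as T = K / (2r) from the two LTRs."""
    return k2p_distance(ltr5, ltr3) / (2 * rate)


# Toy alignment: 100 sites with a single A<->G transition.
toy_time = insertion_time_years("A" * 100, "G" + "A" * 99)
```

At this rate, an LTR pair that differs at roughly 1.1% of sites dates to about 0.44 MY, since K = 2 × 1.3 × 10⁻⁸ × 0.44 × 10⁶ ≈ 0.011.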
| RESULTS |
|---|
SWIGOŇOVÁ et al. 2004). Two contiguous series (contigs) of maize BAC clones that hybridized to the Orp probe were generated by fingerprinting and restriction map analysis. Only one contig of BACs that contain an Orp gene was detected in sorghum. Because the maize genome is primarily composed of large blocks of LTR retrotransposons, often organized in a nested insertion pattern (SANMIGUEL et al. 1996; FU and DOONER 2002; SONG et al. 2002; SONG and MESSING 2002, 2003), it was difficult to predict which BACs of maize and sorghum would provide the best alignment of colinear genes. Therefore, we first sequenced an ~160-kb sorghum BAC, SB18C08, the largest among the overlapping BACs containing the sorghum Orp gene. After analyzing the gene content of BAC SB18C08, probes from 10 additional genes physically linked to the sorghum Orp gene were obtained by PCR and hybridized with the previously identified positive BACs of maize. Two maize BACs, ZM573F08 and ZM573L14, sharing the most genes with the orthologous region of sorghum, were then chosen and completely sequenced. BLASTN searches against the nonredundant database at GenBank using the predicted genes in sorghum as queries were conducted to identify potential homologous segments of rice. A contig of five overlapping finished BAC sequences (GenBank accession nos. AP003896, AP005620, AP005618, AP005250, and AP004591) from the japonica cultivar Nipponbare was found to contain most of the genes homologous to the genes predicted on sorghum clone SB18C08, defining this contig as an orthologous Orp region in rice. Therefore, a 313-kb contiguous Orp region in rice was selected for further analysis.
Sequence organization of the Orp regions of sorghum, maize, and rice:
The complete sequence of the Orp segment in sorghum clone SB18C08 is 159,669 bp (GenBank accession no. AF466200). With our criteria for gene identification (see MATERIALS AND METHODS), we identified 22 sorghum genes on this BAC (Table 1). The average gene density is one gene/7.3 kb, similar to that previously observed in the sorghum sh2/a1 region (CHEN et al. 1997), the sorghum adh region (TIKHONOV et al. 1999), and the region near the Vrn1 ortholog of sorghum (RAMAKRISHNA et al. 2002a), but higher than the density of one gene/10.8 kb in the 215-kb region comprising the kafirin gene (SONG et al. 2002). No intact transposable elements were annotated in the sorghum region, but two non-LTR retrotransposon fragments (f) and one DNA transposon fragment (TNP2-f) were detected by homology-based searches (Figure 1).
138 kb of DNA (~76% of the region). The majority of the retrotransposons in this region are organized in typical nested fashion (SANMIGUEL et al. 1996). The four predicted genes are separated into two gene pairs by the largest (~53 kb) retrotransposon block.
The complete sequence of the Orp2 segment in maize clone ZM573L14 is 144,792 bp (GenBank accession no. AY555143) and contains four apparent genes (Table 1). The average gene density is one gene/36 kb. Three of these four genes are clustered together and separated from the other gene by a cluster of intact retrotransposons, retrotransposon fragments, and a newly identified CACTA-like transposon, fanal-1, which inserted into retrotransposon milt-1. The transposable elements on this BAC include six retrotransposons, five retrotransposon fragments, one DNA transposon (fanal-1), and two DNA transposon fragments (TNP-f and Tam3-f), together accounting for ~50% of this region.
The 313-kb rice genomic sequence contains 19 identified genes, including a triplication of one locus (genes 12-1, 12-2, and 12-3). The average gene density is 1 gene/16.5 kb, much lower than that estimated for the whole rice genome (1 gene/7–9 kb; FENG et al. 2002; GOFF et al. 2002; SASAKI et al. 2002; SONG et al. 2002; YU et al. 2002; RICE CHROMOSOME 10 SEQUENCING CONSORTIUM 2003). This low gene density is mainly due to the presence of a large cluster of repetitive DNA that harbors only four predicted genes (Figure 1). This repetitive domain is predominantly composed of LTR retrotransposons, including nine intact elements, five solo LTRs, and five truncated fragments, three of which (ifisi, ovikoh, and pawepe) were discovered and named in this study. These elements constitute ~105 kb of DNA, accounting for ~55% of the retrotransposon-rich area or 33% of the whole region investigated. We also identified four DNA transposons and/or fragments in the rice region, constituting 22 kb of DNA.
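The gene densities quoted for the sorghum, maize Orp2, and rice segments follow directly from the reported lengths and gene counts. The short check below reproduces them; the rice length is entered as 313,000 bp because the text gives only an approximate size, and the dictionary keys are labels chosen for illustration.

```python
# Region lengths (bp) and gene counts as reported in the text.
regions = {
    "sorghum SB18C08": (159_669, 22),
    "maize Orp2 ZM573L14": (144_792, 4),
    "rice Orp contig": (313_000, 19),  # text gives ~313 kb, not an exact bp count
}


def kb_per_gene(length_bp, n_genes):
    """Average spacing between genes, in kb per gene."""
    return length_bp / 1000 / n_genes


for name, (length_bp, n_genes) in regions.items():
    print(f"{name}: one gene/{kb_per_gene(length_bp, n_genes):.1f} kb")
```

The computed values (7.3, 36.2, and 16.5 kb per gene) agree with the densities stated for the three segments.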
Sequence comparison of colinear Orp regions of sorghum, maize, and rice:
Three genes, 3, 8, and 9, are shared among rice, sorghum, and maize Orp1 regions, distributed across 25 kb in rice, 53 kb in sorghum, and 68 kb in maize. The maize Orp2 region also shares three predicted genes, 2, 3, and 5, with rice and sorghum. These genes are distributed across 17 kb in rice, 35 kb in sorghum, and 18 kb in maize, respectively. In addition to genes 2, 3, and 5, one more gene (gene 7) is shared between sorghum and the maize Orp2 region. Several genes are missing from one or more of the four otherwise colinear segments (Figure 1). This dramatic variation of gene organization and intergenic distance is due to both the variable amount of intergenic repetitive DNAs and the local genic rearrangements, such as deletions or insertions of genes. The maize Orp1 and Orp2 regions share only gene 3, the Orp loci. Hence, as observed at adh1 and lg2/lrs1 loci (ILIC et al. 2003; LANGHAM et al. 2004), most of the duplicated genes present from the tetraploidization of the maize ancestor ~11.9 MYA have been reduced by deletion to a near-diploid state (SWIGOŇOVÁ et al. 2004). The colinearity of the Orp genes and other genes shared between maize and sorghum and between maize and rice (Figure 1) does indicate that the maize Orp1 and Orp2 regions are homeologous segments derived from the two diploid progenitors of maize.
Among the orthologous regions (from gene 2 to gene 9) shared by sorghum, rice, and maize, several gene rearrangements can be attributed to specific lineages because we can compare four chromosomal segments: (1) Gene 5 adjacent to Orp1 was deleted in maize; (2) genes 4 and 6 were inserted into the sorghum region after the divergence of sorghum and maize ancestors; and (3) gene d was acquired by the maize Orp1 region after the divergence of sorghum and maize ancestors (Figure 1). In addition, 3' to Orp1 in maize, genes 1 and 2 were found to be deleted by analyzing the next maize BAC that is downstream of Orp1 and contains the Fie1 locus (LAI et al. 2004).
Extended comparison of colinear Orp regions of rice and sorghum:
The sorghum Orp segment was compared with the continuous 313-kb orthologous region of rice. We found numerous alterations in gene content, order, and orientation. A total of 14 predicted genes were found to be shared, distributed across 313 kb in rice and 159 kb in sorghum, whereas 13 additional genes were not in orthologous locations. This includes 8 genes (1, 4, 6, 7, 13, 15, 16, and 22) present in this region of sorghum but absent in the orthologous region of rice, and 5 genes (a, b, c, 12-2, and 12-3) present only in the rice region. Inspection of the adjacent BACs to the rice contig that we analyzed in this study indicated no copies homologous to genes 1 and 22. For most of the nonorthologous genes, on the basis of comparative analysis of two species we do not know whether they were gained or deleted in sorghum or in rice. However, for genes 4, 6, and 7 (present in sorghum or maize but not in rice), the simplest explanation suggests that they inserted in these locations in the lineage that gave rise to sorghum and/or maize.
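The gene tallies in this comparison can be checked with simple set arithmetic. The sketch below uses the gene labels given in the text and assumes that rice gene 12-1 is the copy counted as orthologous to sorghum gene 12, which is an inference from the list of rice-only genes rather than an explicit statement.

```python
# Sorghum genes are numbered 1-22 by position on BAC SB18C08.
sorghum = {str(i) for i in range(1, 23)}

# Genes reported as present in only one of the two regions.
sorghum_only = {"1", "4", "6", "7", "13", "15", "16", "22"}
rice_only = {"a", "b", "c", "12-2", "12-3"}

# Shared genes are the sorghum genes that are not sorghum-specific.
# Label "12" stands for the sorghum gene 12 / rice gene 12-1 pair (assumed).
shared = sorghum - sorghum_only
rice = shared | rice_only
nonorthologous = sorghum_only | rice_only

print(len(shared), len(nonorthologous), len(rice))
```

The counts come out as 14 shared genes, 13 genes without an orthologous location, and 19 rice genes in total, matching the figures reported for the two regions.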
An inversion of a cluster of four predicted genes (genes 17, 18, 19, and 20) was detected between rice and sorghum. These four genes are arranged in the indica genome in the same order as present in japonica (Figure 2), but it is not clear whether the inversion occurred in an ancestor of rice or sorghum.
25 MYA. However, because only gene 12-1 appears to be intact, the truncated genes 12-2 and 12-3 may be evolving more rapidly than functional loci that usually follow standard molecular clocks. Hence, this first duplication may have occurred much more recently than 25 MYA, and, similarly, the second duplication (yielding genes 12-2 and 12-3) may have taken place more recently than the 8 MYA that we calculated. Gene 12-2 is arranged in inverted orientation relative to 12-1 and 12-3 in rice and gene 12 in sorghum, an event that probably occurred after the second duplication. In addition, putative genes a and b were found between genes 12-1 and 12-2 and between genes 12-2 and 12-3, respectively. The extra three genes (a, b, and c) in rice are also truncated. Altogether, these data indicate a high frequency of several different types of genic rearrangement in this specific region of rice.
Chromosomal locations of homologs in rice and Arabidopsis:
Nearly complete genomic sequence and comprehensive sequence annotation of the Arabidopsis and rice genomes allowed us to investigate the nature of some local gene rearrangements at the whole-genome level. All of the genes predicted in the Orp regions of sorghum and/or maize but not shared with the orthologous region of rice were used as queries to search against nucleotide databases and protein databases of the rice genome at TIGR (http://www.tigr.org/tdb/e2k1/osa1/) and the Arabidopsis genome at TAIR (http://www.tigr.org/servlets/sv). The rice and Arabidopsis homologs closest to the corresponding sorghum genes and their chromosomal locations in individual genomes are summarized in Table 2.
A rapidly evolving retrotransposon block in rice:
The rice interval contains a transposable element-rich region, composed mainly of LTR retrotransposons (~105 kb of DNA). This region occupies ~190 kb of DNA, but contains only five genes, including two that are duplicated (Figure 1). This segment contains a high percentage (~55%) of LTR retrotransposons, similar to that recently observed in the centromeric region of rice chromosome 8 (WU et al. 2004).
The assembled whole-genome shotgun sequence generated from indica cultivar 93-11 (YU et al. 2002; ZHAO et al. 2004) was used in this study to investigate the timing and lineage specificities of the dramatic accumulation of retrotransposons and genic rearrangements identified in the Orp region of japonica rice. By sequence homology searches and sequence alignments, we identified nine assembled contiguous segments (accession nos. AAAA01000069, AAAA01004112, AAAA01006364, AAAA01008548, AAAA01009470, AAAA01009525, AAAA01009834, AAAA01013675, and AAAA01023118) from indica that have unique matches in both the japonica genomic sequence and the indica whole-genome shotgun sequences, suggesting that these segments are orthologous (Figure 2).
We found eight LTR retrotransposons or fragments uniquely present in the Orp region of japonica, whereas seven LTR retrotransposons or fragments were shared by indica and japonica in the comparable regions (Figure 2). For all LTR retrotransposons that are relatively intact, we employed LTR divergence as a tool to date approximate times of insertion (SANMIGUEL et al. 1998). We found that all intact LTR retrotransposons uniquely present in japonica were younger than 0.44 MY (the estimated divergence time of indica and japonica; MA and BENNETZEN 2004) and that all shared intact elements had inserted >0.44 MYA (Figure 2). Hence, it appears that this retrotransposon block has been continuously and independently expanding in both indica and japonica lineages by insertion of LTR retrotransposons.
The relatively intact LTR retrotransposons found in the maize Orp1 and Orp2 regions are all recent insertions. The majority of intact elements inserted <2 MYA (Figure 3). Our estimate is consistent with the previous dating of LTR retrotransposons in maize (SANMIGUEL et al. 1998; SWIGOŇOVÁ et al. 2004).
| DISCUSSION |
|---|
Gene-finding programs such as FGENESH, GENSCAN, and/or GeneMark.hmm are useful but imperfect tools for gene identification. These programs predicted 40, 4, 16, and 27 additional genes in the Orp regions of rice and sorghum and the Orp2 and Orp1 regions of maize, respectively, beyond those we consider valid gene candidates (Table 3). Of these predicted genes, 28 (70%), 2 (50%), 15 (94%), and 27 (100%), respectively, were found to have the structure of and/or the highest sequence similarity to transposable elements (Table 3). However, 12 predicted genes in rice (11 scattered in the transposable-element-rich area of the Orp segment), 2 predicted genes in sorghum, and 1 predicted gene in maize are unclear in origin, so we did not annotate them as genes. Because these predicted genes have no homologs in Arabidopsis or in any other genome, we think they are rapidly evolving transposable elements or some other nongenic DNA.
Gene content instability in the two maize subgenomes:
Our results are consistent with the hypothesis of a recent tetraploid origin for maize (SWIGOŇOVÁ et al. 2004). Although the two maize segments analyzed in this study share only the homeologous Orp1 and Orp2 genes, their comparisons to the orthologous regions of sorghum and rice indicate that they are two homeologous segments. There have been at least two gene deletions near the Orp1 locus. Also, independent insertions of large blocks of retrotransposons in both Orp1 and Orp2 segments have occurred in the last few million years. The maize Orp2 segment remains relatively "intact," with no gene deletion detected in this region. This result parallels the recent finding by LANGHAM et al. (2004). By comparing the maize lg2 region and its homeologous lrs1 region, LANGHAM et al. (2004) found that a cluster of four predicted genes 3' to the lrs1 locus has been deleted, leading to "zero retention" of duplicated factors, excluding the lg2/lrs1 gene pair. In contrast, >40% of the total genes from each homeologous region were found to have been deleted by several separate deletion events in the maize adh1 region and its homeologue, indicating that both regions have been equally unstable compared to their orthologs in sorghum and rice (ILIC et al. 2003). However, at least one copy of all orthologous genes appears to be conserved between the two homeologous regions in all cases investigated, suggesting that natural selection has acted against loss of all copies of any of these genes.
Timing of gene loss in the Orp region of maize:
We cannot precisely determine the times of gene deletion or insertion events in the Orp1 region of maize, although our comparative data indicate that they took place after the divergence of maize and sorghum. The extensive deletion of genes and low-copy-number sequences appears to be a common feature of genomes with polyploid origins, such as Arabidopsis (ARABIDOPSIS GENOME INITIATIVE 2000) and maize (AHN et al. 1993; SONG et al. 2002; ILIC et al. 2003). The elimination of low-copy-number sequences has also been detected in newly formed polyploids (SONG et al. 1995; FELDMAN et al. 1997; OZKAN et al. 2001), indicating that genome changes often happen in the first few generations in response to the formation of a polyploid. Gene deletion and transposon accumulation have also been seen to differentiate haplotypes in the allelic regions of different maize inbreds (FU and DOONER 2002; SONG and MESSING 2003).
Genic rearrangements: Deletion, insertion, and/or translocation?
Comparison of the orthologous regions of the rice and sorghum genomes reveals numerous small genic rearrangements. Apparent insertions of genes 1, 4, and 6 were detected in sorghum compared to rice and maize. No gene deletion or insertion was found in the two gene-clustered regions that are separated by a cluster of transposable elements in rice. This observation parallels observations in adh (TIKHONOV et al. 1999; ILIC et al. 2003), sh2/a1 (CHEN et al. 1997; LI and GILL 2002), and php200725 (SONG et al. 2002) orthologous regions, indicating that rice has a relatively stable gene content and order compared with maize, sorghum, or wheat.
Interestingly, all of the noncolinear genes present in the Orp region of sorghum and/or maize were found to have very similar copy numbers in both rice and Arabidopsis, indicating copy-number conservation for >150 million years of independent evolution (WOLFE et al. 1989). For four noncolinear genes, only single homologs were detected in both rice and Arabidopsis. If one assumes that these single-copy homologs are orthologous to the corresponding genes identified in sorghum, then it is clear that synteny or colinearity is not a perfect indicator of orthology. The relocations of these genes may have occurred in the rice and/or sorghum lineages. Alternatively, these four genes may be paralogous to the corresponding genes detected in rice because deletions removed the actual orthologs. Hence, on the basis of current data it is impossible to say whether these genes were deleted, inserted, or relocated in the rice and sorghum genomes.
All identified genes in the japonica Orp region were found to have homologs in the homologous region of indica rice. Because most assembled shotgun sequences from the indica genome are relatively small, we did not obtain the complete Orp region of indica and thus cannot compare order or orientation of these sequence fragments. It is also not clear whether any genes are uniquely present in the indica region. However, complete japonica and indica sequences of the php200725 region show complete conservation of gene order in both subspecies (SONG et al. 2002). Furthermore, previous comparison of ~1.1 Mb of orthologous regions between indica and japonica has demonstrated a lack of gene acquisition or loss from either indica or japonica (MA and BENNETZEN 2004), supporting the previous observation that the rice genome exhibits relatively stable gene content in contrast to the maize genome (SONG et al. 2002; ILIC et al. 2003).
A hotspot for gene rearrangement and the insertion of LTR retrotransposons in rice:
We found a large LTR-retrotransposon-rich segment in the rice genome that contains few genes, and all of the genes within this retrotransposon block were either duplicates or noncolinear inserts relative to sorghum. Our data indicate that this rice region has expanded rapidly by insertion of LTR retrotransposons in the past 2 MY, with most insertions in the few hundred thousand years since the divergence of indica and japonica ancestors. The ancient insertions (>1 MY old) in this region indicate that it has been a hotspot for transposon accumulation for a long time, while the recent insertions suggest that this insertion affinity is still present.
In our dating of relatively intact LTR retrotransposons in the rice genome, we found that the average age is ~1.3 MY (MA et al. 2004), while in the retrotransposon block of the rice Orp region, the average age of all datable LTR retrotransposons is ~0.7 MY. Moreover, we demonstrated a minimum of eight new transposon insertions within the japonica region since the divergence from a common ancestor with indica, adding at least 53 kb of new DNA to a target region of 190 kb. This is about a fourfold higher frequency of insertion than that observed for 1 Mb of chromosome 4 DNA from our earlier indica and japonica comparison (MA and BENNETZEN 2004).
A high percentage of repetitive DNA was observed in the centromeric region of rice chromosome 8. In this region, LTR retrotransposons account for at least 50% of the DNA (WU et al. 2004), and >80% of these elements were amplified before the divergence of indica and japonica (J. MA and J. L. BENNETZEN, unpublished observations). In contrast to the centromeric region, 55% of the transposon-rich segment of the rice Orp region is composed of LTR retrotransposons, and about half of them were amplified after the divergence of these two subspecies (Figure 2). This high rate of transposon insertion, plus the presence of a nontandem gene triplication and several noncolinear truncated genes in the Orp region, suggests that this block is a hotspot for several different kinds of genome rearrangement. It will be interesting to see if other retrotransposon blocks exhibit this type of exceptional instability when other comparative studies are performed.
An intraelement retrotransposon conversion event:
While maize retrotransposons are frequently intact at their termini, including the presence of short, flanking host-site duplications, there is no shortage of more tattered elements present in any maize BAC sequence. Unequal recombination is frequently invoked to explain the presence of solo LTRs (DEVOS et al. 2002; MA et al. 2004). Here we suggest that this phenomenon may explain a larger group of rearrangements simply by positing that an unequal recombination event initiating inside LTRs might migrate outside a terminus of these LTRs. While Figure 4 depicts a recombination event with symmetric exchange of strands, nonsymmetric events should also occur. These would yield the same outcome. Repair of heteroduplex DNA will also play a role in these sorts of recombination events and this could result in more complex rearrangements than depicted if the repair was noncontinuous over the recombination tract.
One other model could be proposed to explain the structure that we found. Two grande elements (most likely proximate to one another) on the same chromosome in the same orientation could recombine unequally to create a double element, sharing an LTR. But a second event would be required to explain the deletion of the 5'-end of the 5'-element. This second model is also unlikely because the duplicated region resulting from this mechanism would likely have a greater percentage of mismatched bases over the duplication than the 3% that is observed. Comparison of a grande element from the 22-kD α-zein gene family (SONG et al. 2001) and this grande element yields a 15% mismatch frequency over aligned bases. Rarely are retrotransposons (even from the same family) >90% similar over >2-kb regions.
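The mismatch frequencies quoted above (3% across the duplication, 15% between independent grande elements) are proportions of mismatched bases among aligned bases. The following is a minimal sketch of that calculation, assuming a pairwise alignment is already available as two equal-length gapped strings. The function name, the gap character, and the toy sequences are illustrative assumptions and are not the authors' pipeline or data.

```python
def mismatch_frequency(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return mismatches / compared columns for two aligned sequences.

    Columns in which either sequence carries a gap are skipped, so the
    denominator counts only positions where both sequences have a base.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    mismatches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # indel column: not an aligned base pair
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no aligned bases to compare")
    return mismatches / compared


# Toy alignment (made-up sequences): 9 compared columns, 1 mismatch.
toy_a = "ACGT-ACGTA"
toy_b = "ACGTTACCTA"
toy_frequency = mismatch_frequency(toy_a, toy_b)
```

Under this definition, a duplication that arose within a single element would be expected to score near the low end, whereas two independently inserted elements of the same family would typically score much higher, which is the contrast the argument relies on.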
| ACKNOWLEDGEMENTS |
|---|
ová for useful discussions and two anonymous reviewers for their valuable comments. This work was supported by the National Science Foundation Plant Genome Program (grant no. 9975618).
| FOOTNOTES |
|---|
| LITERATURE CITED |
|---|
AHN, S., and S. D. TANKSLEY, 1993 Comparative linkage maps of the rice and maize genomes. Proc. Natl. Acad. Sci. USA 90: 7980-7984.
AHN, S., J. A. ANDERSON, M. E. SORRELLS and S. D. TANKSLEY, 1993 Homoeologous relationships of rice, wheat and maize chromosomes. Mol. Gen. Genet. 241: 483-490.
ARABIDOPSIS GENOME INITIATIVE, 2000 Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815.
BENNETZEN, J. L., and J. MA, 2003 The genetic collinearity of rice and other cereals on the basis of genomic sequence analysis. Curr. Opin. Plant Biol. 6: 128-133.
BENNETZEN, J. L., C. COLEMAN, R. LIU, J. MA and W. RAMAKRISHNA, 2004 Consistent over-estimation of gene number in complex plant genomes. Curr. Opin. Plant Biol. 7: 732-736.
BENNETZEN, J. L., J. MA and K. M. DEVOS, 2005 Mechanisms of recent genome size variation in flowering plants. Ann. Bot. 95: 127-132.
BRUEGGEMAN, R., N. ROSTOKS, D. KUDRNA, A. KILIAN, F. HAN et al., 2002 The barley stem rust-resistance gene Rpg1 is a novel disease-resistance gene with homology to receptor kinases. Proc. Natl. Acad. Sci. USA 99: 9328-9333.
BRUNNER, S., B. KELLER and C. FEUILLET, 2003 A large rearrangement involving genes and low-copy DNA interrupts the microcollinearity between rice and barley at the Rph7 locus. Genetics 164: 673-683.
CHEN, M., P. J. SANMIGUEL, A. C. DE OLIVEIRA, S. S. WOO, H. ZHANG et al., 1997 Microcollinearity in sh2-homologous regions of the maize, rice, and sorghum genomes. Proc. Natl. Acad. Sci. USA 94: 3431-3435.
DAVIS, G. L., M. D. MCMULLEN, C. BAYSDORFER, T. MUSKET, D. GRANT et al., 1999 A maize map standard with sequenced core markers, grass genome reference points and 932 expressed sequence tagged sites (ESTs) in a 1736-locus map. Genetics 152: 1137-1172.
DEVOS, K. M., J. K. BROWN and J. L. BENNETZEN, 2002 Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 12: 1075-1079.
DOONER, H. K., and I. M. MARTINEZ-FEREZ, 1997 Recombination occurs uniformly within the bronze gene, a meiotic recombination hotspot in the maize genome. Plant Cell 9: 1633-1646.
DUBCOVSKY, J., W. RAMAKRISHNA, P. J. SANMIGUEL, C. S. BUSSO, L. YAN et al., 2001 Comparative sequence analysis of colinear barley and rice bacterial artificial chromosomes. Plant Physiol. 125: 1342-1353.
EWING, B., and P. GREEN, 1998 Base-calling of automated sequencer traces using PHRED: II. Error probabilities. Genome Res. 8: 186-194.
FELDMAN, M., B. LIU, G. SEGAL, S. ABBO, A. A. LEVY et al., 1997 Rapid elimination of low-copy DNA sequences in polyploid wheat: a possible mechanism for differentiation of homoeologous chromosomes. Genetics 147: 1381-1387.
FENG, Q., Y. ZHANG, P. HAO, S. WANG, G. FU et al., 2002 Sequence and analysis of rice chromosome 4. Nature 420: 316-320.
FEUILLET, C., and B. KELLER, 1999 High gene density is conserved at syntenic loci of small and large grass genomes. Proc. Natl. Acad. Sci. USA 96: 8265-8270.
FEUILLET, C., S. TRAVELLA, N. STEIN, L. ALBAR, A. NUBLAT et al., 2003 Map-based isolation of the leaf rust disease resistance gene Lr10 from the hexaploid wheat (Triticum aestivum L.) genome. Proc. Natl. Acad. Sci. USA 100: 15253-15258.
FU, H., and H. K. DOONER, 2002 Intraspecific violation of genetic colinearity and its implications in maize. Proc. Natl. Acad. Sci. USA 99: 9573-9578.
GALE, M. D., and K. M. DEVOS, 1998 Comparative genetics in the grasses. Proc. Natl. Acad. Sci. USA 95: 1971-1974.
GAUT, B. S., and J. F. DOEBLEY, 1997 DNA sequence evidence for the segmental allotetraploid origin of maize. Proc. Natl. Acad. Sci. USA 94: 6809-6814.
GOFF, S. A., D. RICKE, T.-H. LAN, G. PRESTING, R. WANG et al., 2002 A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-100.
GORDON, D., C. ABAJIAN and P. GREEN, 1998 CONSED: a graphical tool for sequence finishing. Genome Res. 8: 195-202.
GUO, H., and S. P. MOOSE, 2003 Conserved noncoding sequences among cultivated cereal genomes identify candidate regulatory sequence elements and patterns of promoter evolution. Plant Cell 15: 1143-1158.
HELENTJARIS, T., D. WEBER and S. WRIGHT, 1988 Identification of the genomic locations of duplicate nucleotide sequences in maize by analysis of restriction fragment length polymorphisms. Genetics 118: 353-363.
ILIC, K., J. P. SANMIGUEL and J. L. BENNETZEN, 2003 A complex history of rearrangement in an orthologous region of the maize, sorghum and rice genomes. Proc. Natl. Acad. Sci. USA 100: 12265-12270.
INADA, D. C., A. BASHIR, C. LEE, B. C. THOMAS and C. KO, 2003 Conserved noncoding sequences in the grasses. Genome Res. 13: 2030-2041.
KAPLINSKY, N. J., D. M. BRAUN, J. PENTERMAN, S. A. GOFF and M. FREELING, 2002 Utility and distribution of conserved noncoding sequences in the grasses. Proc. Natl. Acad. Sci. USA 99: 6147-6151.
KELLOGG, E. A., 2001 Evolutionary history of the grasses. Plant Physiol. 125: 1198-1205.
LAI, J., J. MA, Z. SWIGONOVA, W. RAMAKRISHNA, E. LINTON et al., 2004 Gene loss and movement in the maize genome. Genome Res. 14: 1924-1931.
LANGHAM, R. J., J. WALSH, M. DUNN, C. KO, S. A. GOFF et al., 2004 Genomic duplication, fractionation and the origin of regulatory novelty. Genetics 166: 935-945.
LI, W., and B. S. GILL, 2002 The colinearity of the Sh2/A1 orthologous region in rice, sorghum and maize is interrupted and accompanied by genome expansion in the Triticeae. Genetics 160: 1153-1162.
MA, J., and J. L. BENNETZEN, 2004 Recent rapid growth and divergence of the rice nuclear genome. Proc. Natl. Acad. Sci. USA 101: 12404-12410.
MA, J., K. M. DEVOS and J. L. BENNETZEN, 2004 Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res. 14: 860-869.
MCCLINTOCK, B., 1930 A cytological demonstration of the location of an interchange between two non-homologous chromosomes of Zea mays. Proc. Natl. Acad. Sci. USA 16: 791-796.
OZKAN, H., A. A. LEVY and M. FELDMAN, 2001 Allopolyploidy-induced rapid genome evolution in the wheat (Aegilops-Triticum) group. Plant Cell 13: 1735-1747.
PATERSON, A. H., J. E. BOWERS, M. D. BUROW, X. DRAYE, C. G. ELSIK et al., 2000 Comparative genomics of plant chromosomes. Plant Cell 12: 1523-1540.
RAMAKRISHNA, W., J. DUBCOVSKY, Y.-J. PARK, C. BUSSO, J. EMBERTON et al., 2002a Different types and rates of genome evolution detected by comparative sequence analysis of orthologous segments from four cereal genomes. Genetics 162: 1389-1400.
RAMAKRISHNA, W., J. EMBERTON, P. SANMIGUEL, M. OGDEN, V. LLACA et al., 2002b Comparative sequence analysis of the sorghum Rph region and the maize Rp1 resistance gene complex. Plant Physiol. 130: 1728-1738.
RHOADES, M. M., 1951 Duplicated genes in maize. Am. Nat. 85: 105-110.
RICE CHROMOSOME 10 SEQUENCING CONSORTIUM, 2003 In-depth view of structure, activity, and evolution of rice chromosome 10. Science 300: 1566-1569.
SANMIGUEL, P., A. TIKHONOV, Y. K. JIN, N. MOTCHOULSKAIA, D. ZAKHAROV et al., 1996 Nested retrotransposons in the intergenic regions of the maize genome. Science 274: 765-768.
SANMIGUEL, P., B. S. GAUT, A. TIKHONOV, Y. NAKAJIMA and J. L. BENNETZEN, 1998 The paleontology of intergene retrotransposons of maize. Nat. Genet. 20: 43-45.
SANMIGUEL, P. J., W. RAMAKRISHNA, J. L. BENNETZEN, C. S. BUSSO and J. DUBCOVSKY, 2002 Transposable elements, genes and recombination in a 215-kb contig from wheat chromosome 5A(m). Funct. Integr. Genomics 2: 70-80.
SASAKI, T., T. MATSUMOTO, K. YAMAMOTO, K. SAKATA, T. BABA et al., 2002 The genome sequence and structure of rice chromosome 1. Nature 420: 312-316.
SONG, K., P. LU, K. TANG and T. C. OSBORN, 1995 Rapid genome change in synthetic polyploids of Brassica and its implications for polyploid evolution. Proc. Natl. Acad. Sci. USA 92: 7719-7723.
SONG, R., and J. MESSING, 2002 Contiguous genomic DNA sequence comprising the 19-kD zein gene family from maize. Plant Physiol. 130: 1626-1635.
SONG, R., and J. MESSING, 2003 Gene expression of a gene family in maize based on noncollinear haplotypes. Proc. Natl. Acad. Sci. USA 100: 9055-9060.
SONG, R., V. LLACA, E. LINTON and J. MESSING, 2001 Sequence, regulation, and evolution of the maize 22-kD alpha zein gene family. Genome Res. 11: 1817-1825.
SONG, R., V. LLACA and J. MESSING, 2002 Mosaic organization of the orthologous sequences in grass genomes. Genome Res. 12: 1549-1555.
SWIGOŇOVÁ, Z., J. LAI, J. MA, W. RAMAKRISHNA, V. LLACA et al., 2004 Close split of maize and sorghum genome progenitors. Genome Res. 14: 1916-1923.
SWIGOŇOVÁ, Z., J. L. BENNETZEN and J. MESSING, 2005 Structure and evolution of the r/b chromosomal regions in rice, maize and sorghum. Genetics 169: 891-906.
TATUSOVA, T. A., and T. L. MADDEN, 1999 Blast 2 sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. 174: 247-250.
THOMPSON, J. D., T. J. GIBSON, F. PLEWNIAK, F. JEANMOUGIN and D. G. HIGGINS, 1997 The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24: 4876-4882.
TIKHONOV, A. P., P. J. SANMIGUEL, Y. NAKAJIMA, N. M. GORENSTEIN, J. L. BENNETZEN et al., 1999 Colinearity and its exceptions in orthologous adh regions of maize and sorghum. Proc. Natl. Acad. Sci. USA 96: 7409-7414.
WHITELAW, C. A., W. B. BARBAZUK, G. PERTEA, A. P. CHAN, F. CHEUNG et al., 2003 Enrichment of gene-coding sequences in maize by genome filtration. Science 302: 2118-2120.
WICKER, T., N. YAHIAOUI, R. GUYOT, E. SCHLAGENHAUF, Z. D. LIU et al., 2003 Rapid genome divergence at orthologous low molecular weight glutenin loci of the A and Am genomes of wheat. Plant Cell 15: 1186-1197.
WOLFE, K. H., M. GOUY, Y. W. YANG, P. M. SHARP and W. H. LI, 1989 Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc. Natl. Acad. Sci. USA 86: 6201-6205.
WRIGHT, A. D., C. A. MOEHLENKAMP, G. H. PERROT, M. G. NEUFFER and K. C. CONE, 1992 The maize auxotrophic mutant orange pericarp is defective in duplicate genes for tryptophan synthase beta. Plant Cell 4: 711-719.
WU, J., H. YAMAGATA, M. HAYASHI-TSUGANE, S. HIJISHITA, M. FUJISAWA et al., 2004 Composition and structure of the centromeric region of rice chromosome 8. Plant Cell 16: 967-976.
YAHIAOUI, N., P. SRICHUMPA, R. DUDLER and B. KELLER, 2004 Genome analysis at different ploidy levels allows cloning of the powdery mildew resistance gene Pm3b from hexaploid wheat. Plant J. 37: 528-538.
YAN, L., A. LOUKOIANOV, G. TRANQUILLI, M. HELGUERA, T. FAHIMA et al., 2003 Positional cloning of the wheat vernalization gene VRN1. Proc. Natl. Acad. Sci. USA 100: 6263-6268.
YAN, L., A. LOUKOIANOV, A. BLECHL, G. TRANQUILLI, W. RAMAKRISHNA et al., 2004 The wheat VRN2 gene is a flowering repressor down-regulated by vernalization. Science 303: 1640-1644.
YANDEAU-NELSON, M. D., Q. ZHOU, H. YAO, X. XU, B. J. NIKOLAU et al., 2005 MuDR Transposase increases the frequency of meiotic crossovers in the vicinity of a Mu insertion in the maize a1 gene. Genetics 169: 917-929.
YIM, Y. S., G. L. DAVIS, N. A. DURU, T. A. MUSKET, E. W. LINTON et al., 2002 Characterization of three maize bacterial artificial chromosome libraries toward anchoring of the physical map to the genetic map using high-density bacterial artificial chromosome filter hybridization. Plant Physiol. 130: 1686-1696.
YU, J., S. N. HU, J. WANG, K. W. GANE, S. G. LI et al., 2002 A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79-92.
ZHAO, W. M., J. WANG, X. HE, X. HUANG, Y. JIAO et al., 2004 BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomes. Nucleic Acids Res. 32: 377-382.
Communicating editor: J. A. BIRCHLER