Genetics, Vol. 163, 759-770, February 2003, Copyright © 2003

Molecular and Cytological Analyses of Large Tracks of Centromeric DNA Reveal the Structure and Evolutionary Dynamics of Maize Centromeres

Kiyotaka Nagaki (a), Junqi Song (a), Robert M. Stupar (a), Alexander S. Parokonny (a), Qiaoping Yuan (b), Shu Ouyang (b), Jia Liu (b), Joseph Hsiao (b), Kristine M. Jones (b), R. Kelly Dawe (c), C. Robin Buell (b), and Jiming Jiang (a)
(a) Department of Horticulture, University of Wisconsin, Madison, Wisconsin 53706
(b) The Institute for Genomic Research, Rockville, Maryland 20850
(c) Department of Plant Biology, University of Georgia, Athens, Georgia 30602

Corresponding author: Jiming Jiang, 1575 Linden Dr., University of Wisconsin, Madison, WI 53706. E-mail: jjiang1@facstaff.wisc.edu

Communicating editor: V. SUNDARESAN


*  ABSTRACT

We sequenced two maize bacterial artificial chromosome (BAC) clones anchored by the centromere-specific satellite repeat CentC. The two BACs, consisting of ~200 kb of cytologically defined centromeric DNA, are composed exclusively of satellite sequences and retrotransposons that can be classified as centromere specific or noncentromere specific on the basis of their distribution in the maize genome. Sequence analysis suggests that the original maize sequences were composed of CentC arrays that were expanded by retrotransposon invasions. Seven centromere-specific retrotransposons of maize (CRM) were found in BAC 16H10. The CRM elements inserted randomly into either CentC monomers or other retrotransposons. Sequence comparisons of the long terminal repeats (LTRs) of individual CRM elements indicated that these elements transposed within the last 1.22 million years. We observed that all of the previously reported centromere-specific retrotransposons in rice and barley, which belong to the same family as the CRM elements, also recently transposed with the oldest element having transposed ~3.8 million years ago. Highly conserved sequence motifs were found in the LTRs of the centromere-specific retrotransposons in the grass species, suggesting that the LTRs may be important for the centromere specificity of this retrotransposon family.


THE centromeres of eukaryotic chromosomes are responsible for sister chromatid cohesion and serve as the sites for kinetochore assembly and spindle fiber attachment during cell division. Thus, centromeres are critical for the segregation and transmission of genetic information. In the budding yeast Saccharomyces cerevisiae, the functional centromere is defined by a ~125-bp sequence (CLARKE 1998). However, in the majority of eukaryotic species, centromeres are embedded in long tracks of highly repetitive DNA sequences, with satellite repeats often the major DNA component of centromeres in higher eukaryotic species (CSINK and HENIKOFF 1998). For example, a 171-bp tandem repeat, the α-satellite, is located in the centromeres of all human chromosomes. Human artificial chromosomes have been successfully assembled using either synthetic or cloned α-satellite DNA as the centromere component (HARRINGTON et al. 1997; IKENO et al. 1998; HENNING et al. 1999), suggesting that a long stretch of α-satellite DNA can act as a functional human centromere.

The centromeres of Arabidopsis thaliana chromosomes are among the best-studied plant centromeres. A. thaliana centromeres were mapped genetically using tetrad-based genetic mapping (COPENHAVER et al. 1999). DNA sequences within the genetically mapped centromeres were cloned and analyzed (COPENHAVER et al. 1999; ARABIDOPSIS GENOME INITIATIVE 2000; KUMEKAWA et al. 2000, 2001). The most abundant DNA element in A. thaliana centromeres is the pAL1 repeat, a 180-bp satellite repeat family (MARTINEZ-ZAPATER et al. 1986; MALUSZYNSKA and HESLOP-HARRISON 1991; ROUND et al. 1997). The cytological locations of the pAL1 repeat coincide with the centromeric H3 histone (TALBERT et al. 2002). The pAL1 repeat is organized into long tandem arrays (JACKSON et al. 1998) that may be interrupted by the 106B repeat, a diverged copy of the long terminal repeat (LTR) of the Athila retrotransposon (FRANSZ et al. 2000). The Athila element, the most dominant retrotransposon family in A. thaliana, and a number of other repetitive DNA elements are highly enriched in pericentromeric regions of all five A. thaliana centromeres (FRANSZ et al. 2000; KUMEKAWA et al. 2000, 2001).

Two highly conserved repetitive DNA elements were reported in centromeres of grass species (ARAGON-ALCAIDE et al. 1996; JIANG et al. 1996). These two sequences are derived from a Ty3/gypsy class of retrotransposon (MILLER et al. 1998A; PRESTING et al. 1998; LANGDON et al. 2000). The centromere-specific retrotransposon sequences provide excellent probes to isolate DNA clones derived from grass centromeres. Such clones have been reported in a number of plant species, including rice (DONG et al. 1998; NONOMURA and KURATA 1999), barley (PRESTING et al. 1998), and maize (ANANIEV et al. 1998). DNA sequences associated with centromeric regions have also been reported in numerous other plant species (HARRISON and HESLOP-HARRISON 1995; MILLER et al. 1998B; NAGAKI et al. 1998; FRANCKI 2001; GINDULLIS et al. 2001; HUDAKOVA et al. 2001; KISHII et al. 2001; SAUNDERS and HOUBEN 2001).

Maize has become an important model for plant centromere research. ALFENITO and BIRCHLER (1993) isolated a repetitive DNA element that is specific to the centromeres of maize B chromosomes. This repeat is present in all significantly rearranged B centromeres (KASZAS and BIRCHLER 1996, 1998), suggesting that it is essential for B centromere function. A repetitive DNA element was recently isolated from the centromere of maize chromosome 4 on the basis of its partial sequence homology with the B centromeric repeat (PAGE et al. 2001). Cosmid clones derived from the centromeric region of maize chromosome 9 were identified in a library constructed from an oat-maize chromosome 9 addition line (ANANIEV et al. 1998). A 156-bp satellite repeat, CentC, was discovered in these cosmid clones. CentC is found only at maize centromeres, but the amount of CentC repeat is highly variable among the 10 maize centromeres (ANANIEV et al. 1998).

Although several DNA elements have been isolated from the maize centromeres, the large-scale organization of maize centromeric DNA, especially in the A chromosomes, is not known. In this study, we isolated and sequenced two maize bacterial artificial chromosome (BAC) clones derived from the centromeric regions. We found that the CentC satellite and retrotransposons, both centromere specific and noncentromere specific, are the primary DNA components of maize centromeres. Molecular and cytological analyses of the centromere-specific retrotransposons in maize and other cereal species revealed the structural diversity and evolutionary dynamics of this special retrotransposon family that may play an important role in grass centromere evolution.


*  MATERIALS AND METHODS

BAC library construction and screening:
A BAC library was constructed from maize inbred line Mo17 according to SONG et al. (2000). The BamHI cloning site of vector pBeloBAC11 (SHIZUYA et al. 1992) was used for library construction. The 9216 clones were arrayed in twenty-four 384-well plates. Filter preparation and library screening followed published protocols (NIZETIC et al. 1990). DNA sequences homologous to the maize centromeric repeats CentC and CentA (ANANIEV et al. 1998) were amplified from maize genomic DNA and cloned into plasmid vectors. Two plasmid clones, pCentA-int and pCentC-1, were used to screen the BAC library.

Fluorescent in situ hybridization:
Maize inbred line Mo17 was used for cytological analysis. The fluorescence in situ hybridization (FISH) procedures on metaphase chromosomes and individual BAC molecules were essentially the same as previously published protocols (JIANG et al. 1995; JACKSON et al. 1999). All images were captured digitally using a SenSys charge-coupled device (CCD) camera (Roper Scientific, Tucson, AZ) attached to an Olympus BX60 epifluorescence microscope. The camera control and image analysis were performed using IPLab Spectrum v3.1 software (Signal Analytics, Vienna, VA).

Polymerase chain reaction:
To detect each subfamily of the centromere-specific retrotransposons in maize, primers specific to each subfamily were designed for the 5' LTR and 5' untranslated region (UTR). Primers include CRM1a-U (5'-ACACCAGCAGCACCTTCTCCAG-3'), CRM1a-L (5'-AGTTCTTATCCGTTCTTACCAA-3'), CRM2a-U (5'-GCTCGTCAACTCAACCATCAGG-3'), and CRM2a-L (5'-GCCCCATCTTTTCATTCGTCAC-3'). Two primers were designed to amplify the 77-bp repeat discovered in BAC 15C5: ZMA77-U (5'-TTTTGCACGGATAGTCTTCG-3') and ZMA77-L (5'-TCCGTGCAAAAGTCGCCTAA-3'). The specific regions were amplified from the genomic DNA of Mo17 by 30 cycles of polymerase chain reaction (PCR) with the following conditions: 94° for 30 sec, 52° for 30 sec, and 72° for 2 min.
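The primer sequences listed above can be sanity-checked with a short script. The sketch below is illustrative only: the helper names (`reverse_complement`, `gc_fraction`, `shared_5prime_overlap`) are ours and not part of the published protocol. It reports the length and GC fraction of each primer and measures how far the 5' ends of a primer pair are reverse complements of one another.

```python
# Primer sequences copied from the text above, written 5'->3'.
PRIMERS = {
    "CRM1a-U": "ACACCAGCAGCACCTTCTCCAG",
    "CRM1a-L": "AGTTCTTATCCGTTCTTACCAA",
    "CRM2a-U": "GCTCGTCAACTCAACCATCAGG",
    "CRM2a-L": "GCCCCATCTTTTCATTCGTCAC",
    "ZMA77-U": "TTTTGCACGGATAGTCTTCG",
    "ZMA77-L": "TCCGTGCAAAAGTCGCCTAA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def shared_5prime_overlap(upper: str, lower: str) -> int:
    """Longest k such that the first k bases of `lower` equal the last k
    bases of the reverse complement of `upper`, i.e. the two primers are
    reverse complements of each other over k bases at their 5' ends."""
    rc = reverse_complement(upper)
    for k in range(min(len(rc), len(lower)), 0, -1):
        if rc.endswith(lower[:k]):
            return k
    return 0


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.2f}")
    print("ZMA77 5' overlap:",
          shared_5prime_overlap(PRIMERS["ZMA77-U"], PRIMERS["ZMA77-L"]))
```

For the ZMA77 pair the 5' ends are reverse complements over 11 bases, which would be expected if the two primers face away from each other from a shared site within a tandem array. That reading is an inference from the sequences, not a statement made in the methods.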

DNA sequencing:
The sequences of the two maize BAC clones, 15C5 and 16H10, were determined essentially as described by YUAN et al. (2002). For 15C5, a 2- to 3-kb and a 10- to 15-kb shotgun library were constructed and these libraries were sequenced to provide a total of ~14x sequence coverage. For 16H10, a 2- to 3-kb and a 4- to 8-kb shotgun library were constructed and sequenced to provide >10x sequence coverage. Shotgun sequences for each BAC were assembled using TIGR assembler (SUTTON et al. 1995). Closure reactions were performed on the BACs using a combination of resequencing, alternative chemistries, transposon-based sequencing, and primer walking. Some of the assemblies could be ordered on the basis of clone mate pairs and the presence of the BAC vector. The sequences have been submitted to GenBank with accession nos. AC116034 (BAC 16H10) and AC116033 (BAC 15C5).

Sequence analysis:
The GenBank database was searched with BLASTN for DNA sequences similar to the BAC assemblies. DNA elements in the sequences were analyzed by MegAlign software (DNASTAR, Madison, WI). The ages of the retrotransposons discovered in the two maize BACs were estimated by sequence comparison between the two LTRs of the elements. The LTRs were first aligned by CLUSTAL X v1.81 software (THOMPSON et al. 1997). Kimura's distance (KIMURA 1980) of the two LTRs of individual retrotransposons was estimated by the maximum-likelihood method using the baseml program with the K80 model in the PAML 3.11 PPC package (YANG 1997). The reported substitution rate per synonymous site per year in maize and Kimura's distances were then used to estimate the age of the elements (GAUT et al. 1996). The phylogeny of the retrotransposons in the BACs was analyzed by the neighbor-joining method with CLUSTAL X v1.81 software (SAITOU and NEI 1987; THOMPSON et al. 1997).
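To make the distance measure concrete, the following minimal sketch computes Kimura's two-parameter (K80) distance between two aligned LTR sequences using the closed-form estimator K = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. This is a stand-in for illustration: the analysis described above used maximum-likelihood estimation in baseml, not this formula, and the function name `k80_distance` is hypothetical.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k80_distance(ltr5: str, ltr3: str) -> float:
    """Closed-form Kimura two-parameter distance for two aligned sequences.

    Columns containing a gap or an ambiguous base in either sequence are
    skipped. Raises ValueError when the sequences differ in length, when no
    comparable site remains, or when divergence is too high for the formula.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")

    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: not a comparable site
        sites += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1

    if sites == 0:
        raise ValueError("no comparable sites")

    p = transitions / sites
    q = transversions / sites
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if q < 0.5 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K80 formula")
    return -0.5 * math.log(inner)


if __name__ == "__main__":
    # Toy alignment (not real LTR data): one transition, one transversion.
    print(round(k80_distance("AAAAAAAAAA", "GAAAAAAAAC"), 4))
```

For the closely related LTR pairs discussed here, the closed-form value and a maximum-likelihood estimate under the same model would be expected to be close, but they are not guaranteed to be identical.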


*  RESULTS

Isolation of centromeric BACs for sequencing:
We constructed a BAC library of maize inbred line Mo17, which consists of 9216 clones with an average insert size of 120 kb. Two plasmid clones, pCentA-int and pCentC-1, were used as probes to identify centromeric clones from the BAC library. Probe pCentC-1 contains a 156-bp satellite DNA element CentC that is specific to the centromeres of maize chromosomes (ANANIEV et al. 1998). Probe pCentA-int is derived from a portion of the centromere-specific retrotransposon sequence CentA that is almost exclusively located in the centromeric regions of maize chromosomes (ANANIEV et al. 1998). BAC library screening using these two probes identified a total of 96 positive clones, including 18 specific to CentA, 64 specific to CentC, and 14 identified by both probes.
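The library and screening figures can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text (24 plates of 384 wells, a 120-kb average insert, and the three classes of positive clones); the variable names are ours.

```python
# Figures taken from MATERIALS AND METHODS and from the paragraph above.
PLATES = 24
WELLS_PER_PLATE = 384
AVG_INSERT_KB = 120
hits = {"CentA only": 18, "CentC only": 64, "both probes": 14}

total_clones = PLATES * WELLS_PER_PLATE              # clones in the library
total_hits = sum(hits.values())                      # all positive clones
centc_positive = hits["CentC only"] + hits["both probes"]
cloned_dna_mb = total_clones * AVG_INSERT_KB / 1000  # total cloned DNA, Mb
positive_fraction = total_hits / total_clones

if __name__ == "__main__":
    print(f"clones: {total_clones}")
    print(f"positive clones: {total_hits} ({positive_fraction:.2%})")
    print(f"CentC-positive clones: {centc_positive}")
    print(f"cloned DNA: {cloned_dna_mb:.0f} Mb")
```

The plate layout reproduces the stated 9216 clones, and the three hit classes sum to the stated 96 positives, roughly 1% of the library.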

Two BAC clones, 16H10 and 15C5, were selected for further analysis. BACs 16H10 and 15C5 contain inserts of 95 and 100 kb, respectively, based on fingerprint analyses using both NotI and BamHI digestions (data not shown). FISH analysis on maize metaphase chromosomes showed that the signals derived from 16H10 were almost exclusively localized in the centromeres (Fig 1, A–C). Major FISH signals from 15C5 were also located in the centromeres. However, faint signals uniformly covered the entire length of all maize chromosomes (Fig 1, D–F). The amount and location of the CentC sequences in the two BAC clones were determined by FISH mapping on individual BAC molecules as described by JACKSON et al. (1999). The average sizes of the CentC tracts were calculated from 10 FISH images. BAC 16H10 contains three CentC tracts, and the sizes of the tracts are 18.0, 2.4, and 1.8% of the BAC molecule (including the vector), respectively (Fig 2, A and C). BAC 15C5 also contains three CentC tracts, and the sizes of the tracts are 6.1, 1.5, and 1.7% of the BAC molecule, respectively (Fig 2, B and D).



Figure 1. FISH mapping of centromeric BACs 16H10 (A–C) and 15C5 (D–F) on somatic metaphase chromosomes of maize inbred Mo17. (A and D) Somatic metaphase chromosomes; (B and E) FISH signals; (C and F) merged images. Chromosomes are stained by 4',6-diamidino-2-phenylindole (DAPI) and presented by a pseudo-red color. Bars, 5 µm.



Figure 2. Structure of maize BACs 16H10 and 15C5 revealed by fiber-FISH mapping. DNA from BACs 16H10 (A) and 15C5 (B) was labeled as green and pCentC-1 was labeled as red (bars, 5 µm). The amount and locations of the CentC sequences within the BAC inserts were revealed by this method and are illustrated in C and D.

Sequence analysis of BAC clone 16H10:
BAC 16H10 was sequenced to >10x sequence coverage (see MATERIALS AND METHODS). The sequences generated from 16H10 were assembled into two large contigs (34,079 and 21,043 bp, respectively) and eight small contigs (9438, 4686, 3066, 2491, 2143, 1904, 1494, and 981 bp, respectively). The total length of these 10 contigs is 81,325 bp, slightly smaller than the 95 kb estimated by fingerprint analysis, suggesting that a portion of the highly conserved repetitive sequences within the BAC were collapsed within the contigs. However, a substantial portion of the 81-kb assembled sequence (74.8 kb) was correctly assembled as determined by inspection of clone mates and use of transposon-based sequencing of the large insert shotgun clones. The order of the contigs in Fig 3 is determined on the basis of structure and locations of specific retroelements within the BAC insert and the presence of the BAC vector. Both large contigs (ASM 37376, 34,079 bp; and ASM 37375, 21,043 bp) and 4 of the 8 small contigs could be placed within the BAC insert using this approach (ASM 37379, 9438 bp; ASM 37381, 4686 bp; ASM 37378, 3066 bp; and ASM 37606, 981 bp; Fig 3).
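The contig totals quoted above can be verified directly. The short sketch below sums the ten contig lengths and the six contigs that could be placed within the insert, and compares the assembly with the 95-kb fingerprint estimate; all lengths are taken from the text and the variable names are ours.

```python
# Contig lengths in bp, as listed in the text (two large, eight small).
contigs_bp = [34079, 21043, 9438, 4686, 3066, 2491, 2143, 1904, 1494, 981]

# The two large contigs plus the four small contigs placed in the insert.
placed_bp = [34079, 21043, 9438, 4686, 3066, 981]

FINGERPRINT_KB = 95  # insert size estimated by fingerprint analysis

total_bp = sum(contigs_bp)
placed_total_bp = sum(placed_bp)
shortfall_kb = FINGERPRINT_KB - total_bp / 1000

if __name__ == "__main__":
    print(f"assembled: {total_bp} bp")
    print(f"placed in the insert: {placed_total_bp} bp")
    print(f"shortfall vs. fingerprint estimate: {shortfall_kb:.1f} kb")
```

The sum reproduces the stated 81,325 bp, leaving about 13.7 kb of the fingerprint estimate unaccounted for by the assembly.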



Figure 3. Sequence organization of maize centromeric BACs 16H10 and 15C5. The order of the sequence contigs in 16H10 was determined on the basis of the sequence information of specific retroelements within the BAC insert and the presence of the BAC vector. Each retrotransposon is marked by a different color. The name, LTRs, and polyprotein of the same element are in the same color to facilitate the identification of interrupted retrotransposons.

Four CentC tracts were found in 16H10 and were named CentC tracts A1, A2, B, and C (Fig 3). The total length of CentC tracts A1 and A2, including the gap separating these two tracts, was determined to be ~25 kb by restriction digestions followed by Southern hybridization (data not shown), suggesting an ~12-kb gap separating ASM 37375 and ASM 37379 (Fig 3). Nine retrotransposons were found in 16H10, including seven elements homologous to the centromeric retrotransposon of rice (CRR) element (CHENG et al. 2002; Table 1 and Fig 3). The CRR-like elements in maize are hereafter named centromeric retrotransposon of maize (CRM). Six CRM elements, CRM1a, CRM1b, CRM1c, CRM2a, CRM2b, and CRM2c, are complete or near-complete elements. The seventh CRM element is a solo LTR inserted in the middle of CentC tract C (Fig 3).


 
Table 1. Retrotransposons in the two sequenced centromeric BACs of maize

The two non-CRM elements are a Huck1 element and a nonautonomous retroelement that is novel and different from any published maize retrotransposon family. We named this a Novl element. A shotgun clone containing sequences derived from the Novl element was used as a probe for FISH analysis (Fig 4, A and B). Dispersed signals were observed from the probe, indicating that the Novl element is not specific to the centromeres. The last CRM element, CRM2c, is located between CentC tract C and the BAC vector. A solo LTR, which is most likely derived from a different CRM element, is found in the middle of CentC tract C (Fig 3).



Figure 4. FISH analysis using shotgun plasmid clones from maize BACs 16H10 and 15C5. The locations of plasmids within the BAC inserts are marked in Fig 3. (A and B) FISH pattern of plasmid ZMACL26 derived from retrotransposon Novl. (C and D) Chromosomal locations of the ZMA77bp tandem repeat. This repeat was amplified from maize genomic DNA using primers ZMA77-U and ZMA77-L (see MATERIALS AND METHODS). The PCR product was labeled as a FISH probe that hybridized almost uniformly to the chromosomes although enhanced pericentromeric signals were observed in some chromosomes. (E and F) FISH pattern of plasmid ZMABC19 derived from possibly decayed retrotransposon sequences in BAC 15C5. (G and H) FISH pattern of plasmid ZMABC91 derived from possibly decayed retrotransposon sequences in BAC 15C5. (I and J) FISH pattern of plasmid ZMACD69 derived from CRM1c. (K and L) FISH pattern of plasmid ZMACD68 derived from CRM2a. Chromosomes are stained by DAPI and presented by a pseudo-red color. Bars, 5 µm.

Sequence analysis of BAC clone 15C5:
The sequences generated from 15C5 were assembled into a single contig with a length of 99,979 bp, which is consistent with the estimated size of 100 kb based on fingerprint analyses.

Three CentC tracts, named D, E, and F, were found in 15C5. A total of 15 retrotransposons were discovered in 15C5 (Table 1 and Fig 3), including two complete Cinful2-like elements and one complete Zeon1 element. The remaining 12 retrotransposons have significantly decayed and their structures were difficult to determine. A novel 77-bp tandem repeat was found in BAC 15C5 (Fig 3). Two primers, ZMA77-U and ZMA77-L (see MATERIALS AND METHODS), were designed to amplify this repeat from maize genomic DNA and the PCR product was labeled as a probe for FISH analysis. Dispersed FISH signals were observed on maize metaphase chromosomes, indicating that this repeat is not specific to maize centromeres (Fig 4, C and D).

Several regions within BAC 15C5 did not show any homology with known repeats or transposons within GenBank. Shotgun clones derived from these regions were used as FISH probes, and they all generated dispersed signals that are enriched in the pericentromeric regions (Fig 4, E–H), suggesting that much of the novel sequence is composed of degenerated retrotransposons.

Phylogenic analysis of the centromere-specific retrotransposons:
Ty3/gypsy-type retrotransposons similar to those in the CRM family have been found in the centromeric regions of all grass chromosomes (MILLER et al. 1998A; PRESTING et al. 1998; LANGDON et al. 2000). These centromeric retrotransposons in grass species (referred to as CR elements) can be divided into "autonomous" and "nonautonomous" subfamilies (LANGDON et al. 2000). The autonomous CR elements are full-size elements. The nonautonomous CR elements have an internal deletion leading to the loss of all enzymatic functions, resulting in the retrotransposons having only LTRs, a 5' UTR, and a gag structural gene fragment, truncated before the canonical RNA-binding motif (LANGDON et al. 2000).

A number of CR elements from rice, maize, and barley were used in phylogenic analysis. These CR elements were described in previous reports or were directly deposited in GenBank (Table 2). The polyprotein regions from autonomous CR elements and two typical Ty3/gypsy retrotransposons of rice (RIRE3) and maize (Huck2) were analyzed by the neighbor-joining method (Fig 5A). Consistent with previous data (LANGDON et al. 2000), we found that the CR elements formed a cluster distinct from other rice and maize Ty3/gypsy elements (Fig 5A). This CR cluster can be divided into five species-specific subclusters. The maize sequences fall into two of these subclusters. CRM1a, -1b, and -1c fall into one subcluster, while the second maize subcluster, including CRM2a, -2b, and -2c, is more closely related to one of the two rice subclusters (Fig 5A). Our FISH data showed that the elements in both subclusters are centromere specific (Fig 4, I–L).



Figure 5. Phylogenic analysis of the CR elements from barley, rice, and maize. Bootstrap values in 1000 tests are indicated on the branches. (A) Phylogenic tree constructed from the gag-pol polyprotein genes. For CRM2b, the polyprotein region in ASM 37378 (Fig 3) was used in the phylogenic analysis. (B) Phylogenic tree constructed from the LTRs. For CRM2b, the 3' LTR in ASM 37376 (Fig 3) was used in the phylogenic analysis.


 
Table 2. CR elements used in phylogenic analysis

Similar phylogenic results were obtained from the 5' UTR (data not shown) and LTR regions (Fig 5B). Three nonautonomous CR elements were included in the LTR-based phylogenic tree: the CentA element (ANANIEV et al. 1998), a CRR element in RCB11 (RCB11-1; NONOMURA and KURATA 1999; LANGDON et al. 2000), and the CRR4.4kb element in rice BAC 17p22 (CHENG et al. 2002). These nonautonomous elements formed clusters separate from the full-size elements in both rice and maize (Fig 5B).

Four conserved domains were observed in the LTRs of the CR elements from different species (Fig 6). These highly conserved DNA motifs were found in both autonomous and nonautonomous CR elements despite the fact that these elements fall in different clusters in the phylogenic tree (Fig 5B), suggesting that these motifs may be important for the targeting of the CR elements in centromeric regions.



Figure 6. Conserved motifs in the LTR and PBS of the CR elements in barley, rice, and maize. The 5' LTR and PBS of the retrotransposons were aligned. In the CRM2b and the cereba element, the 3' LTR was used instead of its 5' LTR, because the 5' end of the 5' LTR was truncated. Nucleotide positions of the conserved regions in CRM1a are indicated above the sequences. Stars at the bottom of the sequence indicate conserved base in the sequences. A complement sequence of methionine tRNA is indicated at the bottom of the PBS region.

Phylogenic studies revealed that the full-size CR elements in rice and maize can be grouped into two distinct subfamilies (Fig 5, A and B). We analyzed the sequence similarity between the two subfamilies in maize and rice using the MegAlign program in DNASTAR and found that the LTRs and 5' UTRs are significantly more diverged than the pol and gag regions (data not shown). To reveal potential differences in the distribution of these two subfamilies, we double labeled DNA probes amplified from the LTR/5' UTR regions. Signals from both subfamilies were mainly located in the centromeric regions of maize metaphase chromosomes. However, the size and intensity of the signals were significantly different in some maize centromeres (Fig 7), suggesting that the elements from the two subfamilies are not uniformly dispersed in these centromeres.



Figure 7. Chromosomal localization of the PCR products amplified from the CRM1a and CRM2a subfamilies. (A) Signals derived from the PCR products amplified from CRM2a. (B) Signals derived from the PCR products amplified from CRM1a. (C) The FISH signals were merged with the metaphase chromosomes. Chromosomes are stained by DAPI. Note that the PCR products amplified from CRM1a generated minor signals in the knob regions that are stained more intensively by DAPI than were the rest of the chromosomes. Some centromeres (arrows in A and arrowheads in B) show significant differences in the size and intensity of the FISH signals from the two subfamilies. Bar, 5 µm.

Estimation of the age of the retrotransposons in the centromeric BACs:
The two LTRs of a retrotransposon are identical at the time of its insertion into the host genome. If the mutation rate is constant after the transposition, the time elapsed since transposition can be estimated from the number of substitutions per nucleotide site between the two LTRs (SANMIGUEL et al. 1998). An average substitution rate at the adh locus among grasses was estimated at 6.5 × 10^-9 substitutions per synonymous site per year (GAUT et al. 1996). This rate was used to estimate the insertion time of the retrotransposons in this study (Table 3).
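The conversion from LTR divergence to insertion time can be written out explicitly. The sketch below assumes the commonly used relation T = K / (2r), in which K is the distance between the two LTRs and r is the substitution rate per site per year; the factor of 2 reflects that both LTRs accumulate substitutions independently. The text does not spell out this formula, so it is an assumption here, and the function name `insertion_age_years` is hypothetical.

```python
# Rate quoted in the text: substitutions per synonymous site per year.
SUBSTITUTION_RATE = 6.5e-9


def insertion_age_years(ltr_distance: float,
                        rate: float = SUBSTITUTION_RATE) -> float:
    """Estimate time since insertion from the divergence of the two LTRs.

    Assumes T = K / (2 * rate): both LTRs diverge from the identical
    sequence present at insertion, each at `rate` per site per year.
    """
    if ltr_distance < 0:
        raise ValueError("distance must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ltr_distance / (2 * rate)


if __name__ == "__main__":
    # Illustrative distance, back-calculated to give 1.22 million years;
    # it is not a value taken from Table 3.
    k = 0.01586
    print(f"K = {k} -> {insertion_age_years(k) / 1e6:.2f} million years")
```

Under this assumption, a divergence of about 1.6% between the two LTRs corresponds to roughly 1.2 million years, the upper bound reported for the CRM elements in BAC 16H10.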


 
Table 3. Estimated age of CR elements and retrotransposons in BACs 16H10 and 15C5

The insertion timing or the ages of the retrotransposons in BAC clone 16H10 are summarized in Fig 8 and Table 3. Sequence analysis suggests that the insert of BAC 16H10 was an intact CentC DNA fragment. This CentC fragment was separated into three CentC tracts due to retrotransposon invasions. All retroelements within 16H10 are younger than 1.3 million years. Four CRM elements inserted directly into the CentC fragment (Fig 8), but the locations within the CentC 156-bp repeat unit of the four insertions are different, indicating that targeting sites of the CRM elements are not sequence specific.



Figure 8. Timing of transposition of the retrotransposons in BAC 16H10. Each retroelement is marked by a different color. The age of the retrotransposons is estimated on the basis of sequence divergence of the two LTRs. The DNA in the red-shadowed box is not cloned in BAC 16H10. The ages of the retroelements within blue boxes are not known.

The insertion timing of the majority of the retrotransposons in BAC 15C5 was difficult to determine due to the significant sequence degeneracy. Only three retrotransposons retained a pair of complete LTRs. One of these three elements, Cinful2a, is highly rearranged and its structure is difficult to define. The ages of the other two retrotransposons, Cinful2c and Zeon1a, were estimated to be 2.63 and 42.22 million years, respectively (Table 3).

Organization and divergence of the CentC repeat:
Several large shotgun clones covering the CentC tract regions were sequenced using transposon-based sequencing methods to confirm the sequence and the order of the highly similar CentC monomers. The CentC repeats in the two BAC clones were aligned and grouped by the neighbor-joining method. The CentC repeat sequences can be divided into 18 groups (groups A–R; Fig 9). All the CentC repeats from 15C5 are different from those of 16H10, suggesting that the CentC sequences in these two BACs have significantly diverged. Some of the CentC groups periodically appeared in multiple CentC tracts (Fig 9). For example, a JCFFI motif is observed in both A1 and A2 tracts (Fig 9). The physical gap between tract A1 and A2 may contain CentC repeats with identical sequence and organization patterns to those within tracts A1 and A2. Such CentC sequences may be assembled into the "duplicated regions" in tracts A1 and A2. Similarly, HE, QMRPO, or KRLRR motifs are observed periodically in tracts B and C, D, and E and F, respectively (Fig 9). These results indicate that the CentC sequences have been amplified and maintained by higher-order structures of specific CentC monomers.
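Once each CentC monomer has been assigned a group letter, recurring higher-order motifs can be located by searching the resulting letter string for substrings that occur more than once. The sketch below illustrates the idea on a toy string; the string is hypothetical and does not reproduce the actual monomer order shown in Fig 9, and the function names are ours.

```python
from collections import defaultdict
from typing import Optional


def repeated_motifs(tract: str, min_len: int = 2) -> dict:
    """Map each substring of at least `min_len` letters that occurs two or
    more times in `tract` to the list of its start positions (0-based).
    Overlapping occurrences are counted."""
    found = defaultdict(list)
    n = len(tract)
    for k in range(min_len, n):
        for i in range(n - k + 1):
            found[tract[i:i + k]].append(i)
    return {motif: pos for motif, pos in found.items() if len(pos) > 1}


def longest_repeated_motif(tract: str, min_len: int = 2) -> Optional[str]:
    """Return one longest repeated substring, or None if there is none."""
    motifs = repeated_motifs(tract, min_len)
    return max(motifs, key=len) if motifs else None


if __name__ == "__main__":
    # Hypothetical monomer-group string: one letter per CentC monomer.
    toy_tract = "JCFFIAGJCFFIB"
    print(longest_repeated_motif(toy_tract))
    print(repeated_motifs(toy_tract)["JCFFI"])
```

This brute-force search is quadratic in the tract length, which is adequate for tracts of tens to a few hundred monomers; a suffix-array approach would be preferable for much longer arrays.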



Figure 9. Higher-order structure of the CentC repeats in BACs 16H10 and 15C5. Each subgroup of the CentC monomer is indicated by a different letter and then the subgroups are aligned sequentially. The arrows above the sequence indicate the higher-order repeat.

The 3' end of CentC tract A2 and the 5' end of CentC tract B are located in the same position in a CentC monomer, suggesting that these two CentC tracts were separated by the insertion of the CRM2a that transposed ~1.22 million years ago. Interestingly, CentC tracts A2 and B showed completely different patterns (Fig 9), suggesting that retrotransposon invasion may significantly impact the divergence of the centromeric satellite repeats.


*  DISCUSSION

DNA sequences located within centromeric regions have been isolated in numerous plant species. However, large-scale sequencing and organization studies of centromeric DNA have been documented in only a few plant species. In rice, the central domains of the centromeres are occupied by a 155-bp satellite repeat, CentO (CHENG et al. 2002). Surprising sequence similarity between CentO and the CentC satellite in maize was discovered (CHENG et al. 2002). The CentO satellite arrays are interrupted irregularly by the CRR elements (CHENG et al. 2002) and other retrotransposons (NONOMURA and KURATA 2001). In general, the organization of centromeric DNA in rice, as well as in several other species, including Beta species (GINDULLIS et al. 2001), barley (HUDAKOVA et al. 2001), and Zingeria biebersteiniana (SAUNDERS and HOUBEN 2001), is similar to that of A. thaliana and consists mainly of satellite repeats and retrotransposons.

Previous work by ANANIEV et al. (1998) suggested that maize centromeres also contain a centromere-specific satellite repeat (CentC) and the centromere-specific retrotransposon, which we have named CRM. Molecular and cytological data suggest that some maize centromeres contain very limited amounts of CentC and CRM-related sequences (ANANIEV et al. 1998). These results imply that these centromeres may contain additional centromere-specific DNA sequence families. In this study, we sequenced two maize BACs containing the CentC satellite repeat. Sequence analysis revealed that these two BACs exclusively contain satellite repeats and retrotransposons. BAC 16H10 contains retrotransposons both specific and nonspecific to centromeres, while BAC 15C5 contains only retrotransposons that are not specific to centromeres. The results indicate that these two centromeric DNA fragments were derived from the insertion of retroelements into intact CentC arrays (Fig 3). These findings provide additional evidence that satellite repeats and retrotransposons are the main DNA components of plant centromeres.

LANGDON et al. (2000) demonstrated that all CR elements reported in grass species were derived from a single ancient family. The CR family has a conventional organization and its protein components are highly conserved even in Arabidopsis homologs (LANGDON et al. 2000). Our sequencing results, coupled with the sequenced CRR elements recently deposited in GenBank, provide new data for evolutionary studies of this special retrotransposon family. Phylogenic analysis demonstrated that the nonautonomous CR elements in both maize and rice are significantly diverged from the full-size CR elements (Fig 5B). The full-size CR elements in maize can be divided into two groups on the basis of sequence similarity analysis (Fig 5, A and B). The most diverged sequences between the two groups are located within the LTRs and 5' UTR. Cytological analyses suggest that the full-size elements from the two groups are not uniformly intermingled, at least in some maize centromeres (Fig 7).

The most striking characteristic of this retrotransposon family is its centromere specificity. All the subfamilies in different species have maintained their exclusive centromere locations. The mechanism of this centromere-specific insertion is unknown. In rice, many of the CRR elements inserted either in the CentO satellite repeat or in other CRR elements (CHENG et al. 2002), suggesting that the satellite repeat or the CRR element itself may create conditions, such as chromatin conformation (LANGDON et al. 2000), for direct targeting. We found strikingly conserved motifs within the LTRs of the CR elements. Although the grass species diverged >55 million years ago (KELLOGG 2001), these motifs were found in all the subfamilies (Fig 6). These results suggest that the LTRs may be critical for the centromere-specific transposition.

LANGDON et al. (2000) cloned and sequenced PCR products of the CR elements from a number of grass species. The sequence information was expected to provide a basis for estimating the age of individual insertion events, although this would be a substantial underestimate as retrotransposition itself is an error-prone process. A total of 45 reverse transcriptase-encoding clones were obtained from five species and 31 integrase-encoding clones were obtained from eight species. All clones conformed closely to the relevant species consensus, and total variation was in the range of a few percent. The elements within most species showed <1 million years of divergence. LANGDON et al. (2000) suggested that the CR family is likely to still be active in most if not all species, while the failure to detect "old" elements implies that either the family is rapidly increasing in abundance at an equivalent rate in each of the divergent species sampled or ancestral sequences are relatively rapidly removed in their entirety before significant levels of degradation occur.

We estimated the age of centromere-specific retrotransposons by comparing the sequences of the two LTRs in individual retrotransposons, an approach more accurate than the method employed by LANGDON et al. (2000). All CRM elements discovered in BAC 16H10 transposed within the last 1.22 million years (Table 3 and Fig 8). We also analyzed six CRR elements recently deposited in GenBank. The oldest CRR element transposed 3.82 million years ago, and the other five transposed within the last 1 million years. The cereba elements in barley transposed between 0.31 and 1.19 million years ago. These data suggest that a majority of the CR elements discovered in all three species transposed recently, consistent with previous conclusions (LANGDON et al. 2000). In contrast, the non-centromere-specific retrotransposons discovered in BAC 15C5 are significantly rearranged, suggesting that these elements transposed much earlier. A Zeon1a element present in 15C5 was estimated to be 42 million years old (Table 3). The young age of the CR elements in different grass species suggests that certain parts of the centromeres, possibly the functional domains, are highly dynamic and evolve rapidly at the DNA sequence level. The recent discovery of the interaction between CRM sequences and centromeric histone H3 in maize (ZHONG et al. 2002) provided the first evidence that the CR elements participate in centromere function and may be a driving force in grass centromere evolution.
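
The LTR-comparison dating described above rests on the fact that the two LTRs of an element are identical at the moment of insertion and then accumulate substitutions independently. The Python sketch below is only an illustration of that logic, not the authors' pipeline. It computes a Kimura two-parameter distance (KIMURA 1980) between two pre-aligned LTRs and converts it to an age as T = K / (2r). The function names and the default rate of 6.5e-9 substitutions per site per year are assumptions made for this example; the rate actually used in this study is stated in its methods, not in this section.

```python
import math


def k2p_distance(ltr_5prime: str, ltr_3prime: str) -> float:
    """Kimura two-parameter distance between two aligned LTR sequences.

    Columns containing a gap or an ambiguous base are skipped.
    """
    if len(ltr_5prime) != len(ltr_3prime):
        raise ValueError("sequences must be aligned to the same length")

    purines = {"A", "G"}
    pyrimidines = {"C", "T"}
    valid = purines | pyrimidines
    transitions = transversions = compared = 0

    for a, b in zip(ltr_5prime.upper(), ltr_3prime.upper()):
        if a not in valid or b not in valid:
            continue  # gap or ambiguous base: ignore this column
        compared += 1
        if a == b:
            continue
        same_class = ({a, b} <= purines) or ({a, b} <= pyrimidines)
        if same_class:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites in the alignment")

    p = transitions / compared    # proportion of transition differences
    q = transversions / compared  # proportion of transversion differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_age_years(distance: float,
                        rate_per_site_per_year: float = 6.5e-9) -> float:
    """Convert LTR divergence K to an insertion age, T = K / (2r).

    The default rate is an assumed value chosen for illustration only.
    """
    return distance / (2 * rate_per_site_per_year)
```

Under the assumed rate, a pair of LTRs that differ by a single transition over 100 aligned sites gives K of about 0.0101 and an estimated age of roughly 0.78 million years. The factor of 2 in the denominator reflects that both LTRs diverge from the common starting sequence.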


*  ACKNOWLEDGMENTS

We thank Evelyn Hiatt for technical assistance. This research was supported by National Science Foundation grant 9975827 to R.K.D. and J.J.

Manuscript received August 5, 2002; Accepted for publication November 22, 2002.


*  LITERATURE CITED

ALFENITO, M. R. and J. A. BIRCHLER, 1993  Molecular characterization of a maize B chromosome centric sequence. Genetics 135:589-597.

ANANIEV, E. V., R. L. PHILLIPS, and H. W. RINES, 1998  Chromosome-specific molecular organization of maize (Zea mays L.) centromeric regions. Proc. Natl. Acad. Sci. USA 95:13073-13078.

ARABIDOPSIS GENOME INITIATIVE, 2000  Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796-815.

ARAGON-ALCAIDE, L., T. MILLER, T. SCHWARZACHER, S. READER, and G. MOORE, 1996  A cereal centromeric sequence. Chromosoma 105:261-268.

CHENG, Z., F. DONG, T. LANGDON, S. OUYANG, and C. R. BUELL et al., 2002  Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell 14:1691-1704.

CLARKE, L., 1998  Centromeres: proteins, protein complexes, and repeated domains at centromeres of simple eukaryotes. Curr. Opin. Genet. Dev. 8:212-218.

COPENHAVER, G. P., K. NICKEL, T. KUROMORI, M. I. BENITO, and S. KAUL et al., 1999  Genetic definition and sequence analysis of Arabidopsis centromeres. Science 286:2468-2474.

CSINK, A. K. and S. HENIKOFF, 1998  Something from nothing: the evolution and utility of satellite repeats. Trends Genet. 14:200-204.

DONG, F., J. T. MILLER, S. A. JACKSON, G. L. WANG, and P. C. RONALD et al., 1998  Rice (Oryza sativa) centromeric regions consist of complex DNA. Proc. Natl. Acad. Sci. USA 95:8135-8140.

FRANCKI, M. G., 2001  Identification of Bilby, a diverged centromeric Ty1-copia retrotransposon family from cereal rye (Secale cereale L.). Genome 44:266-274.

FRANSZ, P. F., A. ARMSTRONG, J. H. DE JONG, L. D. PARNELL, and C. VAN DRUNEN et al., 2000  Integrated cytogenetic map of chromosome arm 4S of A. thaliana: structural organization of heterochromatic knob and centromere region. Cell 100:367-376.

GAUT, B. S., B. R. MORTON, B. C. MCCAIG, and M. T. CLEGG, 1996  Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc. Natl. Acad. Sci. USA 93:10274-10279.

GINDULLIS, F., C. DESEL, I. GALASSO, and T. SCHMIDT, 2001  The large-scale organization of the centromeric region in Beta species. Genome Res. 11:253-265.

HARRINGTON, J. J., G. V. BOKKELEN, R. W. MAYS, K. GUSTASHAW, and H. F. WILLARD, 1997  Formation of de novo centromeres and construction of first-generation human artificial microchromosomes. Nat. Genet. 15:345-355.

HARRISON, G. E. and J. S. HESLOP-HARRISON, 1995  Centromeric repetitive DNA in the genus Brassica. Theor. Appl. Genet. 90:157-165.

HENNING, K. A., E. A. NOVOTNY, S. T. COMPTON, X.-Y. GUAN, and P. P. LIU et al., 1999  Human artificial chromosomes generated by modification of a yeast artificial chromosome containing both human alpha satellite and single-copy DNA sequences. Proc. Natl. Acad. Sci. USA 96:592-597.

HUDAKOVA, S., W. MICHALEK, G. G. PRESTING, R. TEN HOOPEN, and K. DOS SANTOS et al., 2001  Sequence organization of barley centromeres. Nucleic Acids Res. 29:5029-5035.

IKENO, M., B. GRIMES, T. OKAZAKI, M. NAKANO, and K. SAITOH et al., 1998  Construction of YAC-based mammalian artificial chromosomes. Nat. Biotech. 16:431-439.

JACKSON, S. A., M. L. WANG, H. M. GOODMAN, and J. JIANG, 1998  Application of fiber-FISH in genome analysis of Arabidopsis thaliana. Genome 41:566-572.

JACKSON, S. A., F. DONG, and J. JIANG, 1999  Digital mapping of bacterial artificial chromosomes by fluorescence in situ hybridization. Plant J. 17:581-587.

JIANG, J., B. S. GILL, G. L. WANG, P. C. RONALD, and D. C. WARD, 1995  Metaphase and interphase fluorescence in situ hybridization mapping of the rice genome with bacterial artificial chromosomes. Proc. Natl. Acad. Sci. USA 92:4487-4491.

JIANG, J., S. NASUDA, F. DONG, C. W. SCHERRER, and S. WOO et al., 1996  A conserved repetitive DNA element located in the centromeres of cereal chromosomes. Proc. Natl. Acad. Sci. USA 93:14210-14213.

KASZAS, E. and J. A. BIRCHLER, 1996  Misdivision analysis of centromere structure in maize. EMBO J. 15:5246-5255.

KASZAS, E. and J. A. BIRCHLER, 1998  Meiotic transmission rates correlate with physical features of rearranged centromeres in maize. Genetics 150:1683-1692.

KELLOGG, E. A., 2001  Evolutionary history of the grasses. Plant Physiol. 125:1198-1205.

KIMURA, M., 1980  A simple method for estimating evolutionary rates of base substitution through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.

KISHII, M., K. NAGAKI, and H. TSUJIMOTO, 2001  A tandem repetitive sequence located in the centromeric region of common wheat (Triticum aestivum) chromosomes. Chromosome Res. 9:417-428.

KUMEKAWA, N., T. HOSOUCHI, H. TSURUOKA, and H. KOTANI, 2000  The size and sequence organization of the centromeric region of Arabidopsis thaliana chromosome 5. DNA Res. 7:315-321.

KUMEKAWA, N., T. HOSOUCHI, H. TSURUOKA, and H. KOTANI, 2001  The size and sequence organization of the centromeric region of Arabidopsis thaliana chromosome 4. DNA Res. 8:285-290.

LANGDON, T., C. SEAGO, M. MENDE, M. LEGGETT, and H. THOMAS et al., 2000  Retrotransposon evolution in diverse plant genomes. Genetics 156:313-325.

MALUSZYNSKA, J. and J. S. HESLOP-HARRISON, 1991  Localization of tandemly repeated DNA sequences in Arabidopsis thaliana. Plant J. 1:159-166.

MARTINEZ-ZAPATER, J. M., M. A. ESTELLE, and C. R. SOMERVILLE, 1986  A high repeated DNA sequence in Arabidopsis thaliana. Mol. Gen. Genet. 204:417-423.

MILLER, J. T., F. DONG, S. A. JACKSON, J. SONG, and J. JIANG, 1998a  Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics 150:1615-1623.

MILLER, J. T., S. A. JACKSON, S. NASUDA, B. S. GILL, and R. A. WING et al., 1998b  Cloning and characterization of a centromere-specific repetitive DNA element from Sorghum bicolor. Theor. Appl. Genet. 96:832-839.

NAGAKI, K., H. TSUJIMOTO, and T. SASAKUMA, 1998  A novel repetitive sequence of sugar cane, SCEN family, locating on centromeric regions. Chromosome Res. 6:295-302.

NIZETIC, D., R. DRMANAC, and H. LEHRACH, 1990  An improved bacterial colony lysis procedure enables direct DNA hybridization using short (10, 11 bases) oligonucleotides to cosmids. Nucleic Acids Res. 19:182.

NONOMURA, K. I. and N. KURATA, 1999  Organization of the 1.9-kb repeat unit RCE1 in the centromeric region of rice chromosomes. Mol. Gen. Genet. 261:1-10.

NONOMURA, K. I. and N. KURATA, 2001  The centromere composition of multiple repetitive sequences on rice chromosome 5. Chromosoma 110:284-291.

PAGE, B. T., M. K. WANOUS, and J. A. BIRCHLER, 2001  Characterization of a maize chromosome 4 centromeric sequence: evidence for an evolutionary relationship with the B chromosome centromere. Genetics 159:291-302.

PRESTING, G. G., L. MALYSHEVA, J. FUCHS, and I. SCHUBERT, 1998  A Ty3/gypsy retrotransposon-like sequence localizes to the centromeric regions of cereal chromosomes. Plant J. 16:721-728.

ROUND, E. K., S. K. FLOWERS, and E. J. RICHARDS, 1997  Arabidopsis thaliana centromere regions: genetic map positions and repetitive DNA structure. Genome Res. 7:1045-1053.

SAITOU, N. and M. NEI, 1987  The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

SANMIGUEL, P., B. S. GAUT, A. TIKHONOV, Y. NAKAJIMA, and J. L. BENNETZEN, 1998  The paleontology of intergene retrotransposons of maize. Nat. Genet. 20:43-45.

SAUNDERS, V. A. and A. HOUBEN, 2001  The pericentromeric heterochromatin of the grass Zingeria biebersteiniana (2n=4) is composed of Zbcen1-type tandem repeats that are intermingled with accumulated dispersedly organized sequences. Genome 44:955-961.

SHIZUYA, H., B. BIRREN, U. J. KIM, V. MANCINO, and T. SLEPAK et al., 1992  Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl. Acad. Sci. USA 89:8794-8797.

SONG, J., F. DONG, and J. JIANG, 2000  Construction of a bacterial artificial chromosome (BAC) library for potato molecular cytogenetics research. Genome 43:199-204.

SUTTON, G. G., O. WHITE, M. D. ADAMS, and A. R. KERLAVAGE, 1995  TIGR assembler: a new tool for assembling large shotgun sequencing projects. Genome Res. 1:9-19.

TALBERT, P. B., R. MASUELLI, A. P. TYAGI, L. COMAI, and S. HENIKOFF, 2002  Centromeric localization and adaptive evolution of an Arabidopsis histone H3 variant. Plant Cell 14:1053-1066.

THOMPSON, J. D., T. J. GIBSON, F. PLEWNIAK, F. JEANMOUGIN, and D. G. HIGGINS, 1997  The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.

YANG, Z., 1997  PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.

YUAN, Q., J. HILL, K. MOFFAT, J. HSIAO, and Z. CHENG et al., 2002  Genome sequencing of a 239-kb region of rice chromosome 10L reveals a high frequency of gene duplication and a large chloroplast DNA insertion. Mol. Genet. Genomics 267:713-720.

ZHONG, C. X., J. B. MARSHALL, C. TOPP, R. MROCZEK, and A. KATO et al., 2002  Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14:2825-2836.



