Genetics, Vol. 176, 749-761, June 2007, Copyright © 2007
doi:10.1534/genetics.107.071902
Department of Horticulture, University of Wisconsin, Madison, Wisconsin 53706
1 Corresponding author: Department of Horticulture, University of Wisconsin, 1575 Linden Dr., Madison, WI 53706.
E-mail: jjiang1{at}wisc.edu
| ABSTRACT |
|---|
Phylogenetic analysis showed that the CRR family has diverged into two subfamilies of autonomous elements (CRR1 and CRR2) and two subfamilies of nonautonomous elements (noaCRR1 and noaCRR2) (NAGAKI et al. 2005). Close CRR relatives identified in several distantly related grass species (MILLER et al. 1998), including maize (centromeric retrotransposon of maize, CRM) (ZHONG et al. 2002; NAGAKI et al. 2003), barley (Cereba elements) (PRESTING et al. 1998; HUDAKOVA et al. 2001), and sugarcane (NAGAKI and MURATA 2005), are almost exclusively located in centromeres. These results suggest that this family of retrotransposons colonized in the centromeres of the common ancestor of these grass species and has maintained its centromeric specificity for >50 MY (KELLOGG 2001). The comparison of centromeric retrotransposons from rice, maize, and barley revealed several highly conserved motifs even within their long terminal repeats (LTRs), suggesting that the sequences might have evolved under selective pressure at the DNA level (NAGAKI et al. 2003).
Repetitive DNA elements located in the centromeric heterochromatin are often transcriptionally silent. However, a low level of transcription paradoxically is required for establishing the transcriptionally silent state of heterochromatin through RNA interference (RNAi) (reviewed in BERNSTEIN and ALLIS 2005; MATZKE and BIRCHLER 2005). In the RNAi pathway, double-stranded RNA (dsRNA) derived from repetitive DNA sequences is processed by the RNAi machinery into short interfering RNA (siRNA). The siRNAs then participate in RNAi by targeting other RNA molecules for degradation and also by driving heterochromatin formation at complementary genomic sites (GREWAL and RICE 2004; VERDEL et al. 2004; CHAN et al. 2005; GENDREL and COLOT 2005). Thus, this pathway is important not only for silencing repetitive DNA elements, which are frequent targets of RNAi, but also appears to be crucial for heterochromatin formation and centromere function, as documented in several species, including Schizosaccharomyces pombe (VOLPE et al. 2003; CAM et al. 2005; PIDOUX and ALLSHIRE 2005), humans (FUKAGAWA et al. 2004), mice (KANELLOPOULOU et al. 2005), Drosophila (PAL-BHADRA et al. 2004; DESHPANDE et al. 2005), and Trypanosoma brucei (DURAND-DUBIEF and BASTIN 2003).
The high abundance and centromere-specific localization of the CRR elements in rice provide an ideal system for studying the transcription of centromeric repeats and its association with heterochromatin formation and centromere function. Here we report a comprehensive transcription study of the CRR elements using a combination of bioinformatics and experimental approaches. We demonstrate that CRR transcripts are present in all tested rice organs, suggesting that the transcription of CRR sequences is constitutive. Transcription initiation as well as termination occurs both inside and outside of CRR elements. Some of the CRR transcripts are processed into small RNA (smRNA). The potential impact of CRR transcription on centromeric heterochromatin formation and centromere function is discussed.
| MATERIALS AND METHODS |
|---|
Cloning, sequencing, and sequence analyses:
Both RT-PCR and RACE products were cloned into the pCRII TOPO vector using the TOPO TA cloning kit (Invitrogen). Selected clones were sequenced by the dideoxy-mediated chain-termination method using the BigDye terminator v3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA). Sequences of RT-PCR and RACE products were assembled using Staden Package software (STADEN 1996) and all of them were submitted to the GenBank database (supplemental Table 2 at http://www.genetics.org/supplemental/).
GenBank EST (BOGUSKI et al. 1993) and KOME full-length cDNA (fl-cDNA) (KIKUCHI et al. 2003) were searched for CRR-similar sequences using the Blast program (ALTSCHUL et al. 1997). The sequences were analyzed using programs implemented at the Biology WorkBench website (http://workbench.sdsc.edu/), EMBOSS (RICE et al. 2000), and Staden Package software (STADEN 1996). Preliminary sequence comparisons were done using the Dotter and J-dotter programs (SONNHAMMER and DURBIN 1995; BRODIE et al. 2004). Multiple alignments were made using CLUSTALW (THOMPSON et al. 1994). RPS-Blast (MARCHLER-BAUER et al. 2003) and InterProScan (ZDOBNOV and APWEILER 2001) were used to search conserved domain and InterPro protein databases, respectively. Splice site analysis was performed using the FSPLICE program implemented at the SoftBerry server (http://www.softberry.com/berry.phtml). Databases of CRR elements, RT-PCR products, fl-cDNAs, ESTs, and rice genome pseudomolecules were screened and compared with one another using Standalone Blast. The database of CRR elements contained all sequences published by NAGAKI et al. (2005). Sequences of the rice genome pseudomolecules (build 4.0 release) were downloaded from the International Rice Genome Sequencing Project website (http://rgp.dna.affrc.go.jp/IRGSP/Build4/build4.html). Sequences of CRR elements that we considered typical for each CRR subfamily and used for database searches and basic sequence comparisons were downloaded from the GenBank database under the following accession nos.: AY827957 (CRR1_CH1-2), AY828150 (CRR1_CH10-1), AY828151 (CRR1_CH10-2), AY827959 (CRR2_CH1-1), AY828019 (CRR2_CH4-2), AY828024 (CRR2_CH4-7), AY828152 (CRR2_CH10-1), AY828039 (noaCRR1_CH4-5), and AY828004 (noaCRR2_CH1-3). Two databases of smRNA sequences were searched to identify CRR-derived smRNA sequences (SUNKAR et al. 2005b; JOHNSON et al. 2007). Sequences of all CRR elements described by NAGAKI et al. (2005) and sequences of RT-PCR and RACE products were used as queries in the smRNA database search.
Northern blot hybridization:
Total RNA was isolated from leaves and treated with DNaseI (Ambion, Austin, TX). Approximately 40 µg RNA was loaded in each lane, resolved on a denaturing 1% agarose gel, and then transferred to a Nytran SPC nylon membrane (Schleicher & Schuell BioScience, Keene, NH). Probes were radioactively labeled with 3000 Ci/mmol [α-32P]dATP (Amersham Biosciences, Piscataway, NJ) using the Strip-EZ DNA kit (Ambion). PCR-amplified inserts from clones ID51 (CRR1 LTR; DV468825), ID69 (CRR2 LTR; DV468838), ID13 (CRR1 reverse transcriptase coding domain, rt; DV468787), ID18 (CRR2 rt; DV468782), ID101 (noaCRR1 LTR; DV468869), and ID45 (noaCRR1 gag; DV468819) were used as templates for the labeling. The hybridization was performed overnight in 125 mM sodium phosphate buffer (pH 7.2) containing 50% deionized formamide, 7% SDS, and 250 mM sodium chloride at 50°. After the hybridization, membranes were washed twice in 2x SSC and 0.1% SDS for 15 min, twice in 0.2x SSC and 0.1% SDS for 15 min, and, finally, once in 5x SSC and 0.5% SDS for 10 min at 65°. Signals were detected using a phosphorimager.
Detection of smRNA:
Low-molecular-weight RNA was isolated using the mirVana microRNA (miRNA) isolation kit (Ambion). Ten micrograms of RNA was resolved on a denaturing 15% polyacrylamide gel and then transferred to a Nytran SPC nylon membrane (Schleicher & Schuell BioScience). Probes were labeled with 800 Ci/mmol [α-32P]UTP using the MAXIscript kit for in vitro transcription labeling (Ambion). The templates for in vitro transcription were prepared from RACE clones ID51 (CRR1 LTR; DV468825), ID69 (CRR2 LTR; DV468838), and ID101 (noaCRR1 LTR; DV468869). Promoter sequences for T7 polymerase were added to the CRR sequences by PCR using primer pairs T7+CRR1_1F (5'-TAA TAC GAC TCA CTA TAG GGT GAT GAG GAC ATC CCT TCC-3') and AUAP_3'-RACE; T7+CRR2_1F (5'-TAA TAC GAC TCA CTA TAG GGT GAT GAG GAC ATC AAC ACC A-3') and AUAP_3'-RACE; or T7+RACER_PN2 (5'-TAA TAC GAC TCA CTA TAG GGC ACT GAC ATG GAC TGA AGG A-3') and noaCRR1_2R. To visualize marker RNA, 0.5 fmol of the marker-specific template was added to the labeling reactions. The hybridization was performed overnight in 125 mM sodium phosphate buffer (pH 7.2) containing 50% deionized formamide, 7% SDS, and 250 mM sodium chloride at 42°. After the hybridization, membranes were washed three times in 2x SSC and 0.1% SDS for 10 min, twice in 1x SSC and 0.1% SDS for 15 min, and, finally, once in 5x SSC and 0.5% SDS for 10 min at 50°. DNA oligonucleotide probes P98-A8 (5'-ACA CGG ACA TCT ACA GAA AAA-3') and OsmiR156 (5'-TGT GCT CAC TCT CTT CTG TCA-3') were labeled at their 5'-ends with [γ-32P]ATP (Amersham Biosciences) using T4 polynucleotide kinase (Invitrogen) according to the manufacturer's protocol. For these two probes, temperatures of hybridization and washing were changed to 38° and 42°, respectively. Signals were detected using a phosphorimager.
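The three T7-tailed primers listed above share a common 5' tail. As a minimal sketch (not part of the original methods), the Python snippet below splits each primer into that tail and the template-specific 3' portion. The constant `T7_PROMOTER` (the commonly used 20-nt T7 promoter sequence) and the helper name `split_t7_primer` are assumptions introduced for illustration; the primer sequences are copied from the text.

```python
# Assumed 20-nt T7 promoter tail; all three primers in the text begin with it.
T7_PROMOTER = "TAATACGACTCACTATAGGG"

# Primer sequences as printed in the text (spaces separate triplets for readability).
PRIMERS = {
    "T7+CRR1_1F": "TAA TAC GAC TCA CTA TAG GGT GAT GAG GAC ATC CCT TCC",
    "T7+CRR2_1F": "TAA TAC GAC TCA CTA TAG GGT GAT GAG GAC ATC AAC ACC A",
    "T7+RACER_PN2": "TAA TAC GAC TCA CTA TAG GGC ACT GAC ATG GAC TGA AGG A",
}


def split_t7_primer(primer: str) -> tuple[str, str]:
    """Return (T7 tail, template-specific part) for a T7-tailed primer."""
    seq = primer.replace(" ", "").upper()
    if not seq.startswith(T7_PROMOTER):
        raise ValueError("primer does not start with the T7 promoter tail")
    return T7_PROMOTER, seq[len(T7_PROMOTER):]


for name, primer in PRIMERS.items():
    tail, specific = split_t7_primer(primer)
    print(f"{name}: tail {len(tail)} nt, template-specific {specific} ({len(specific)} nt)")
```

Separating the tail from the annealing portion makes it easy to check that only the 3' part needs to match the cloned CRR or RACE adapter sequence.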
| RESULTS |
|---|
The fl-cDNA sequences were used to search the genome sequence of Oryza sativa cv. Nipponbare to find corresponding genomic loci. The identified genomic loci shared 99.7-100% sequence similarity with the fl-cDNAs. Thirteen genomic sequences were interrupted by one to nine introns, which were identified as regions missing in the fl-cDNA sequences and bordered by the canonical GT-AG dinucleotides. The intron spliced out from sequence AK102311 was extremely long (12,999 bp) and was composed largely of a cluster of an unknown tandem repeat (locus I in Figure 2). Comparison of multiple fl-cDNAs derived from the same genomic loci revealed alternative splicing and variability in the position of poly(A) tails (loci A, B, E, and O in Figure 2). The genomic loci were also analyzed with respect to the CRR sequences. Full-length elements were identified in eight loci. The remaining genomic loci contained solo LTR or truncated CRR sequences. The similarity between the two LTRs of the full-length elements varied between 90.33% and 99.91%. Considering that 5'- and 3'-LTRs are identical upon insertion and that the rate of nucleotide substitution is 6.5 × 10^-9 per site per year (GAUT et al. 1996), the age of these CRR elements was estimated to be from 0.1 to 15 MY.
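The age estimate above can be checked with a short calculation. The sketch below is illustrative only: it assumes that age is obtained by dividing the uncorrected LTR divergence by the substitution rate, which reproduces the reported range of 0.1 to 15 MY but is an assumption about how the estimate was computed. The function name `ltr_age_my` is hypothetical. The frequently used alternative, T = K/(2r), would give half these values.

```python
# Substitution rate quoted in the text (GAUT et al. 1996), per site per year.
RATE = 6.5e-9


def ltr_age_my(similarity_percent: float, rate: float = RATE) -> float:
    """Estimate insertion age in million years (MY) from LTR similarity.

    Assumption for illustration: age = uncorrected divergence / rate.
    """
    divergence = (100.0 - similarity_percent) / 100.0
    return divergence / rate / 1e6


youngest = ltr_age_my(99.91)  # most similar LTR pair reported
oldest = ltr_age_my(90.33)    # least similar LTR pair reported
print(f"youngest element: {youngest:.2f} MY, oldest element: {oldest:.1f} MY")
```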
The search in the EST database in the GenBank revealed 55 accessions (of 406,790 ESTs) with sequence similarity to CRRs. Most of them (38) were similar to the noaCRR1 subfamily. Six sequences shared similarity with CRR1, six with CRR2, and five with noaCRR2 subfamilies. About half of the ESTs had a chimerical structure containing CRR-unrelated sequences. CRR-related regions mostly corresponded to the LTR or the 5'-UTR regions (supplemental Table 4 at http://www.genetics.org/supplemental/). The internal coding region was present in only eight ESTs and it was exclusively represented by a short sequence upstream of the 3'-LTR. Only six EST sequences (11%) originated from plants grown under normal physiological conditions, whereas most of them (78%) were from stressed or transgenic plants or from callus tissue (supplemental Table 4 at http://www.genetics.org/supplemental/). No EST sequence was identical to any of the KOME fl-cDNAs.
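The counts in this paragraph can be tallied directly. The snippet below is a small sketch that uses only the numbers stated above and shows how the subfamily counts add up to 55 and what fraction of the EST database that represents. The variable names are illustrative.

```python
# Numbers taken from the text.
TOTAL_ESTS = 406_790
CRR_ESTS_BY_SUBFAMILY = {"noaCRR1": 38, "CRR1": 6, "CRR2": 6, "noaCRR2": 5}
ESTS_FROM_NORMAL_CONDITIONS = 6

crr_total = sum(CRR_ESTS_BY_SUBFAMILY.values())
crr_frequency_percent = 100.0 * crr_total / TOTAL_ESTS
normal_percent = 100.0 * ESTS_FROM_NORMAL_CONDITIONS / crr_total

print(f"CRR-related ESTs: {crr_total}")
print(f"frequency in EST database: {crr_frequency_percent:.4f}%")
print(f"from plants under normal conditions: {normal_percent:.0f}%")
```

The resulting frequency is about 0.0135% of all ESTs, and the six ESTs from plants grown under normal conditions correspond to the 11% quoted above.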
Experimental confirmation of CRR transcription:
RT-PCR was used to assess the transcriptional patterns of the CRR elements in different rice organs (root, leaf, and panicles) and to explore transcripts associated with the internal regions of CRRs, which were not found in the fl-cDNA/EST databases. RT-PCR was performed using primers specific for three CRR subfamilies: CRR1, CRR2, and noaCRR1 (supplemental Table 1 at http://www.genetics.org/supplemental/). These primers covered the entire internal region of individual CRR elements in 0.5- to 1-kb intervals. Since the similarity in the pol region is very high between CRR1 and CRR2, we used the same primer pairs to cover this region for both subfamilies.
RT-PCR products of expected sizes were obtained in all reactions and showed the same pattern in three different organs (Figure 3). However, there were differences in yield in many cases, mostly showing a higher abundance of CRR transcripts in panicles than in roots and leaves (Figure 3, A and B). As the actin control showed a similar yield in all three organs, those differences were not due to unequal RNA loading. One considerably shorter fragment was amplified in addition to the fragment of expected size using primers CRR1/2_2F and CRR1/2_3R (Figure 3A). Negative controls, which were included for each primer combination and each RNA sample, yielded no products. RT-PCR products amplified from leaf and, in a few cases, from root and panicle were cloned and sequenced. All of the obtained sequences (64 total; supplemental Table 2 at http://www.genetics.org/supplemental/) shared high similarity with CRR elements. Most sequences derived from the coding region contained one intact open reading frame (ORF), which could be directly translated into protein. The putative protein sequences contained motifs of active sites, which suggests that they could be functional (data not shown).
40 µg of total RNA were hybridized with probes specific to the LTR region of CRR1, CRR2, and noaCRR1, the RT-coding region of CRR1 and CRR2, and the GAG-coding region of noaCRR1. Although all probes were prepared from RT-PCR or RACE clones (see below), strong and unambiguous signals were detected only by the CRR2 LTR probe and the two noaCRR1 probes (Figure 4A). These three probes hybridized to an ~3.1-kb-long transcript. The CRR2 LTR probe hybridized to this transcript strongly. However, both noaCRR1 probes hybridized weakly (Figure 4A). We assume that the transcript likely originated from a CRR2 element and that the weak hybridization to the noaCRR1 probes was due to partial similarity between CRR2 and noaCRR1 sequences. Its size, which is considerably shorter than would be expected for a hypothetical full-length CRR2 transcript (7-8 kb), suggests that the transcript was prematurely terminated, spliced, or originated from a truncated element. In addition to this transcript, the three probes also hybridized as a 4- to 15-kb smear having the highest intensity between 7 and 10 kb. The Northern hybridization results revealed a low level of transcription, suggesting that most of the CRR elements are transcriptionally silent.
23-24 nt in length (Figure 4C). Faint smRNA signals were also detected with the CRR1 probe. However, the CRR2 probes did not hybridize with smRNAs. The CRR2 LTR probe detected an ~3.1-kb CRR transcript on the total RNA blots (Figure 4A), which implies that the majority, if not all, of CRR2 transcripts escaped processing by the RNAi machinery in rice leaves. To check RNA quality, the blots were reprobed with a probe specific for the previously described miRNA OsmiR156 (SUNKAR et al. 2005a). Both sizes of this miRNA (20 and 21 nt) were consistently detected, showing that the small RNA on all blots was intact (Figure 4C). In addition to the experimental detection of CRR-derived smRNA, we also searched for CRR-related smRNAs in rice databases (SUNKAR et al. 2005b; JOHNSON et al. 2007). The search within the databases of SUNKAR et al. (2005b) and JOHNSON et al. (2007) revealed 1 and 25 sequences, respectively, with high similarity to CRR elements (supplemental Table 5 at http://www.genetics.org/supplemental/). Of these smRNAs, 1 was derived from CRR1, 4 from CRR2, 17 from noaCRR1, and 4 from noaCRR2. The majority (88%) of smRNA sequences originated from the LTR and 5'-UTR regions (supplemental Table 5 at http://www.genetics.org/supplemental/). This is in agreement with the fact that these two regions were most represented among the fl-cDNA and EST sequences described above. Some smRNA sequences showed a perfect match to the fl-cDNA and EST sequences, but it was not possible to determine whether they originated from these sequences because they have multiple identical matches to different genomic loci. The only exception was smRNA15884, which matches only one locus in the entire rice genome. All identified CRR-derived smRNAs occurred only one to two times among the 35,454 sequences in the genome-mapped smRNA collection (supplemental Table 5 at http://www.genetics.org/supplemental/ and JOHNSON et al. 2007). Importantly, five smRNA sequences shared significant similarity with the noaCRR1 probe and one with the CRR1 probe, both of which hybridized to the smRNA in Northern hybridization (Figure 4). In contrast, no smRNA sequence was similar to the CRR2 probe, which did not hybridize to the smRNA.
Thus, our experimental data are in agreement with the result of the computational analysis of CRR-derived smRNA sequences.
Splicing of CRR sequences:
Two spliced regions were found in the CRR1 sequences derived from RT-PCR and 3'-RACE analyses. The first one was identified as a missing region in the shorter fragment amplified by primers CRR1/2_2F and CRR1/2_3R (Figure 3A). The missing region proved to be an intron because it was bordered by canonical 5'-GT and AG-3' dinucleotides. This was also confirmed by the absence of the spliced-size product when genomic DNA was used as a template for PCR (data not shown). The splicing of this region resulted in a complete removal of the RT-coding domain predicted by the RPS-Blast program. The position of the acceptor site was the same in all analyzed sequences. However, there were three different donor sites (DS1-3) located very close to one another and alternatively used for the splicing (Figure 6). As the donor sites were placed in three different reading frames, the impact of the splicing on the translation of the downstream coding domains (RNaseH, integrase, chromodomain) differed. The splicing at the site DS2 removed only the RT-coding domain but still allowed the translation of the downstream domains, whereas the splicing at sites DS1 and DS3 separated upstream and downstream coding regions into different reading frames and most likely suppressed translation of the downstream domains. Although all spliced sequences had higher similarity to CRR1 than to the CRR2 subfamily, comparison of this region showed that donor and acceptor sites are conserved in all CRR subfamilies and also in the related CRM (maize) and Cereba (barley) elements (Figure 6).
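Whether splicing leaves the downstream domains in frame depends on the length of the removed intron modulo 3. The sketch below illustrates this with a toy sequence and hypothetical coordinates; the actual positions of the three donor sites and the acceptor site are not given in the text, so `TOY_SEQ`, `DONORS`, and `ACCEPTOR_END` are invented for illustration. It checks the canonical GT...AG borders and reports which donor site preserves the reading frame.

```python
def is_canonical_intron(seq: str, donor: int, acceptor_end: int) -> bool:
    """True if seq[donor:acceptor_end] starts with GT and ends with AG."""
    intron = seq[donor:acceptor_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def keeps_reading_frame(donor: int, acceptor_end: int) -> bool:
    """Removing an intron keeps the downstream frame only if its length is a multiple of 3."""
    return (acceptor_end - donor) % 3 == 0


# Toy example (hypothetical): three closely spaced donor sites, one shared acceptor.
TOY_SEQ = "ATGGGTGTGTAAACCCAGGGATCC"
DONORS = {"DS1": 4, "DS2": 6, "DS3": 8}  # 0-based start of each candidate intron
ACCEPTOR_END = 18                        # 0-based index just past the terminal AG

for name, donor in DONORS.items():
    canonical = is_canonical_intron(TOY_SEQ, donor, ACCEPTOR_END)
    in_frame = keeps_reading_frame(donor, ACCEPTOR_END)
    print(f"{name}: intron {ACCEPTOR_END - donor} nt, canonical={canonical}, in frame={in_frame}")
```

In this toy layout only the middle donor site removes a multiple of three nucleotides, mirroring the situation described for DS2. A real analysis would also need to confirm that no stop codon is created at the new exon junction.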
| DISCUSSION |
|---|
CRR transcripts were detected by RT-PCR in all three organs tested (root, leaf, panicle), suggesting that the CRR-related sequences are transcribed constitutively. Although transcription can be initiated from promoters located within 5'-LTRs (supplemental Figure 1 at http://www.genetics.org/supplemental/), analysis of the fl-cDNA sequences showed that most CRR transcripts are probably initiated from promoters upstream of the CRR elements (Figure 2). The level of CRR transcription appeared to be low. The frequencies of CRR transcripts in the fl-cDNA and EST databases are 0.089% and 0.013%, respectively. Similar EST frequencies were reported for both Ty3/gypsy and Ty1/copia retrotransposons present in other Gramineae species (VICIENT et al. 2001a; ECHENIQUE et al. 2002), suggesting that the CRR family does not differ significantly in transcriptional level from other retrotransposon families. The low level of transcription of the CRR sequences, except for a few specific loci, was confirmed by Northern blot hybridization (Figure 4A). We also found that many CRR transcripts were possibly derived from relatively few genomic elements. Several fl-cDNAs were derived from the same genomic loci (Figure 2). A total of 24 RT-PCR and RACE sequences had the highest similarity to the CRR2 element associated with fl-cDNA AY828096 (locus N in Figure 2). Six RT-PCR sequences were most similar to three different regions of one noaCRR1 element (DQ458290). All six CRR2 3'-RACE sequences seemed to originate from the same genomic locus (locus E in Figure 2). These results suggest that only a few CRR elements escape silencing and contribute to the transcription.
We mapped the chromosomal locations of the genomic loci corresponding to the fl-cDNAs (Table 1). Two loci were mapped to regions of chromosomes 3 and 10 that contain the centromere-specific satellite CentO arrays. Chromosomes 3 and 10 contain <500 kb of the CentO repeats (CHENG et al. 2002; YAN et al. 2006). These two loci are likely located within the CENH3-binding domains that span ~750 kb in chromosome 8 and ~1800 kb in chromosome 3 (NAGAKI et al. 2004; YAN et al. 2006). Nine other loci were located within 5 Mbp from the CentO arrays. The remaining eight loci were more distant (up to ~21 Mbp) from the centromeres. These results show that the CRR elements located in the centromeric regions are not more frequently transcribed than those in the pericentromeric regions.
Possible function of the CRR transcripts:
We analyzed the potential coding capacity of the CRR-related fl-cDNAs. The longest ORF was considered a potential coding sequence. The length of such coding sequences varied from 99 to 2046 nt (Figure 2). Comparison of the positions between the longest ORFs and CRR-related regions showed that they were overlapping in 13 (52%) fl-cDNAs, which in some cases was reflected by similarity to CRR putative polyprotein sequences (data not shown). The remaining fl-cDNAs contained a CRR-related region exclusively upstream (2 fl-cDNAs) or downstream (10 fl-cDNAs) of the longest ORF. Proteins translated from these fl-cDNAs were used as queries for searches made by RPS-Blast (MARCHLER-BAUER et al. 2003) and InterProScan (ZDOBNOV and APWEILER 2001), but no significant similarity to any proteins with known function was found.
The role of partial and chimerical CRR transcripts can only be speculated upon. Some of them may encode proteins of unknown function, while others may represent noncoding RNAs with regulatory functions (MOREY and AVNER 2004; COSTA 2005). Several recent reports demonstrated a role of noncoding RNA in heterochromatin structure and centromere function. MAISON et al. (2002) were the first to demonstrate that the higher-order structure of the centromeric heterochromatin in mice depends on the presence of an RNA component. In maize, transcripts ranging in size from 40 to 900 nt derived from CRM elements and the centromeric satellite repeat CentC are tightly bound to CENH3, suggesting a potential role of such noncoding RNA in the specification of centromeric chromatin (TOPP et al. 2004). Most recently, BOUZINBA-SEGARD et al. (2006) showed that centromeric RNAs transcribed from the centromeric minor satellite repeat of mice are localized to centromeres. Forced accumulation of the 120-nt transcripts leads to defects in chromosome segregation and sister-chromatid cohesion (BOUZINBA-SEGARD et al. 2006). The authors proposed that the centromeric RNAs may play a role in regulation of heterochromatin assembly by tethering of kinetochore- and heterochromatin-associated proteins.
Our PCR-based experiments could not confirm the hypothesis that CRR promoters may play a significant role in transcription of the downstream sequences, such as CentO repeats, which are highly intermingled with CRR elements in rice centromeres (CHENG et al. 2002). Cotranscripts of CRR and CentO repeats were neither found in the sequence databases nor detected using RT-PCR (data not shown). In addition, we did not detect transcripts by RT-PCR using primers designed from downstream sequences of specific CRR elements in 12 loci within the centromere of chromosome 8 (data not shown). Thus, although some CRR insertions can provide active promoters, as we showed by 5'-RACE experiments, their impact on transcription of the downstream sequences may be limited. This is in agreement with the result of MAY et al. (2005), who detected cotranscripts of the Arabidopsis thaliana centromeric satellite repeat cen180 and retrotransposon Athila (106B) in the met1 mutant but not in the wild-type plant.
We propose that the long transcripts and smRNA derived from CRR elements may contribute to the formation of centromeric heterochromatin in rice. One possible explanation for the low level of CRR transcription is that most CRR transcripts may be turned over quickly by the RNAi pathway or integrated into centromeric chromatin. Several recent studies reported that a significant number of transcripts derived from centromeric repeats accumulated in mutants of components of the RNAi machinery (VOLPE et al. 2002; FUKAGAWA et al. 2004; MURCHISON et al. 2005). Our data showed that there are considerable differences in abundance and origin of the smRNAs derived from individual CRR subfamilies. While the noaCRR1 transcripts seem to be extensively processed into smRNA, CRR1- and CRR2-derived smRNAs are much less abundant (Figure 4C; supplemental Table 5 at http://www.genetics.org/supplemental/). Strikingly, all identified small RNA sequences derived from CRR1, noaCRR1, and noaCRR2 elements originated from the LTR and 5'-UTR regions. In contrast, all CRR2 smRNAs originated from the inside part, including 5'-UTR and gag-pol coding regions. As we showed that all CRR regions are transcribed (Figures 3 and 5), these findings suggest a differential extent of RNAi processing of the CRR transcripts. CRR1 and CRR2 elements are almost exclusively located in the centromeric regions. However, noaCRR1 elements are distributed more widely in the rice genome and were found in both centromeric and noncentromeric regions (NAGAKI et al. 2005). The observed differences may also be due to differences in copy numbers of individual CRR elements in the rice genome and to differences in abundance of the corresponding transcripts. The absence of CRR2 LTR-derived smRNA correlates with the high abundance of long transcripts hybridizing with the CRR2 LTR probe (Figure 4), which suggests that some CRR2 transcripts escape from the RNAi processing. 
These results suggest that the different CRR subfamilies, and perhaps also different parts of their sequences, play different roles in the RNAi-mediated pathway for formation and maintenance of centromeric heterochromatin.
Alternative splicing of CRR elements:
Although retroviruses are spliced during post-transcriptional processing (RABSON and GRAVES 1997), splicing of LTR retrotransposons is rare and has been reported in only a few retrotransposon families (BRIERLEY and FLAVELL 1990; VICIENT et al. 2001b; NEUMANN et al. 2003). Retrotransposon splicing was proposed to play roles in regulation of the GAG/GAG-PRO-POL or GAG-PRO/GAG-PRO-POL ratios (BRIERLEY and FLAVELL 1990; NEUMANN et al. 2003) and in expression of the envelope-like gene (VICIENT et al. 2001b). The splicing of CRR transcripts is unique in that at least two regions, both of which are different from the spliced regions in the three above-mentioned retrotransposons, are spliced from autonomous CRR elements. CRR elements seem to encode all protein domains within a single ORF (NAGAKI et al. 2005) and thus cannot regulate the GAG (GAG-PRO) to GAG-PRO-POL ratio by the two common translational recoding mechanisms on the basis of stop-codon readthrough or ribosomal frameshifting (SWANSTROM and WILLS 1997; VOGT 1997; GAO et al. 2003). However, as GAG is generally required in a higher molar amount than POL (SWANSTROM and WILLS 1997), CRRs must use a strategy that favors expression of GAG or GAG-PRO over GAG-PRO-POL polyproteins. Considering that the splicing of the reverse-transcriptase-coding region in CRRs results in separation of gag-pro and pol genes into different reading frames, it is likely that translation of the pol gene from spliced transcripts is suppressed while the translation of gag and pro genes remains unaffected. As only some CRR transcripts are spliced, the ratio between GAG-PRO and GAG-PRO-POL polyproteins might be controlled by the ratio between spliced and unspliced transcripts. 
Interestingly, the splice sites appear to be conserved among all autonomous CRR subfamilies (including the newly discovered CRR3 subfamily; Figures 6 and 7) as well as in the centromeric retrotransposons from maize (CRM) and barley (Cereba), supporting our hypothesis that CRR splicing may play a role in regulation of the expression of its encoding genes.
| ACKNOWLEDGEMENTS |
|---|
| LITERATURE CITED |
|---|
ALTSCHUL, S. F., T. L. MADDEN, A. A. SCHAFFER, J. H. ZHANG, Z. ZHANG et al., 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 33893402.
BANNISTER, A. J., P. ZEGERMAN, J. F. PARTRIDGE, E. A. MISKA, J. O. THOMAS et al., 2001 Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain. Nature 410: 120124.[CrossRef][Medline]
BERNSTEIN, E., and C. D. ALLIS, 2005 RNA meets chromatin. Genes Dev. 19: 16351655.
BOGUSKI, M. S., T. M. J. LOWE and C. M. TOLSTOSHEV, 1993 Dbest: database for expressed sequence tags. Nat. Genet. 4: 332333.[CrossRef][Medline]
BOUZINBA-SEGARD, H., A. GUAIS and C. FRANCASTEL, 2006 Accumulation of small murine minor satellite transcripts leads to impaired centromeric architecture and function. Proc. Natl. Acad. Sci. USA 103: 87098714.
BRIERLEY, C., and A. J. FLAVELL, 1990 The retrotransposon Copia controls the relative levels of its gene products posttranscriptionally by differential expression from its two major mRNAs. Nucleic Acids Res. 18: 29472951.
BRODIE, R., R. L. ROPER and C. UPTON, 2004 JDotter: a Java interface to multiple dotplots generated by dotter. Bioinformatics 20: 279281.
CAM, H. P., T. SUGIYAMA, E. S. CHEN, X. CHEN, P. C. FITZGERALD et al., 2005 Comprehensive analysis of heterochromatin- and RNAi-mediated epigenetic control of the fission yeast genome. Nat. Genet. 37: 809819.[CrossRef][Medline]
CHAN, S. W. L., I. R. HENDERSON and S. E. JACOBSEN, 2005 Gardening the genome: DNA methylation in Arabidopsis thaliana. Nat. Rev. Genet. 6: 351360.[CrossRef][Medline]
CHENG, Z. K., F. G. DONG, T. LANGDON, O. Y. SHU, C. R. BUELL et al., 2002 Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell 14: 16911704.
COSTA, F. F., 2005 Non-coding RNAs: new players in eukaryotic biology. Gene 357: 8394.[CrossRef][Medline]
DESHPANDE, G., G. CALHOUN and P. SCHEDL, 2005 Drosophila argonaute-2 is required early in embryogenesis for the assembly of centric/centromeric heterochromatin, nuclear division, nuclear migration, and germ-cell formation. Genes Dev. 19: 16801685.
DURAND-DUBIEF, M., and P. BASTIN, 2003 TbAGOI, an Argonaute protein required for RNA interference, is involved in mitosis and chromosome segragation in Trypanosoma brucei. BMC Biol. 1: 2.[CrossRef][Medline]
ECHENIQUE, V., B. STAMOVA, P. WOLTERS, G. LAZO, V. L. CAROLLO et al., 2002 Frequencies of Ty1-copia and Ty3-gypsy retroelements within the Triticeae EST databases. Theor. Appl. Genet. 104: 840844.[CrossRef][Medline]
ELGIN, S. C. R., 1996 Heterochromatin and gene regulation in Drosophila. Curr. Opin. Genet. Dev. 6: 193202.[CrossRef][Medline]
FISCHLE, W., Y. M. WANG, S. A. JACOBS, Y. C. KIM, C. D. ALLIS et al., 2003 Molecular basis for the discrimination of repressive methyl-lysine marks in histone H3 bv Polvcomb and HP1 chromodomains. Genes Dev. 17: 18701881.
FUKAGAWA, T., M. NOGAMI, M. YOSHIKAWA, M. IKENO, T. OKAZAKI et al., 2004 Dicer is essential for formation of the heterochromatin structure in vertebrate cells. Nat. Cell Biol. 6: 784791.[CrossRef][Medline]
GAO, L. H., E. M. MCCARTHY, E. W. GANKO and J. F. MCDONALD, 2004 Evolutionary history of Oryza sativa LTR retrotransposons: a preliminary survey of the rice genome sequences. BMC Genomics 5: 18.[CrossRef][Medline]
GAO, X., E. R. HAVECKER, P. V. BARANOV, J. F. ATKINS and D. F. VOYTAS, 2003 Translational recoding signals between gag and pol in diverse LTR retrotransposons. RNA 9: 14221430.
GAUT, B. S., B. R. MORTON, B. C. MCCAIG and M. T. CLEGG, 1996 Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc. Natl. Acad. Sci. USA 93: 1027410279.
GENDREL, A. V., and V. COLOT, 2005 Arabidopsis epigenetics: when RNA meets chromatin. Curr. Opin. Plant Biol. 8: 142-147.
GORINSEK, B., F. GUBENSEK and D. KORDIS, 2004 Evolutionary genomics of chromoviruses in eukaryotes. Mol. Biol. Evol. 21: 781-798.
GORINSEK, B., F. GUBENSEK and D. KORDIS, 2005 Phylogenomic analysis of chromoviruses. Cytogenet. Genome Res. 110: 543-552.
GRANDBASTIEN, M. A., 1998 Activation of plant retrotransposons under stress conditions. Trends Plant Sci. 3: 181-187.
GREWAL, S. I. S., and J. C. RICE, 2004 Regulation of heterochromatin by histone methylation and small RNAs. Curr. Opin. Cell Biol. 16: 230-238.
HIROCHIKA, H., 1997 Retrotransposons of rice: their regulation and use for genome analysis. Plant Mol. Biol. 35: 231-240.
HIROCHIKA, H., K. SUGIMOTO, Y. OTSUKI, H. TSUGAWA and M. KANDA, 1996 Retrotransposons of rice involved in mutations induced by tissue culture. Proc. Natl. Acad. Sci. USA 93: 7783-7788.
HUDAKOVA, S., W. MICHALEK, G. G. PRESTING, R. TEN HOOPEN, K. DOS SANTOS et al., 2001 Sequence organization of barley centromeres. Nucleic Acids Res. 29: 5029-5035.
JENUWEIN, T., and C. D. ALLIS, 2001 Translating the histone code. Science 293: 1074-1080.
JIANG, N., I. K. JORDAN and S. R. WESSLER, 2002 Dasheng and RIRE2: a nonautonomous long terminal repeat element and its putative autonomous partner in the rice genome. Plant Physiol. 130: 1697-1705.
JOHNSON, C., L. BOWMAN, A. T. ADAI, V. VANCE and V. SUNDARESAN, 2007 CSRDB: a small RNA integrated database and browser resource for cereals. Nucleic Acids Res. 35: D829-D833.
KANELLOPOULOU, C., S. A. MULJO, A. L. KUNG, S. GANESAN, R. DRAPKIN et al., 2005 Dicer-deficient mouse embryonic stem cells are defective in differentiation and centromeric silencing. Genes Dev. 19: 489-501.
KELLOGG, E. A., 2001 Evolutionary history of the grasses. Plant Physiol. 125: 1198-1205.
KIKUCHI, S., K. SATOH, T. NAGATA, N. KAWAGASHIRA, K. DOI et al., 2003 Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301: 376-379.
KORDIS, D., 2005 A genomic perspective on the chromodomain-containing retrotransposons: chromoviruses. Gene 347: 161-173.
LEE, H. R., P. NEUMANN, J. MACAS and J. JIANG, 2006 Transcription and evolutionary dynamics of the centromeric satellite repeat CentO in rice. Mol. Biol. Evol. 23: 2505-2520.
LI, Z. Y., S. Y. CHEN, X. W. ZHENG and L. H. ZHU, 2000 Identification and chromosomal localization of a transcriptionally active retrotransposon of Ty3-gypsy type in rice. Genome 43: 404-408.
MAISON, C., D. BAILLY, A. H. F. M. PETERS, J. P. QUIVY, D. ROCHE et al., 2002 Higher-order structure in pericentric heterochromatin involves a distinct pattern of histone modification and an RNA component. Nat. Genet. 30: 329-334.
MARCHLER-BAUER, A., J. B. ANDERSON, C. DEWEESE-SCOTT, N. D. FEDOROVA, L. Y. GEER et al., 2003 CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 31: 383-387.
MATZKE, M. A., and J. A. BIRCHLER, 2005 RNAi-mediated pathways in the nucleus. Nat. Rev. Genet. 6: 24-35.
MAY, B. P., Z. B. LIPPMAN, Y. D. FANG, D. L. SPECTOR and R. A. MARTIENSSEN, 2005 Differential regulation of strand-specific transcripts from Arabidopsis centromeric satellite repeats. PLoS Genet. 1: 705-714.
MEYERS, B. C., S. V. TINGLEY and M. MORGANTE, 2001 Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome. Genome Res. 11: 1660-1676.
MILLER, J. T., F. G. DONG, S. A. JACKSON, J. SONG and J. M. JIANG, 1998 Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics 150: 1615-1623.
MOREY, C., and P. AVNER, 2004 Employment opportunities for non-coding RNAs. FEBS Lett. 567: 27-34.
MURCHISON, E. P., J. F. PARTRIDGE, O. H. TAM, S. CHELOUFI and G. J. HANNON, 2005 Characterization of Dicer-deficient murine embryonic stem cells. Proc. Natl. Acad. Sci. USA 102: 12135-12140.
NAGAKI, K., and M. MURATA, 2005 Characterization of CENH3 and centromere-associated DNA sequences in sugarcane. Chromosome Res. 13: 195-203.
NAGAKI, K., J. Q. SONG, R. M. STUPAR, A. S. PAROKONNY, Q. P. YUAN et al., 2003 Molecular and cytological analyses of large tracks of centromeric DNA reveal the structure and evolutionary dynamics of maize centromeres. Genetics 163: 759-770.
NAGAKI, K., Z. K. CHENG, S. OUYANG, P. B. TALBERT, M. KIM et al., 2004 Sequencing of a rice centromere uncovers active genes. Nat. Genet. 36: 138-145.
NAGAKI, K., P. NEUMANN, D. F. ZHANG, S. OUYANG, C. R. BUELL et al., 2005 Structure, divergence, and distribution of the CRR centromeric retrotransposon family in rice. Mol. Biol. Evol. 22: 845-855.
NEUMANN, P., D. POZARKOVA and J. MACAS, 2003 Highly abundant pea LTR retrotransposon Ogre is constitutively transcribed and partially spliced. Plant Mol. Biol. 53: 399-410.
NEUMANN, P., D. POZARKOVA, A. KOBLIZKOVA and J. MACAS, 2005 PIGY, a new plant envelope-class LTR retrotransposon. Mol. Genet. Genomics 273: 43-53.
PAL-BHADRA, M., B. A. LEIBOVITCH, S. G. GANDHI, M. RAO, U. BHADRA et al., 2004 Heterochromatin silencing and HP1 localization in Drosophila are dependent on the RNAi machinery. Science 303: 669-672.
PIDOUX, A. L., and R. C. ALLSHIRE, 2005 The role of heterochromatin in centromere function. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360: 569-579.
PRESTING, G. G., L. MALYSHEVA, J. FUCHS and I. SCHUBERT, 1998 A TY3/GYPSY retrotransposon-like sequence localizes to the centromeric regions of cereal chromosomes. Plant J. 16: 721-728.
RABSON, A. B., and B. J. GRAVES, 1997 Synthesis and processing of viral RNA, pp. 205-261 in Retroviruses, edited by J. M. COFFIN, S. H. HUGHES and H. E. VARMUS. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
RICE, P., I. LONGDEN and A. BLEASBY, 2000 EMBOSS: The European molecular biology open software suite. Trends Genet. 16: 276-277.
ROSSI, M., P. G. ARAUJO and M. A. VAN SLUYS, 2001 Survey of transposable elements in sugarcane expressed sequence tags (ESTs). Genet. Mol. Biol. 24: 147-154.
SONNHAMMER, E. L. L., and R. DURBIN, 1995 A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene 167: 1-10.
STADEN, R., 1996 The Staden sequence analysis package. Mol. Biotechnol. 5: 233-241.
SUNKAR, R., T. GIRKE, P. K. JAIN and J. K. ZHU, 2005a Cloning and characterization of microRNAs from rice. Plant Cell 17: 1397-1411.
SUNKAR, R., T. GIRKE and J. K. ZHU, 2005b Identification and characterization of endogenous small interfering RNAs from rice. Nucleic Acids Res. 33: 4443-4454.
SUONIEMI, A., D. SCHMIDT and A. H. SCHULMAN, 1997 BARE-1 insertion site preferences and evolutionary conservation of RNA and cDNA processing sites. Genetica 100: 219-230.
SWANSTROM, R., and J. W. WIL