PIF- and Pong-Like Transposable Elements: Distribution, Evolution and Relationship With Tourist-Like Miniature Inverted-Repeat Transposable Elements
Xiaoyu Zhang, Ning Jiang, Cédric Feschotte, and Susan R. Wessler
Departments of Plant Biology and Genetics, University of Georgia, Athens, Georgia 30602
Corresponding author: Susan R. Wessler, University of Georgia, Athens, GA 30602. E-mail: sue@plantbio.uga.edu
Communicating editor: M. J. SIMMONS
| ABSTRACT |
|---|
Miniature inverted-repeat transposable elements (MITEs) are short, nonautonomous DNA elements that are widespread and abundant in plant genomes. Most of the hundreds of thousands of MITEs identified to date have been divided into two major groups on the basis of shared structural and sequence characteristics: Tourist-like and Stowaway-like. Since MITEs have no coding capacity, they must rely on transposases encoded by other elements. Two active transposons, the maize P Instability Factor (PIF) and the rice Pong element, have recently been implicated as sources of transposase for Tourist-like MITEs. Here we report that PIF- and Pong-like elements are widespread, diverse, and abundant in eukaryotes with hundreds of element-associated transposases found in a variety of plant, animal, and fungal genomes. The availability of virtually the entire rice genome sequence facilitated the identification of all the PIF/Pong-like elements in this organism and permitted a comprehensive analysis of their relationship with Tourist-like MITEs. Taken together, our results indicate that PIF and Pong are founding members of a large eukaryotic transposon superfamily and that members of this superfamily are responsible for the origin and amplification of Tourist-like MITEs.
TRANSPOSABLE elements (TEs), which are a major component of all characterized eukaryotic genomes, have been divided into two classes according to their transposition intermediate. Class 1 (RNA) elements transpose via an RNA intermediate and most either have long terminal repeats (LTR-retrotransposons) or terminate at one end with a poly(A) tract [long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs)]. Class 2 (DNA) elements transpose via a DNA intermediate and usually have short terminal inverted repeats (TIRs). DNA elements can be further classified into families on the basis of the transposase (TPase) that catalyzes their movement. A TE family is composed of one or more TPase-encoding autonomous elements and up to several thousand nonautonomous elements that do not encode functional TPases but retain the cis-sequences necessary to be mobilized by the cognate TPase.
Miniature inverted-repeat transposable elements (MITEs) were first discovered in the grasses and later found in other flowering plants as well as in animal genomes.
Two distantly related families of active DNA transposons have recently been associated with Tourist-like MITEs. The maize P Instability Factor (PIF) and a Tourist-like MITE family called miniature PIF (mPIF) share identical TIRs, similar subterminal sequences, and a strong preference for insertion into the 9-bp palindrome CWCTTAGWG with duplication of the central TTA.
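The structure of this target site can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `reverse_complement` is ours, and the table covers just the bases and IUPAC codes that occur in the site. It shows that the site is palindromic around its central base, so the duplicated TTA reads as TAA on the opposite strand.

```python
# Complement table for the bases and IUPAC codes that occur in the site.
# W (A or T) and S (G or C) are their own complements.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "W": "W", "S": "S"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (IUPAC W/S allowed)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

site = "CWCTTAGWG"  # extended target site reported for PIF/mPIF
left, center, right = site[:3], site[3:6], site[6:]

print(reverse_complement(site))            # the site as read on the other strand
print(reverse_complement(left) == right)   # do the flanking trinucleotides mirror each other?
print(reverse_complement(center))          # central trinucleotide on the other strand
```

Reading the duplicated trinucleotide from either strand is the reason target site duplications of this kind are usually written as TTA/TAA.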
PIF and Pong elements share several features, including amino acid sequence conservation in the catalytic domain of ORF2, homology in their ORF1's, nucleotide sequence similarity in their TIRs, and identical TSDs (TTA).
In this study we look at the distribution and evolution of the PIF/IS5 superfamily of transposases and characterize their relationship with Tourist-like MITEs. To this end we conducted a systematic survey (database searches and PCR assays) of putative PIF- and Pong-like TPases in plants and animals. Phylogenetic analyses of >600 TPase fragments from 56 species define three major groups, each represented by multiple ancient and distinct lineages. The availability of virtually the entire sequence of rice facilitated the identification of the PIF/Pong-like elements in this organism and permitted a comprehensive analysis of their relationship with Tourist-like MITEs.
| MATERIALS AND METHODS |
|---|
PCR amplification of PIF-like TPases:
Degenerate primers were derived from the regions encoding amino acid residues GALDGTH (D1F1, 5'-GGIGCHHTIGATGGHACWCA-3'; I, inosine; H, A, C, or T; W, A or T) and ELFNPRH (KR1, 5'-ATGICKMIRRTTRAACAAYTC-3'; K, G or T; M, A or C; R, A or G; Y, C or T; Fig 1A, positions indicated by arrows). PCR amplifications were performed with 10–100 ng of genomic DNA in 50-µl reactions. Cycling parameters were 1 cycle at 94° for 3 min; 36 cycles at 94° for 30 sec, 50° for 30 sec, and 72° for 1 min; and 1 cycle at 72° for 5 min. Forty microliters of each reaction was resolved on 1% agarose gels, and desired fragments were purified from agarose using the QIAquick gel extraction kit (QIAGEN, Valencia, CA) and cloned using the TOPO-TA cloning kit (Invitrogen, San Diego) according to the manufacturers' instructions. Sequencing reactions were performed by the Molecular Genetics Instrumentation Facility of the University of Georgia. The sequences of 28 PIF-like TPase fragments described here were deposited in the GenBank database (accession nos. AY362792–AY362819).
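To make the primer notation concrete, the following Python sketch counts how many distinct oligonucleotides each degenerate primer represents and tests whether a template site is covered by the codes. It is an illustration, not part of the published protocol: the function names and the template string are hypothetical, and inosine is assumed to be a single synthesized base that pairs with any template base.

```python
# IUPAC codes from the primer legend. Inosine (I) is a single synthesized
# base, so it is assumed here not to add to the number of distinct oligos,
# but it is allowed to pair with any template base when matching.
POOL = {"A": "A", "C": "C", "G": "G", "T": "T", "I": "I",
        "H": "ACT", "W": "AT", "K": "GT", "M": "AC", "R": "AG", "Y": "CT"}
MATCH = dict(POOL, I="ACGT")

def degeneracy(primer):
    """Number of distinct oligonucleotides in the degenerate primer pool."""
    count = 1
    for code in primer:
        count *= len(POOL[code])
    return count

def primer_matches(primer, site):
    """True if `site` (same orientation as the primer) is covered by the codes."""
    return len(primer) == len(site) and all(
        base in MATCH[code] for code, base in zip(primer, site))

D1F1 = "GGIGCHHTIGATGGHACWCA"
KR1 = "ATGICKMIRRTTRAACAAYTC"

print(degeneracy(D1F1), degeneracy(KR1))
print(primer_matches(D1F1, "GGAGCACTAGATGGAACACA"))  # hypothetical template site
```

Under these assumptions the inosine positions keep the pools small while still tolerating any base at the corresponding template positions.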
Database searches and sequence and phylogenetic analyses:
Database searches were performed with blast servers available from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov, databases nr, gss, est, and wgs_Anopheles) as well as the rice genomic database at The Institute for Genomic Research (http://tigrblast.tigr.org/euk-blast/index.cgi?project=osa1). Nucleotide sequences obtained from database searches and degenerate PCR (dPCR) amplifications were conceptually translated into amino acid sequences and aligned with CLUSTALW. Introns were predicted with Netgene2 (available at http://www.cbs.dtu.dk).
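Conceptual translation can be sketched as follows. The code is a minimal illustration, not the procedure used in the study: it translates all six reading frames with the standard genetic code and reports which frame contains a motif of interest. The fragment is a made-up sequence that encodes GALDGTH after one extra leading base.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate one reading frame; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frame(dna):
    """Return translations of the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return {(strand, offset): translate(seq[offset:])
            for strand, seq in (("+", dna), ("-", reverse))
            for offset in range(3)}

# Hypothetical fragment: one extra leading base shifts the motif to frame +1.
fragment = "C" + "GGTGCTCTTGATGGTACTCAT"
frames = six_frame(fragment)
print([key for key, protein in frames.items() if "GALDGTH" in protein])
```

A tBlastn search performs this kind of six-frame translation internally before comparing a protein query with nucleotide databases.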
| RESULTS |
|---|
Distribution and abundance of PIF/Pong-like TPases:
Identification of PIF/Pong-like TPases through database searches:
A systematic survey was carried out using the TPases of PIF and Pong as queries in tBlastn searches against several public databases. Significant similarity was detected in >1000 entries from a wide range of eukaryotic species, including 21 plants, 19 animals, and two fungi (listed in Table 1). Six hundred and seventy-three hits (574 were unique) with significant homology to the catalytic domains of PIF or Pong were selected for further analysis.
One striking result from database searches was the abundance of sequences for PIF/Pong-like TPases in some organisms, especially plants. This is most apparent in genomes with large amounts of sequence information: for the catalytic domains alone, there were ~80 hits (e-value < 10⁻²³) in Arabidopsis thaliana, ~350 hits (e-value < 10⁻¹⁰) in rice (Oryza sativa cv. Nipponbare), and ~170 hits (e-value < 10⁻³⁰) in Brassica oleracea [~30% of its 600-Mb genome available for blast at TIGR (http://www.tigr.org/tdb/e2k1/bog1/)]. In animals, the number of sequences for PIF/Pong-like TPases varies from >100 in the African malaria mosquito (Anopheles gambiae) and ~300–400 in zebrafish (Danio rerio) to a few (<3) in Drosophila melanogaster, Caenorhabditis elegans, and human.
Nucleotide sequences of the 574 unique PIF- and Pong-like TPases were conceptually translated into amino acid sequences after removal of introns (see below) and judicious correction of frameshifts caused by small (1–2 bp) insertions or deletions. The resulting amino acid sequences were compared to detect conserved regions that might signify functional domains. Several blocks of highly conserved residues were identified for PIF-like TPases (Fig 1A). Block H corresponds to a predicted HTH domain that may be involved in DNA binding. Blocks N2, N3, and C1 most likely comprise the catalytic domain as they contain an apparent DDE motif with the three acidic residues centered in blocks N2, N3, and C1, respectively. A DDE motif is also present in Pong-like TPases (Fig 1B), but, unlike PIF, no HTH domain was predicted.
PIF/Pong-like TPases are usually adjacent to ORF1 homologs:
tBlastn searches using the ORF1's of PIF and Pong as queries also yielded a large number of hits. When located on long contigs, these ORF1 homologs were usually found within 1–2 kb of PIF- or Pong-like TPases, indicating that each "pair" of ORF1 and TPase was encoded by the same element. In fact, when the termini of PIF/Pong-like elements were defined in O. sativa (see below), A. thaliana, and A. gambiae (X. ZHANG, C. FESCHOTTE, and S. R. WESSLER, unpublished data), nearly all elements were found to encode both ORFs. ORF1s are significantly more divergent than the TPases. Two blocks of conserved residues were found in PIF/Pong-like ORF1's (Fig 1, blocks A and B), with the most conserved block (block A) centered in a ~100-amino-acid (aa) region that displays weak homology to the DNA-binding domains of myb transcription factors from some plants and animals.
Additional PIF-like TPases from grasses:
The majority of PIF-like sequences (~80%) were from only a few species (rice, Arabidopsis, B. oleracea, and A. gambiae) since this survey was limited by the availability of DNA sequences in databases. To better resolve the phylogeny of PIF elements, additional TPase sequences were isolated from species with established evolutionary relationships but limited sequence information. To this end, a dPCR procedure was employed to amplify PIF-like TPase fragments from selected grass species. Grasses were chosen for this analysis because their phylogeny is well characterized.
dPCR primers were derived from the conserved blocks N2 and C1 in PIF-like TPases (see Fig 1A for positions and MATERIALS AND METHODS for sequences) and used to amplify an ~120-aa region from 20 grass species as well as several basal monocots (listed in Table 1). The amplified region included the majority of the catalytic domain in PIF-like TPases, extending from 3 aa upstream of the first Asp to 8 aa upstream of the Glu of the DDE motif (Fig 1A, boxed region). PCR products of the expected size (~360 bp) were successfully amplified from all 20 grasses tested and their close relative Joinvillea, as well as several Asparagales (e.g., Gongora ilense; data not shown). In addition to the ~360-bp fragments, most species yielded larger PCR products (~450 bp) that, when sequenced, were found to contain an intron (see below).
Forty-five fragments from 15 species were sequenced; all were unique, indicating that there are multiple distinct TPases in each species and that only a small fraction had been sampled. No product was amplified from more basal seed plants such as Zamia, Ginkgo, or Gnetum. Failure to amplify TPase fragments by dPCR from these species may be due to nucleotide variation in the primer recognition region or to the absence of PIF-like TPases.
Phylogeny of PIF-like and Pong-like TPases:
Three major clusters of PIF- and Pong-like TPases:
The TPase fragments identified by database mining and those isolated by dPCR were pooled and their evolutionary relationships examined. A multiple alignment was constructed from the 45 dPCR products and 574 unique database hits and used to generate an unrooted phylogenetic tree (Fig 2). The majority of the sequences clustered into three groups: the plant PIF-like group, the plant Pong-like group, and the animal group. In addition, the five fungal sequences clustered into two small, species-specific groups.
Clustering of the two plant groups was supported by bootstrap values as well as by several features that were shared within each group but not between groups. First, the spacing (i.e., numbers of residues) between the second Asp and the Glu of the DDE motif differed between PIF-like and Pong-like groups but was consistent within each group. PIF-like TPases exhibit DD47E or DD48E spacing whereas Pong-like TPases exhibit an invariant DD35E spacing. Second, the TIRs of PIF- and Pong-like elements contain sequence motifs that are highly conserved within each group but distinct between the two groups (see below). On the basis of the comparison of TPase sequences, the animal group was related equally to both plant groups.
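The spacing notation can be made explicit with a short Python sketch. It assumes the positions of the second Asp and the Glu are already known (for example, from a multiple alignment); the function names and the synthetic sequences are hypothetical and stand in for real TPase sequences.

```python
def dde_spacing(protein, second_asp, glu):
    """Residues lying strictly between the second Asp and the Glu (0-based indices)."""
    if protein[second_asp] != "D" or protein[glu] != "E":
        raise ValueError("indices must point at D and E residues")
    return glu - second_asp - 1

def classify_by_spacing(spacing):
    """Map a spacing to the plant group whose TPases show it in this study."""
    if spacing in (47, 48):
        return "PIF-like"
    if spacing == 35:
        return "Pong-like"
    return "unassigned"

def toy(n):
    """Synthetic sequence (not a real TPase): D, 10 fillers, D, n fillers, E."""
    return "D" + "A" * 10 + "D" + "A" * n + "E"

for n in (35, 47, 48):
    seq = toy(n)
    spacing = dde_spacing(seq, 11, len(seq) - 1)
    print(f"DD{spacing}E", classify_by_spacing(spacing))
```

Spacing alone is only one of the features used to separate the groups; the sketch does not replace the phylogenetic analysis.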
Phylogeny of plant PIF- and Pong-like TPases: Phylogenetic relationships among plant PIF- and Pong-like elements were determined by analyzing a subset of 99 sequences (63 PIF-like, 36 Pong-like) that were selected to represent the different lineages within each group. A CLUSTALW multiple alignment was constructed from the catalytic domains of these sequences and used to generate a phylogenetic tree (Fig 3). Both plant groups are monophyletic and heterogeneous. In each group, amino acid identity between sequences from distantly related species can be higher than that between two sequences from the same or from a closely related species, suggesting the presence of multiple ancient lineages of both PIF- and Pong-like elements.
The plant PIF-like group is composed of four major lineages (A–D). Lineage A is the largest and most complex with members from both monocots and dicots. It can be further divided into five sublineages (A1–A5). A1 includes five grass subfamilies (Panicoideae, Ehrhartoideae, Bambusoideae, Pooideae, and the ancestral Pharoideae), indicating that this sublineage was present before the diversification of the grasses ~70 MYA. Although only two grass subfamilies (Panicoideae and Ehrhartoideae) contributed sequences to A2, this lineage may be even more ancient than A1 as it is also found in the orchid G. ilense (order Asparagales). A3 and A4 are each found in a single dicot family (A3 in Brassicaceae and A4 in Fabaceae). A5 is present in both monocots and dicots and includes the only known active PIF-like element, the maize PIF. B and C are two small lineages from dicots, both restricted to the Brassicaceae family. Lineage D is another monocot-specific lineage found in four grass subfamilies (Panicoideae, Ehrhartoideae, Bambusoideae, and Pharoideae).
Pong-like TPases clustered into three major lineages (O–Q). Lineage O included two sublineages, the dicot-specific O1 and the monocot-specific O2. Lineage P is dicot specific, suggesting that it emerged in dicots after their separation from monocots. P could also be divided into two sublineages (P1 and P2). P1 was found in only the Brassicaceae family and included the majority of Pong-like TPases from A. thaliana (71%) and nearly all from B. oleracea (137 of 139). The P2 sublineage is probably older than P1 as it is also present in the Fabaceae family. Most TPase sequences in lineage Q were from O. sativa. However, the presence of one sequence from Zea mays and two from Lotus japonicus in lineage Q suggests that it is also an ancient lineage.
Introns in plant PIF-like elements:
Although the original maize PIF element lacks introns, many of the plant PIF-like TPases identified here contain introns.
Two models have been proposed to explain the diversity of introns associated with related coding sequences: the "intron-early" model (loss of introns from an intron-rich ancestor) and the "intron-late" model (gain of introns during evolution).
TPase/ORF1 arrangements: Several different arrangements of TPase and ORF1 were observed for PIF- and Pong-like elements. All elements within a lineage or sublineage exhibit the same organization. Specifically, TPase and ORF1 in PIF-like elements are transcribed in the same direction ("tail-to-head") but are organized in two different patterns, with the TPase gene located upstream of ORF1 in lineage A but downstream of ORF1 in lineages B, C, and D. Three different arrangements were found for Pong-like elements. TPase and ORF1 were organized in a "head-to-head" alignment for O1, a "tail-to-tail" alignment for O2, and a tail-to-head alignment for P and Q with TPase located downstream of ORF1 (see Fig 3).
PIF- and Pong-like elements in rice:
The availability of virtually the entire genomic sequence of O. sativa permitted a comprehensive analysis of the PIF- and Pong-like elements in this genome.
PIF and Pong TPases: tBlastn searches using as queries the TPases of PIF and Pong led to the identification of 205 and 145 hits (e-value < 10⁻¹⁰), respectively, from the TIGR rice database (ssp. japonica, cv. Nipponbare). Duplicate hits located on overlapping regions of bacterial artificial chromosomes were excluded as were severely truncated TPases (containing <50% of the complete coding region). The remaining 116 PIF-like TPases and 80 Pong-like TPases were relatively full-length and contained the entire catalytic domain. After removal of introns and correction of frameshifts caused by small insertions/deletions (1–2 bp), full-length PIF-like TPases were found to range in size from 392 to 432 aa, while full-length Pong-like TPases were from 416 to 549 aa.
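The filtering steps described above can be sketched in Python. This is a hedged illustration under stated assumptions: the `Hit` record, its field names, and the toy values are hypothetical, and each hit is assumed to carry a `locus` label assigned upstream so that hits from overlapping clones share the same label.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    evalue: float
    locus: str              # hypothetical label shared by hits from overlapping clones
    coding_fraction: float  # share of the complete coding region that was recovered

def filter_hits(hits, max_evalue=1e-10, min_fraction=0.5):
    """Keep significant, reasonably complete hits, one per genomic locus."""
    kept, seen = [], set()
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue >= max_evalue or hit.coding_fraction < min_fraction:
            continue  # not significant, or severely truncated
        if hit.locus in seen:
            continue  # duplicate from an overlapping clone
        seen.add(hit.locus)
        kept.append(hit)
    return kept

toy_hits = [
    Hit("h1", 1e-50, "chr1:100", 0.9),
    Hit("h2", 1e-40, "chr1:100", 0.9),  # same locus as h1
    Hit("h3", 1e-5, "chr2:700", 1.0),   # fails the e-value cutoff
    Hit("h4", 1e-30, "chr4:250", 0.3),  # less than 50% of the coding region
    Hit("h5", 1e-20, "chr3:5", 1.0),
]
print([hit.name for hit in filter_hits(toy_hits)])
```

The thresholds mirror the cutoffs stated in the text; how duplicates were actually recognized in the study is not specified here, so the locus label is purely an assumption.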
The evolutionary relationships of rice PIF- and Pong-like TPases were determined by generating phylogenetic trees from CLUSTALW multiple alignments of their catalytic domains (Fig 4 and Fig 5). Two major lineages for PIF-like TPases that correspond to lineages A (including sublineages A1, A2, and A5) and D were resolved as shown in Fig 3. Correlation between intron content of a PIF-like TPase and the TPase lineage is similar to that described in the broader plant survey (Fig 3) except for two additional intron loss events: OsPIF3 lost intron 1 and OsPIF18 lost intron 2. A tail-to-head alignment for TPase and ORF1 was found for all OsPIF elements (TPase located upstream of ORF1 in lineage A but downstream of ORF1 in lineage D) with one exception: the two ORFs in OsPIF6 are organized in a tail-to-tail alignment, possibly due to a recent rearrangement. Pong-like TPases also clustered into two major groups, corresponding to the sublineage O2 and lineage Q in Fig 3. The ORF1/TPase alignment in lineage Q was found to be tail-to-head while that in lineage O2 is head-to-head.
Characterizing full-length elements: PIF- and Pong-like TPases were grouped into families on the basis of TPase sequence identities, with members of the same family being >90% identical. In this way, 27 PIF-like families and 26 Pong-like families (including the Pong family) were defined. These families were designated OsPIF (for O. sativa PIF) and OsPong (for O. sativa Pong), followed by the number of the family. Elements of the same family were further designated with a letter (e.g., OsPIF1a and OsPIF1b, see Fig 4 and Fig 5).
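One way to picture the family assignment is the Python sketch below. It is an assumption-laden illustration rather than the authors' procedure: it takes pre-aligned sequences of equal length, computes pairwise identity over ungapped columns, and merges any two sequences above the threshold (single-linkage grouping, which the text does not specify). All names and sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def group_families(aligned, threshold=90.0):
    """Single-linkage grouping: merge sequences that are more than `threshold`% identical."""
    names = list(aligned)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(aligned[a], aligned[b]) > threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())

# Toy aligned sequences (hypothetical, not real TPases).
toy_alignment = {
    "x1": "ACDEFGHIKLMNPQRSTVWY",
    "x2": "ACDEFGHIKLMNPQRSTVWF",  # one substitution relative to x1
    "y1": "YWVTSRQPNMLKIHGFEDCA",
}
print(group_families(toy_alignment))
```

In this toy case `x1` and `x2` fall into one family and `y1` forms its own; members of a family would then receive letter suffixes as in the naming scheme above.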
The identification of complete OsPIF and OsPong elements was complicated by the fact that interfamily comparisons indicated that sequence similarity was restricted to the known ORFs. For this reason, full-length elements were identified by comparison of sequences flanking the TPases within the same family where high sequence similarity extended into sequences flanking the ORFs. Sequences marking the boundary of similarity between elements of the same family were then searched for TIRs related to those of PIF or Pong and the flanking 3-bp TSDs characteristic of PIF and Pong elements (TTA/TAA). In this way, TIRs of 21 of the 27 OsPIF families (71 elements) and 20 of the 26 OsPong families (61 elements) were identified (see supplemental data at http://www.genetics.org/supplemental/ for accession numbers and positions).
The TIRs of OsPIF's were of variable length, ranging from 10 bp (OsPIF4) to 45 bp (OsPIF20). In contrast, OsPong TIRs were more uniform: all were 14–18 bp long except for one family (represented by a single element, OsPong5, with 66-bp TIRs). Comparison of the TIRs of OsPIF's and OsPong's showed similarities (most began with 5'-GGSC-3', where S represents G or C) as well as differences (the fifth nucleotide was usually A in OsPong's but was rarely an A in OsPIF's; Fig 6, a and b). The inner TIRs contained PIF-specific and Pong-specific motifs [OsPIF, 5'-TGTTTGGTT-3' (positions 6–14); OsPong, 5'-STMCAA-3' (positions 7–12), where M stands for A or C].
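The terminal features used to delimit elements can be illustrated with a small Python sketch. It is a simplification under stated assumptions: only perfect inverted repeats are counted (real TIRs may contain mismatches), the element and flanking sequence are synthetic, and the function names are ours.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def tir_length(element):
    """Length of the perfect terminal inverted repeat at the element ends."""
    length = 0
    for i in range(len(element) // 2):
        if element[i] != COMPLEMENT[element[-1 - i]]:
            break
        length += 1
    return length

def has_tsd(genome, start, end, size=3):
    """True if the `size` bases flanking genome[start:end] are identical."""
    return start >= size and genome[start - size:start] == genome[end:end + size]

# Synthetic element: a 14-bp TIR carrying the OsPIF motif, filler, inverted copy.
tir = "GGCCTTGTTTGGTT"
element = tir + "AAAAACCCCC" + "AACCAAACAAGGCC"
genome = "CCGTTA" + element + "TTAGCA"
start, end = 6, 6 + len(element)

print(tir_length(element))                    # length of the inverted repeat
print(has_tsd(genome, start, end))            # TTA on both sides of the element?
print(bool(re.match(r"GG[GC]C", element)))    # does the TIR begin with 5'-GGSC-3'?
print(element[5:14] == "TGTTTGGTT")           # OsPIF motif at positions 6-14 (1-based)
```

A search over real sequence would need to tolerate mismatches within the TIRs and to scan candidate boundaries rather than assume them.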
Full-length OsPIF's ranged from 2305 bp (OsPIF23a) to 22,169 bp (OsPIF25c) and OsPong's ranged from 2612 bp (OsPong18d) to 18,753 bp (OsPong15a). Most of this variation was due to the insertion of other TEs. These secondary TE insertions were located by searching full-length elements with RepeatMasker and Blastn. Sixteen OsPIF's and 24 OsPong's were found to contain a variety of TE insertions (see Fig 4 and Fig 5 for their positions and identities), including other DNA elements [Ac-like, Mutator-like (MULEs), and CACTA-like], MITEs (Tourist-like and Stowaway-like), LTR retroelements, solo LTRs (Copia-like and Gypsy-like), non-LTR retroelements (LINEs), and, in one case, a Helitron element. In a few instances, members of OsPIF and OsPong families (e.g., OsPIF3, OsPIF16, and OsPong18) harbored the same MITE insertion at the same position, indicating that the MITE insertion did not prevent further transposition of these elements. When TE insertions were excluded, the length of most OsPIF's (50 of 71) and OsPong's (45 of 61) was found to be in the range of 4–6 kb.
In several instances, OsPIF and OsPong families include elements that are nearly identical, suggesting that they transposed recently and may still be capable of further transposition. For example, OsPIF6 includes five complete elements (~4.1 kb) located on four different chromosomes (chromosomes 3, 7, 9, and 10) that are, on average, ~99.6% identical over their entire length. In addition, their coding sequences are not interrupted by stop codons or frameshifts. Similarly, OsPong20 includes four elements (~5.3 kb) that are on average >99.7% identical.
In contrast, interfamily sequence conservation, even between closely related families, was restricted to coding regions and TIRs. For example, the nucleotide sequences of OsPIF9 and OsPIF7 were 60% identical in their TPases (~1.2 kb) and 55% identical in their ORF1's (~1 kb), but these two families did not share additional sequence similarity aside from their TIRs. Similarly, the OsPong2 and OsPong3 families share 81 and 85% nucleotide identity in their ORF1 (~900 bp) and TPase (~1.3 kb) coding regions, respectively, but have completely diverged noncoding regions.
Coevolution of ORF1 and TPase in OsPong's: All 21 OsPIF families and 19 of the 20 OsPong families with defined termini encoded both ORF1 and TPase. The only OsPong family with defined termini that does not harbor an ORF1 is OsPong18, where all eight elements contain only TPase. Absence of ORF1 in these elements is likely due to internal deletion for two reasons. First, the length of these elements is unusually short, ranging from 1365 to 2745 bp. Second, Blastn searches using OsPong18 elements as queries identified several additional family members that do not encode TPase but contain coding sequence for ORF1 (e.g., AP003799; 96,105–99,317). Thus, all OsPong families with defined ends encode both ORF1 and TPase.
As mentioned above, ORF1 and TPase in OsPIF's are arranged in three different alignments and those in OsPong's have two different alignments. Interelement recombination has been shown to be a significant force in the evolution of mobile elements.
Two phylogenetic trees were generated for OsPong elements, one based on an ~110-aa region in their ORF1s including all three conserved blocks (Fig 1B, boxed region) and one based on the catalytic domains of their TPases (Fig 7). Comparison of the two trees showed that the phylogenies determined from the two coding sequences were consistent. These results indicate that interelement recombination had probably not occurred in OsPong's. Similarly, the phylogeny of OsPIF ORF1s was found to be consistent with that of the TPases, albeit with less bootstrap support (data not shown).
Insertion sites of OsPIF's and OsPong's:
Prior studies have shown that some plant DNA transposons, such as members of the Ac/Ds and Mutator families, have a preference for insertion into single-copy regions of the genome.
Relationships between OsPIF's, OsPong's, and Tourist-like MITEs:
Identification of full-length OsPIF and OsPong elements permitted the first genome-wide analysis of the relationship between these TPase-encoding elements and Tourist-like MITEs. Sequence identities between OsPIF's and OsPong's and Tourist-like MITEs were determined in two ways. First, we investigated whether Tourist-like MITE families could be associated with OsPIF's or OsPong's on the basis of the sequences of their TIRs. To do this, the TIRs of 31 published rice Tourist-like MITE families were examined.
In some cases a MITE family was clearly identified as a deletion derivative of a particular OsPIF or OsPong family (Fig 8). For example, the high-copy-number Castaway family (~3000 copies) was found to be derived from the OsPIF6 family by internal deletion.
| DISCUSSION |
|---|
PIF- and Pong-like elements are widespread and abundant:
Here we present the first comprehensive analysis of PIF- and Pong-like elements in eukaryotes. Prior studies reported that the TPases of PIF and Pong were distantly related to the bacterial IS5 TPases and noted the presence of PIF-like elements in several plant (rice, sorghum, and Arabidopsis), animal (C. elegans, C. briggsae, and fugu fish) and fungal (Filobasidiella neoformans) genomes.
PIF- and Pong-like elements are especially abundant in plants, including both monocotyledons and dicotyledons. Large numbers of PIF- and Pong-like TPases were detected in three plants with relatively small genomes: ~80 copies in Arabidopsis (130 Mb), ~350 copies in rice (450 Mb), and >1000 copies in B. oleracea (extrapolated to ~600 Mb). Although significant sequence is not yet available for plants with large genomes, such as maize (2500 Mb) and barley (5000 Mb), the degenerate PCR assay indicates that these genomes also harbor multiple and diverse lineages (Fig 3). Amplification of transposable elements is largely responsible for the huge differences in plant genome size.
ORF1 of PIF/Pong-like elements:
The maize PIF and rice Pong elements encode two ORFs (ORF1 and ORF2), of which ORF2 is most likely the TPase while the function of ORF1 is unknown. Database searches revealed a large number of homologs for both ORFs and, where found, they were usually in pairs with an ORF1 homolog located within ~1–2 kb of an ORF2 homolog. All OsPIF and OsPong families with defined termini also encoded both ORFs. The amino acid sequence similarity between PIF- and Pong-like ORF1s in blocks A and B (Fig 1) suggests a monophyletic origin, and the presence of this ORF in virtually all PIF/Pong-like lineages suggests that it is necessary for the active transposition of these elements.
A requirement for a protein other than the transposase is unusual for a eukaryotic transposon, having been described previously only for members of the CACTA superfamily.
Several features of ORF1 provide clues to its possible function(s). Weak similarity between the most conserved region in ORF1 (Fig 1, block A) and the myb DNA-binding domain of some plant and animal transcription factors suggests that ORF1 may encode a DNA-binding protein.
OsPIF and OsPong elements:
This study identified 116 OsPIF and 80 OsPong TPases representing all of the lineages of PIF and Pong TPases detected in monocot genomes. As such, rice is a suitable model to study the evolution of PIF- and Pong-like elements in plants as well as their relationship with Tourist-like MITEs.
OsPIF and OsPong elements were grouped into 27 and 26 families, respectively, on the basis of sequence identity of their coding regions. These groupings received additional support when it was determined that elements of the same family share extensive sequence similarity in noncoding regions. Several OsPIF and OsPong families (such as OsPIF4, -5, -6, -9, -12, -13, -23 and OsPong8, -17, -19, -20) include members that are nearly identical. Furthermore, each family includes at least one putative autonomous member whose coding region is not interrupted by a stop codon or a frameshift mutation. These features are indicative of recent and perhaps ongoing activity of multiple OsPIF/OsPong families.
OsPIF, OsPong, and Tourist-like MITEs:
In previous studies, PIF and Pong elements were isolated as the TPase sources for two families of Tourist-like MITEs, mPIF and mPing, respectively.
Attempts to associate individual Tourist-like MITE families with specific OsPIF or OsPong families uncovered many clear-cut relationships (Fig 8). For example, the MITE family Castaway (~3000 copies) was found to be derived from the OsPIF6 family by internal deletion and subsequent amplification. Relationships between Tourist-like MITEs and OsPong families are less apparent as sequence similarity is limited to the subterminal regions (as shown in Fig 8, E and F). However, detection of sequence identity between subterminal regions of OsPong families and Tourist-like MITE families, albeit limited, is significant in light of the fact that even closely related OsPong families display no sequence identity in their subterminal regions.
In summary, the characterization of virtually all full-length PIF- and Pong-like elements in the rice genome has permitted a determination of the extent of their relatedness with most of the 60,000 Tourist-like MITEs residing in this genome. Our data indicate that many Tourist-like MITEs originated from OsPIF and OsPong elements by internal deletion and subsequent amplification. However, 16 of the 28 Tourist-like MITEs examined in this study were not clearly associated with OsPIF/OsPong families. It is possible that their cognate OsPIF/OsPong families were lost from the genome. Such a scenario is not difficult to imagine considering OsPIF/OsPong elements are present at much lower copy number (several per family) than are Tourist-like MITEs (hundreds or thousands per family). Alternatively, some Tourist-like MITE families may have originated by chance events, where, for example, a pair of nearby inverted repeats (and other cis requirements, if any) were mobilized fortuitously by an endogenous PIF- or Pong-like TPase and subsequently amplified to high copy numbers.
The rice genome harbors >90,000 MITEs: 60,000 Tourist-like MITEs and 30,000 Stowaway-like MITEs.
Concluding remarks:
The PIF and Pong elements are founding members of a very large and dynamic superfamily of class 2 elements that are widespread in flowering plants. The impact of these elements is significant, as PIF- and Pong-like families are capable of expansion through the amplification and diversification of both autonomous and nonautonomous members, including very-high-copy-number MITEs. Furthermore, with a demonstrated preference for insertion into genic regions, PIF- and Pong-like elements and their associated Tourist-like MITEs appear to be a major force generating genetic diversity and influencing the evolution of plants.
| FOOTNOTES |
|---|
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AY362792–AY362819.
| ACKNOWLEDGMENTS |
|---|
This work was supported by grants from the National Science Foundation and National Institutes of Health to S.R.W.
Manuscript received May 19, 2003; accepted for publication October 30, 2003.
| LITERATURE CITED |
|---|
ADEY, N. B., S. A. SCHICHMAN, D. K. GRAHAM, S. N. PETERSON, and M. H. EDGELL et al., 1994 Rodent L1 evolution has been driven by a single dominant lineage that has repeatedly acquired new transcriptional regulatory sequences. Mol. Biol. Evol. 11:778-789.
APARICIO, S., J. CHAPMAN, E. STUPKA, N. PUTNAM, and J. M. CHIA et al., 2002 Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301-1310.
AUGE-GOUILLOU, C., M. H. HAMELIN, M. V. DEMATTEI, M. PERIQUET, and Y. BIGOT, 2001 The wild-type conformation of the Mos-1 inverted terminal repeats is suboptimal for transposition in bacteria. Mol. Genet. Genomics 265:51-57.
BENITO, M. I. and V. WALBOT, 1997 Characterization of the maize Mutator transposable element MURA transposase as a DNA-binding protein. Mol. Cell. Biol. 17:5165-5175.
BENNETZEN, J. L., 2002 Mechanisms and rates of genome expansion and contraction in flowering plants. Genetica 115:29-36.
BUREAU, T. E., P. C. RONALD, and S. R. WESSLER, 1996 A computer-based systematic survey reveals the predominance of small inverted-repeat elements in wild-type rice genes. Proc. Natl. Acad. Sci. USA 93:8524-8529.
CAPY, P., C. BAZIN, D. HIGUET, and T. LANGIN, 1998 Dynamics and Evolution of Transposable Elements. Springer-Verlag, Austin, TX.
CHEN, J., I. GREENBLATT, and S. DELLAPORTA, 1987 Transposition of Ac from the P locus of maize into unreplicated chromosomal sites. Genetics 117:109-116.
CRESSE, A. D., S. H. HULBERT, W. E. BROWN, J. R. LUCAS, and J. L. BENNETZEN, 1995 Mu1-related transposable elements of maize preferentially insert into low copy number DNA. Genetics 140:315-324.
DIETRICH, C. R., F. CUI, M. L. PACKILA, J. LI, and D. A. ASHLOCK et al., 2002 Maize Mu transposons are targeted to the 5' untranslated region of the gl8 gene and sequences flanking Mu target-site duplications exhibit nonrandom nucleotide composition throughout the genome. Genetics 160:697-716.
FESCHOTTE, C. and S. R. WESSLER, 2002 Mariner-like transposases are widespread and diverse in flowering plants. Proc. Natl. Acad. Sci. USA 99:280-285.
FESCHOTTE, C., N. JIANG, and S. R. WESSLER, 2002a Plant transposable elements: where genetics meets genomics. Nat. Rev. Genet. 3:329-341.
FESCHOTTE, C., X. ZHANG, and S. WESSLER, 2002b Miniature inverted-repeat transposable elements (MITEs) and their relationship with established DNA transposons, pp. 1147-1158 in Mobile DNA II, edited by N. L. CRAIG, R. CRAIGIE, M. GELLERT and A. M. LAMBOWITZ. American Society for Microbiology Press, Washington, DC.
FESCHOTTE, C., L. SWAMY, and S. R. WESSLER, 2003 Genome-wide analysis of mariner-like transposable elements in rice reveals complex relationships with Stowaway MITEs. Genetics 163:747-758.
GILBERT, W., S. J. DE SOUZA, and M. LONG, 1997 Origin of genes. Proc. Natl. Acad. Sci. USA 94:7698-7703.
GOFF, S. A., D. RICKE, T. H. LAN, G. PRESTING, and R. WANG et al., 2002 A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296:92-100.
HEBSGAARD, S. M., P. G. KORNING, N. TOLSTRUP, J. ENGELBRECHT, and P. ROUZE et al., 1996 Splice site prediction in Arabidopsis thaliana DNA by combining local and global sequence information. Nucleic Acids Res. 24:3439-3452.
JIANG, N. and S. R. WESSLER, 2001 Insertion preference of maize and rice miniature inverted repeat transposable elements as revealed by the analysis of nested elements. Plant Cell 13:2553-2564.
JIANG, N., Z. BAO, X. ZHANG, H. HIROCHIKA, and S. R. EDDY et al., 2003 An active DNA transposon family in rice. Nature 421:163-167.
JORDAN, I. K. and J. F. MCDONALD, 1998 Evidence for the role of recombination in the regulatory evolution of Saccharomyces cerevisiae Ty elements. J. Mol. Evol. 47:14-20.







