Abstract
Chromodomains are thought to mediate protein-protein interactions between chromatin components. We have detected a chromodomain embedded within the catalytic region of a predicted Arabidopsis DNA methyltransferase that is diverged from other eukaryotic enzymes. The 791 residue “chromomethylase” (CMT1) is encoded by a floral transcript that is spliced from 20 exons and is present at only ~10−7 of total mRNA. Genomic sequencing reveals an ancient haplotype split at CMT1 between Col-0 + Metz and the other ecotypes examined. In the Col-0 + Metz haplotype, alternative mRNA processing at intron 13 truncates the coding region. In Ler, RLD, and No-0, similar truncation is caused by insertion of an intact retrotransposon, Evelknievel, which is present as a single copy in Ler and RLD and is currently methylated and inactive. Evelknievel is found at this site on a single branch that connects the Ler, RLD, and No-0 ecotypes but is absent from the genomes of all other ecotypes examined. A stop codon within exon 6 of the Metz ecotype confirms that CMT1 is nonessential. Nevertheless, comparison to CMT1 of Cardaminopsis arenosa, an outcrossing relative, indicates conservation for DNA methyltransferase function. We discuss how allelic diversity of CMT1 may reflect loosened selective constraints in a self-fertilizing species such as Arabidopsis thaliana.
The presence of differentially methylated cytosines in most eukaryotic genomes raises the question of how methylation patterns are determined. This question remains unanswered. For example, known eukaryotic DNA cytosine methyltransferases utilize hemi-methylated CpG substrates more efficiently than unmethylated substrates, as if they are maintenance enzymes (Bestor 1992). A yet undiscovered enzyme appears to methylate CpG substrates de novo, because mouse embryos that are null for the only known DNA methyltransferase nonetheless establish normal methylation patterns following genome-wide demethylation (Lei et al. 1996). In Arabidopsis thaliana, the MET1 putative DNA methyltransferase is thought to be orthologous to well-characterized animal DNA methyltransferases, which together form a subfamily distinct from their bacterial counterparts by the relatively close similarity of their catalytic regions and the shared presence of an ~1000 aa N-terminal extension. Surprisingly, a phenotype associated with reduced levels of MET1 (Finnegan et al. 1996; Ronemus et al. 1996) is caused by de novo methylation of nearly all cytosines in the SUPERMAN regulatory region (Jacobsen and Meyerowitz 1997). This observation indicates that Arabidopsis, like mouse, has multiple DNA methyltransferase specificities likely resulting from the activity of multiple genes.
Another open question is the function of eukaryotic DNA methylation. In bacteria, most Type II DNA methyltransferases prevent restriction of host DNA by companion endonucleases, together providing a defense against exogenous DNA (Wilson and Murray 1991). However, in eukaryotes, where DNA methyltransferases are constitutive and lack restriction endonuclease companions, function is the subject of controversy. Methylation-dependent inhibition of gene function in many cases is consistent with various hypothesized roles. Genome defense is a likely function for methylation in fungi, where the introduction of sequence duplications leads to methylation of both copies (Rossignol and Faugeron 1995; Russo et al. 1996), and in maize, where methylation of transposons keeps them inactive (Federoff et al. 1995). Methylation in animals and plants might permit the evolution of large genomes (Bestor 1990), perhaps by reducing transcriptional noise (Bird 1995). Developmental roles for methylation are suggested by methylation of imprinted alleles (Shemer and Razin 1996) and of the inactivated X chromosome (Riggs and Pfeifer 1992) in mammals. In Arabidopsis, the observation that experimentally induced hypomethylation (Finnegan et al. 1996; Kakutani et al. 1996; Ronemus et al. 1996) causes dense de novo methylation and epimutations (Jacobsen and Meyerowitz 1997) suggests a role for methylation in physiological adaptation. Such different roles for methylation are not mutually exclusive: endogenous gene silencing and physiological adaptation might have evolved from a methylation-based genome defense system against transposons (Chandler and Walbot 1986). Added interest in these hypothesized roles for methylation comes from evidence that methylation is causally involved in mutation and cancer (Laird and Jaenisch 1996; Yang et al. 1996) and in imprinting disorders (Bartolomei et al. 1993) and from the potential therapeutic use of DNA methyltransferases against retroelement invaders such as HIV (Bednarik 1996).
Here we describe a novel eukaryotic DNA methyltransferase gene, which encodes a putative enzyme with a chromodomain, a protein module that is thought to mediate interactions between key chromatin proteins (Platero et al. 1995). Remarkably, the gene is null or severely debilitated as a homozygote in several dispersed ecotypes of A. thaliana, including interruption by a new retrotransposon, Evelknievel. The presence of a potential chromatin targeting domain, the restricted expression of the gene, and its apparent dispensability distinguish this putative DNA methyltransferase from those previously described.
MATERIALS AND METHODS
Plants: A. thaliana (2n = 10) ecotypes No-0, Col-0, Ler, Nd-0, Nd-1, RLD, Tac SI, Kl-0, Kb-0, Fi-0, Bu-20, and Be-0 are described in the AIMS/ABRC catalog (http://aims.cps.msu.edu/aims/). Metz was collected in the Golan Heights, Israel. Arabidopsis suecica (2n = 26) Sue-1, the allotetraploid hybrid between A. thaliana and C. arenosa, is described by Hanfstingl et al. (1994). Sue-2 (aka 9510) and Care-1 (C. arenosa 9509, 2n = 4x = 32) were obtained from Washington University (St. Louis), and Care-3 (2n = 4x = 32) and Care-4 were obtained from botanical gardens in France and Germany. Plants were grown in pots in growth rooms under fluorescent lights (16 hr on, 8 hr off) at 20°. Axenic seedlings were grown on moistened filter paper (3 days) or vertically on solid agar medium containing 0.5× Murashige salts (~1 wk) for roots and shoots.
Nucleic acids: Crude genomic DNA for PCR was obtained from leaves (Edwards et al. 1991) and for blot hybridization analysis by phenol extraction (Ausubel et al. 1994). Total RNA was isolated from most sources using the QIAGEN (Chatsworth, CA) RNeasy Plant kit, except that the phenol-LiCl procedure of A. Hudson and M. Anderson (bionet) was used for germinated seeds, followed by an RNeasy clean-up step. mRNA was isolated using the Promega (Madison, WI) PolyA-Tract system. Double-stranded cDNA was produced from the resulting mRNA using the CLONTECH (Palo Alto, CA) cDNA amplification kit protocol (scaled down sixfold for use with poly(A) RNA from 10 μg total RNA), except that an S-200 spin column (Pharmacia, Piscataway, NJ) step followed treatment with T4 DNA ligase and dilution to 45 μl with 10 mM Tricine. The resulting cDNA solutions were used for quantitative PCR assays and long PCR. RT-PCR primers were as follows: CMT1 (for “chromomethylase”) (exon 10-11, 283 bp spliced, 365 bp A. thaliana genomic, 385 bp Care-1 genomic) “Le” 5′-GCAACTGCAAGGAGAAGCTGAAGGAG-3′ and “Ri” 5′-TACAGCATGACGTGCCAGGAAACCC-3′, MET1 (357 bp spliced, ~570 bp genomic) 5′-GTACTCTGCCACTGCCTGGT-3′ and 5′-GCCATTCAGGGAGAACTTCTTCTGGTGC-3′, cyclophilin (Cyclo 401 bp) 5′-CGATAAGACTCCCAGGACTGCCGAGAA-3′ and 5′-TCGGCTTTCCAGATGATGATCCAACCT-3′. 32P hybridization probes were synthesized by linear amplification from PCR products (Konat et al. 1994). Electrophoresis and hybridization analyses were done by standard procedures (Ausubel et al. 1994). A phosphorimager (Molecular Dynamics) was used for 32P detection and quantification. The density in each lane was normalized to a genomic control.
PCR: Primers were predicted either automatically using Primer3 (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3.cgi) or manually with the aid of Oligo (Rychlik and Rhoads 1989) and synthesized by the Hutchinson Center Biotechnology Facility (Seattle). For quantitative analysis, Ampli-Taq Gold (Perkin-Elmer, Norwalk, CT) was used with cycles of 95° (20 sec) 60–65° (1 min) and 72° (1 min), with the number of cycles and the length of the 95° preheating step adjusted depending on the experiment. Reactions were performed according to manufacturer's instructions in 10 μl vol without oil in a PTC-100 thermocycler with clamp-down lid (MJ Research, Watertown, MA). Long-distance PCR reactions were performed in 10–20 μl volumes using Advantage Klen-Taq polymerase reagents (CLONTECH) following a cycling protocol modified from the Boehringer-Mannheim (Indianapolis) Expand kit: a 93°, 2-min preheat, 10-sec, 93° denaturations, 65°, 30-sec annealings, and 68°, 3-min extensions + 10 sec/cycle incremental increases after cycle 10 for a total of 35–45 cycles.
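The long-distance PCR schedule above can be sketched as a small calculation of programmed hold times. This is only an illustration: it assumes that the 10-sec increment first applies at cycle 11 and accumulates each cycle thereafter, and it ignores ramping between temperatures. The function names are hypothetical.

```python
# Sketch of the long-distance PCR cycling schedule described in the text.
# Assumption: the 10-sec extension increment starts at cycle 11 and accumulates.

def extension_seconds(cycle, base=180, increment=10, start_after=10):
    """Extension time (sec) at 68 degrees for a 1-based cycle number."""
    extra_cycles = max(0, cycle - start_after)
    return base + increment * extra_cycles


def total_cycling_seconds(n_cycles, preheat=120, denature=10, anneal=30):
    """Sum of programmed hold times; temperature ramping is ignored."""
    cycles = sum(
        denature + anneal + extension_seconds(c) for c in range(1, n_cycles + 1)
    )
    return preheat + cycles


for n in (35, 45):
    hours = total_cycling_seconds(n) / 3600
    print(f"{n} cycles: about {hours:.1f} hr of programmed holds")
```

Under these assumptions the extension step dominates the run time, which is why the cycle count matters more than the short denaturation and annealing holds.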
Sequence analysis: Genomic and cDNA sequences were determined from long-distance PCR products using a cycle sequencing protocol (Perkin-Elmer). Gel analysis was performed by the Hutchinson Center Biotechnology Facility using ABI Model 373 sequencers. The resulting files from multiple runs were assembled semimanually and verified by alignment with the complementary strand and/or by agreement with sequence from other ecotypes. Putative polymorphisms were generally confirmed by reamplification to detect PCR misincorporations, although none were found in the ~70 kb of complete sequence determined in this study. These sequence data have been submitted to the GenBank databases under accession numbers AF039364–AF039376.
Homology searching was performed using the BLAST (http://www.ncbi.nlm.nih.gov) and Blocks (http://www.blocks.fhcrc.org) servers. Blockmaker output obtained from the Blocks WWW server was used to construct protein family trees (Henikoff et al. 1997). Trees from nucleotide substitution data were constructed using the Phylip Neighbor program (http://evolution.genetics.washington.edu/phylip.html).
RESULTS
Detection of a chromodomain within a predicted DNA methyltransferase: To uncover previously unrecognized chromodomain-containing proteins, the automated COBBLER system for database mining (Henikoff and Henikoff 1997) was applied to the family of chromodomains: Consensus residues from the single chromodomain block in the automatically generated Blocks Database were embedded into the 60 aa N-terminal chromodomain-containing segment of the closest member sequence (MOD1-MOUSE). This altered sequence was used to search the nonredundant (nr) protein sequence database with BLAST (Altschul et al. 1990), yielding hits to previously documented chromodomain proteins, plus a hit (P = 0.0067) to a protein predicted from a cosmid derived from A. thaliana (GenBank accession number U53501), which was annotated as a cytosine DNA methyltransferase homolog. Chromodomain homology was confirmed by using the Arabidopsis coding sequence to search the nr protein sequence database with BLAST, and both the Blocks and Prints multiple alignment databases with the Blocksearch system (Henikoff and Henikoff 1994) (Figure 1A). The results list from the BLAST search detected significant matches to most previously documented chromodomains; the strongest hit was within a protein of unknown function annotated as containing a chromodomain [P = 5.2 × 10−8 (PIR accession number S23571)] encoded by a fungal transposon. In a search of the Blocks Database, the single chromodomain block was detected at the 99.92 percentile level (Henikoff and Henikoff 1994), and in the search of the Prints Database, which divides the chromodomain into three blocks, all three were detected among the top hits. Of the Prints blocks, Block C was detected at the 98th percentile, and this was supported by independent detection of Blocks A and B at P = 1.3 × 10−9. We conclude that the predicted Arabidopsis protein contains a bona fide chromodomain.
The GenBank annotation of the potential Arabidopsis protein as a cytosine DNA methyltransferase homolog was based on an open reading frame (ORF) from a hypothetical spliced mRNA. A search of the Blocks Database using the entire cosmid sequence translated in six frames identified the two conserved C-terminal blocks, which were not included in the predicted protein (data not shown). Together with the four N-terminal blocks, it appeared that the entire coding sequence for the catalytic region of a potential cytosine methyltransferase is present on the cosmid. This was confirmed experimentally as described below. It is especially interesting that the chromodomain lies within an ~80 aa insertion between the first two conserved catalytic blocks (Figure 1A). Based on sequence alignment with the known structures of DNA-bound cytosine methyltransferases (Klimasauskas et al. 1994; Reinisch et al. 1995), the chromodomain should lie along a face of the catalytic domain that is nearly perpendicular to the DNA substrate (bracket, Figure 1B).
A phylogenetic tree based on block alignments places the A. thaliana chromomethylase along the higher eukaryotic branch weakly separated from bacterial enzymes but clearly separated from the A. thaliana MET1 gene product and other higher eukaryotic enzymes (Figure 1C). Evidently, the chromomethylase diverged from other known DNA methyltransferases prior to the divergence of plants and animals. In addition, the chromomethylase lacks the extremely long N-terminal extension found for all other higher eukaryotic DNA methyltransferases.
The chromomethylase is the second DNA methyltransferase identified in A. thaliana. Previous assertions of a second DNA methyltransferase in A. thaliana (Scheidt et al. 1994a; Nebendahl and Baumlein 1995) appear spurious (Scheidt et al. 1994b), and we are unable to detect similarity of that sequence to any DNA methyltransferase in database searches. In stark contrast, the chromomethylase sequence detects the known cytosine DNA methyltransferases with BLAST P-values as low as 10−45.
Figure 1. Alignment analysis of the chromomethylase protein sequence. (A) The COBBLER-embedded chromodomain was used as query to search the nr sequence database with Blast. A P-value of 0.0067 was obtained for a protein predicted from GenBank entry U53501. Significance of the hit was confirmed by searching the experimentally determined protein sequence versus the nr database and the Blocks and Prints protein family databases. The P-value of the Blast hit shown is for the env homolog (GenBank accession number L34658) encoded by Cladosporium fulvum transposon CfT-1, which detects the Drosophila melanogaster HP1 chromodomain at P = 1.8 × 10−7 in Blast searches. Rectangles delimit the extent of alignment of chromomethylase segments with multiple alignment representatives. In the map, the chromodomain (chr) is indicated relative to the positions of the A–F methylase blocks from the Blocks Database; these correspond to Motifs I, IV, VI, VIII, IX, and X (Posfai et al. 1989), respectively, and are recognized as the only fully conserved C5 DNA methyltransferase motifs (Cheng et al. 1993). (B) RasMol-generated image (Roger Sayle, Glaxo) of Protein Database ID 1dct showing a 3 aa segment (darker shading, indicated by the bracket) between conserved blocks. This is the expected position of the CMT1 chromodomain insertion when conserved segments are fit to the HaeIII structure. (C) Phylogenetic tree of eukaryotic cytosine DNA methyltransferases. All bacterial members of PROSITE family PS00094 were also included in protein multiple alignment and tree construction but were removed for clarity, except for M. HgiDII, the closest single sequence, and M. HhaI and M. HaeIII, whose structures are known. Bootstrap resampling percentages are shown for selected nodes.
Figure 2. RT-PCR blot analysis of CMT1 and MET1 transcription during development. Double-stranded cDNA was synthesized using poly(A)+ RNA extracted from cultured or dissected plant tissues. Aliquots for gel electrophoresis were calculated to contain ~3 ng cDNA (or genomic DNA as indicated in ng) subjected to PCR amplification for 20 cycles following an initial 8 min 95° step.
CMT1 mRNA is preferentially expressed in flowers and is ~100-fold less abundant than MET1 mRNA: In an attempt to determine the chromomethylase coding sequence, we assayed for cDNAs in available A. thaliana clone libraries using the PCR. We readily detected the expected amplified product from the MET1 gene but failed to detect a product with primers specific for the chromomethylase gene (CMT1) (data not shown). Guided by the predicted ORF, primers were synthesized in attempts to amplify cDNAs from poly(A)+ RNAs obtained from various Arabidopsis stage and tissue sources. RT-PCR was performed using oligo(dT) to prime MuLV reverse transcription for synthesis of double-stranded cDNA. A profile of RT-PCR products (Figure 2) shows the levels of CMT1, MET1, and cyclophilin control cDNAs in preparations from several sources. Based on comparison to the levels of genomic DNA detected in control reactions, estimates can be made of the relative levels of cDNA for each sample. For example, in flowers from plants of the Kl-0 ecotype, spliced MET1 cDNA is eightfold less abundant than cyclophilin cDNA, and spliced CMT1 cDNA is 750-fold less abundant than cyclophilin cDNA. In roots from Col-0 plants, MET1 cDNA is 40-fold less abundant than cyclophilin cDNA, and no CMT1 cDNA is detected above what is seen in controls. Comparable results for MET1 and CMT1 were obtained using specific primers for thermostable reverse transcriptase synthesis of first-strand cDNA for RT-PCR (data not shown). Taken together, these data show that CMT1 mRNA is present in inflorescences and is at a much lower level or nonexistent in leaves, roots, growing seedlings, and plants prior to formation of flower buds. CMT1 inflorescence expression differs from that of MET1, which appears to be uniformly transcribed in meristematic tissues, though at lower levels in maturing leaves, which grow by cell enlargement.
An estimate of CMT1 mRNA levels in buds and flowers can be made by comparison to cyclophilin controls. Cyclophilin cDNAs have been recovered from leaf libraries at a level of 1/10,000 of total mRNA (Lippuner et al. 1994) and show roughly uniform levels of expression in our samples. This leads to an estimate of ~10−7 of total mRNA for CMT1 in floral tissues. We also examined mRNA levels in C. arenosa, a relative of A. thaliana, and found about threefold higher levels of CMT1 expression in C. arenosa buds and flowers than in A. thaliana, and detectable levels in leaves (Figure 2).
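The arithmetic behind this abundance estimate can be sketched as follows, using the Kl-0 flower ratios reported above (MET1 eightfold and CMT1 750-fold below cyclophilin). The sketch assumes, as the text suggests, that the cyclophilin level of 1/10,000 measured in leaf libraries carries over to floral tissue; the variable names are illustrative only.

```python
# Values taken from the text; variable names are illustrative.
cyclophilin_fraction = 1 / 10_000  # cyclophilin share of total mRNA (leaf libraries)
cmt1_fold_below = 750              # spliced CMT1 cDNA relative to cyclophilin, Kl-0 flowers
met1_fold_below = 8                # spliced MET1 cDNA relative to cyclophilin, Kl-0 flowers

# Assumption: cyclophilin is equally abundant in flowers and leaves.
cmt1_fraction = cyclophilin_fraction / cmt1_fold_below
met1_fraction = cyclophilin_fraction / met1_fold_below
met1_over_cmt1 = met1_fraction / cmt1_fraction

print(f"CMT1 fraction of total mRNA: {cmt1_fraction:.1e}")
print(f"MET1 is about {met1_over_cmt1:.0f}-fold more abundant than CMT1")
```

The result is on the order of 10−7 of total mRNA for CMT1 and roughly a 100-fold difference between MET1 and CMT1, in line with the figures quoted in the text.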
Figure 3. Arabidopsis CMT1 gene organization in ecotypes. (A) Coding exons are shown as open boxes, conserved blocks as filled boxes designated A–F for the methylase blocks and chr for the chromodomain, LTRs for the Evelknievel transposon insertion as half arrows, the copia-like ORF as the long box, and RT-PCR primers used in Figure 2 are shown as opposing arrows. Below, vertical bars show nucleotide differences from Nd-0 for each of the other ecotypes. Amino acid differences from Nd-0 are shown adjacent to or above corresponding bars. (B) Tree based on nucleotide sequence data shown in (A), using C. arenosa as outgroup to root the tree. The Evelknievel insertion polymorphism was ignored in deriving the tree; ecotypes with this insertion (in bold italic) form a node with 76% bootstrap resampling support.
The extraordinarily low abundance of CMT1 mRNA and the small amounts of floral tissue in A. thaliana plants made it impractical to obtain full-length cDNA using standard protocols, such as cloning or RACE methods. Therefore, PCR primers were synthesized based on putative ORF regions using both gene prediction programs and manual examination. We succeeded in amplifying products from several flower and bud cDNA pools encompassing all conserved methylase blocks and the likely start and stop codons. Sequencing of whole PCR products demonstrated that the chromomethylase from buds and flowers is a 791 aa protein encoded on 20 exons (Figure 3A). This appears to be the full-length protein: there are no plausible splice acceptor sites upstream of the strongly predicted AUG start codon that could allow for in-frame extension of coding region, and the UGA stop codon is in a position consistent with the C terminus of all known cytosine DNA methyltransferases.
CMT1 shows extreme linkage disequilibrium: We determined ~4500 bp of genomic sequence from each of eight A. thaliana ecotypes, from the related outcrossing species C. arenosa, and from the allotetraploid hybrid A. suecica. The Arabidopsis ecotypes could be divided into two basic haplotypes, illustrated by a tree (Figure 3B). Col-0 and Metz belong to one haplotype, and the other six ecotypes and A. suecica Sue-2 belong to a second haplotype. Between haplotypes, the level of polymorphism is ~1% (Figure 3A), about 10-fold higher than expected based on the level seen for other A. thaliana genes (Hanfstingl et al. 1994). However, within the common haplotype, the level of polymorphism averages 0.1%. Partial sequencing of five other ecotypes and A. suecica Sue-1 shows that all belong to the common haplotype (data not shown).
Extreme linkage disequilibrium leading to a haplotype split was previously reported for the Arabidopsis ADH gene (Hanfstingl et al. 1994). In that study, the typing of numerous loci indicated that the split is not genome-wide, as ecotypes belonging to one haplotype at ADH showed no tendency to share a polymorphism at other sites in the genome. Our results confirm and extend this finding, because the two haplotypes we detect at CMT1 do not correspond to those seen at ADH, which lies about 40 cM away from CMT1 on chromosome 1 (http://weeds.mgh.harvard.edu/goodman/sequence.html). For instance, Col-0 and No-0 belong to the same haplotype at ADH but belong to different haplotypes at CMT1.
CMT1 protein is truncated in several ecotypes: Molecular characterization of CMT1 genomic and cDNA sequences from different ecotypes revealed a bizarre situation: at least four of 13 A. thaliana ecotypes surveyed are evidently null for intact protein, and another expresses mostly aberrantly processed mRNA. A G-to-T base substitution in the Metz coding sequence introduces a stop codon that terminates translation upstream of five of the six conserved methylase blocks (Figure 3A). In the Col-0/Metz haplotype, there is an A-to-G base substitution that introduces a splice acceptor site 8 bp upstream of the normal site, which is used 50% of the time in Col-0 (Figure 4), resulting in a truncated protein lacking the downstream catalytic blocks. cDNA analysis also reveals the existence of at least one other alternatively processed form: skipping of exon 9 in ~1/2 of the mRNAs results in a truncated protein lacking nearly the entire catalytic domain (data not shown). Thus it appears that no more than about 1/4 of Col-0 mRNAs can encode active protein, whereas only the correctly processed form was detected in the sequencing of cDNA from Kl-0, a representative of the common haplotype.
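The "about 1/4" figure for Col-0 follows from combining the two processing defects described above. A minimal sketch, assuming the two events occur independently on each transcript (the text does not state this explicitly) and using hypothetical variable names:

```python
# Fractions of Col-0 CMT1 mRNA processed correctly at each step (from the text).
normal_acceptor_fraction = 0.5  # normal exon 14 acceptor used ~50% of the time
exon9_included_fraction = 0.5   # exon 9 retained in ~1/2 of the mRNAs

# Assumption: the two processing events are independent of one another.
intact_fraction = normal_acceptor_fraction * exon9_included_fraction

print(f"Col-0 mRNAs able to encode full-length protein: about {intact_fraction:.2f}")
```

If the two events were correlated rather than independent, the true fraction would differ, so this product is best read as a rough estimate.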
Figure 4. Alternative mRNA processing of CMT1. (A) Position of the Evelknievel insertion in Exon 13 of No-0, Ler, and RLD, showing the GGCTG host duplication and the new splice donor site used in splicing out of virtually the entire transposon. Also shown is the position of the alternative splice in Col-0 at the polymorphic AG, 8 bp upstream of the Exon 14 splice acceptor site. In both cases, the coding sequence is truncated in about the same position of the protein. (B) Sequence trace demonstrating alternative splicing in Col-0. Using an Exon 12 primer (5′-AAAGTTCAACACACCTAAAGAATTCAAAGCA-3′) and Col-0 PCR-generated cDNA, the presence of the two spliced forms is seen beginning with (arrow) a superimposed G (the first base of Exon 14) and T (8 bp upstream of Exon 14). Farther down the sequence, the echoed cluster of five successive G's 8 bp apart (brackets) indicates equal levels of the two transcripts.
The other three ecotypes with defective CMT1 genes (No-0, Ler, and RLD) harbor a complete retrotransposon within exon 13, detected as a 4.7-kb genomic insertion, which we call “Evelknievel” (Figure 5). The presence of the retrotransposon appears to have no effect on transcription and splicing upstream, as levels of inflorescence mRNAs assayed using upstream primers are similar to levels found for other ecotypes (Figure 2). Full-length cDNAs of about the expected size have been amplified from No-0; sequencing reveals that the penultimate base of the 3′ long terminal repeat (LTR) is a splice donor site, with the last base of the transposon splicing to the correct splice acceptor site of exon 14 (Figure 4A). As a result, the entire transposon (except for the first base of the 3′ LTR) is spliced out. The reading frame is shifted, resulting in synthesis of a truncated protein. We note that the frameshift at this splice acceptor site caused by transposon insertion is nearly identical to that produced by alternative processing from the aberrant splice site in Col-0. The absence of downstream methylase conserved segments that form part of the catalytic domain (Klimasauskas et al. 1994; Reinisch et al. 1995) is expected to inactivate the truncated proteins in these ecotypes.
The Evelknievel retrotransposon is intact but is currently methylated and inactive: Sequencing of the Evelknievel insertion in No-0, Ler, and RLD shows that it is intact, with perfect 119-bp LTRs, a 5-bp host sequence duplication, and a 1451 aa ORF distantly related to copia-like ORFs. Its closest known relative is a 2088 aa A. thaliana retrotransposon ORF (GenBank accession number Z97342) that aligns with 60% identity and 9 gaps over nearly the full length of Evelknievel.
To determine the genomic distribution of Evelknievel in A. thaliana, we attempted to amplify Evelknievel sequence from 13 ecotypes but detected product only in No-0, Ler, and RLD (data not shown). We also detected Evelknievel product in both A. suecica and in the genome of one of three C. arenosa isolates, Care-4, in which CMT1 is intact (data not shown). Amplification and sequencing of a 4400-bp PCR product from Care-4 using Evelknievel primers identified an intact ORF that is identical to the A. thaliana Evelknievel ORF at 94.3% of aligned nucleotides with four gaps. This level of sequence similarity for an ORF between these two species is similar to that seen at the ADH locus, suggesting that Evelknievel has been vertically transmitted in A. thaliana and C. arenosa and has been maintained by active transposition.
Figure 5. Genomic DNA analysis of CMT1 and Evelknievel. (A) PCR analysis detects the interruption in Ler and No-0. Outer lanes show 1-kb ladder (Bethesda Research Labs, Bethesda) marker bands (from bottom: 1 kb, 1.6 kb, 2 kb, 3 kb, …). See Figure 6A for map location of primers. (B) Blot hybridization analysis of NsiI digests probed with CMT1 (left) and Evel (Evelknievel; right) probes.
Blot hybridization analysis of genomic DNAs demonstrates that CMT1 is a single-copy gene in A. thaliana (Figure 5B, left). There are evidently two copies in A. suecica, one deriving from A. thaliana and belonging to the common haplotype. The other copy is weakly hybridizing and probably derives from the C. arenosa gene: A. suecica is an allotetraploid hybrid between A. thaliana and C. arenosa (Hanfstingl et al. 1994; Kamm et al. 1995) and is partially cross-fertile with both. Hybridizing this blot with an Evelknievel probe reveals that the transposon in CMT1 is the only genomic copy in Ler and RLD (Figure 5B, right). In No-0 genomic DNA digested with NsiI, which does not cleave Evelknievel, there are an additional three bands of differing intensities. These additional copies of Evelknievel are unlinked from CMT1 in No-0, because about 1/4 of the progeny of a No-0/Col-0 × Col-0/Col-0 cross lacked Evelknievel in CMT1 but retained Evelknievel copies elsewhere in the genome (data not shown). It is very surprising that an identical pattern of three Evelknievel bands is found for both A. suecica ecotypes, which lack Evelknievel in CMT1 (Figure 5B), because Southern analysis of the above backcross progeny showed independent segregation of the three bands (data not shown).
The three A. thaliana ecotypes harboring Evelknievel in CMT1 form a separate branch of the CMT1 tree (Figure 3B). Base substitution differences between these three ecotypes in the CMT1 gene indicate that polymorphisms have accumulated since the transposon insertion event. It is unlikely that the polymorphisms were introduced by recombination, both because of the very low levels of recombination in this region implicit from the extreme linkage disequilibrium and from the fact that in 3/4 of the cases, the changes are unique to an ecotype. Direct evidence for divergence since the insertion event comes from the fact that five differences between No-0, Ler, and RLD have accumulated in Evelknievel since the insertion event (Table 1). The Care-4 Evelknievel sequence reveals the identity of the ancestral base for each polymorphic change. In three cases, the change was either CG to TG or CWG to TWG, changes that may be attributable to deamination of methyl-C. In support of this possibility, we note that the probability that random mutations could account for three of five differences conforming to a methyl-C deamination pattern is only 1/500. We conclude that Evelknievel has been present in CMT1 long enough for mutations to have accumulated and has been methylated during a large fraction of this period.
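The criterion used above for a change "attributable to deamination of methyl-C" can be sketched as a simple context check: a C-to-T change counts if the ancestral C sits in a CG or CWG context (W = A or T). The function name and the example sequences below are hypothetical and are not taken from Table 1; the equivalent G-to-A changes on the opposite strand are deliberately left out to keep the sketch short.

```python
def is_methyl_c_deamination(ancestral, derived, pos):
    """Return True if the change at `pos` is C->T in a CG or CWG context.

    `ancestral` and `derived` are aligned, same-strand DNA strings;
    only the forward-strand patterns named in the text are checked.
    """
    ancestral = ancestral.upper()
    derived = derived.upper()
    if ancestral[pos] != "C" or derived[pos] != "T":
        return False
    next1 = ancestral[pos + 1 : pos + 2]  # empty string at sequence end
    next2 = ancestral[pos + 2 : pos + 3]
    if next1 == "G":                      # CG -> TG
        return True
    return next1 in ("A", "T") and next2 == "G"  # CWG -> TWG


# Hypothetical examples (not sequences from this study):
print(is_methyl_c_deamination("ACGT", "ATGT", 1))    # CG context
print(is_methyl_c_deamination("ACAGT", "ATAGT", 1))  # CWG context
print(is_methyl_c_deamination("ACCGT", "ATCGT", 1))  # C followed by C
```

Applied to each polymorphic site with the Care-4 base as the ancestral state, a check of this kind would give the count of sites conforming to the deamination pattern.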
The chromomethylase appears to be conserved for methylase function: Patterns of conservation can reveal functional constraints. Alignment of A. thaliana and C. arenosa genomic and cDNA sequences shows a pattern that is typical of conservation for coding sequence. Nucleotide sequence alignment of the 4.5 kb of genomic sequence shows 83% identity with 58 gaps, whereas alignment of the ~2200-bp coding sequence alone shows 88% identity with four gaps. Divergence is much greater in the introns, where alignment is often uncertain because of frequent gaps and substitutions, than in the exons, where alignment is unequivocal and ORFs are intact. Moreover, all splice sites are conserved between A. thaliana and C. arenosa, and nearly all conform to consensus splice sites: The NetPlantGene server (Hebsgaard et al. 1996) successfully predicted all but three of the 38 possible splice junctions for CMT1 in both A. thaliana and in C. arenosa (data not shown).
Alignment of amino acid sequences between the two species shows 85% identity between a C. arenosa consensus and Col-0. The conserved methylase blocks show a higher level of overall identity: 98 identities over 104 positions, which is highly significant compared to the expectation of 85% based on the whole protein [χ2 = 7 (1 d.f.), P = 0.005]. Higher conservation in methylase blocks indicates that the chromomethylase has been conserved for methylase function.
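The χ2 statistic quoted above can be reproduced with a two-category goodness-of-fit calculation (identical versus non-identical positions). The sketch below is one plausible reading of the test, not necessarily the exact procedure used; the function name is hypothetical. For 1 d.f., the upper-tail probability of χ2 equals erfc(√(χ2/2)), which allows the calculation with the Python standard library alone.

```python
import math


def chi_square_identity(identities, positions, expected_rate):
    """Goodness-of-fit chi-square (1 d.f.) for identical vs. non-identical sites."""
    expected_id = expected_rate * positions
    expected_non = positions - expected_id
    observed_non = positions - identities
    chi2 = (
        (identities - expected_id) ** 2 / expected_id
        + (observed_non - expected_non) ** 2 / expected_non
    )
    # Upper tail of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# 98 identities over 104 block positions, against 85% expected from the whole protein.
chi2, p = chi_square_identity(98, 104, 0.85)
print(f"chi-square = {chi2:.2f}, upper-tail P = {p:.4f}")
```

With these inputs the statistic comes out close to 7, as reported. The tail probability from this particular formula is of the same order as the published P but need not match it exactly, since the details of the original test (for example, whether it was one-sided) are not given.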
Figure 6. Evelknievel forms a methylated island within CMT1. Map of CMT1::Evelknievel shows restriction sites (triangles for NsiI and lollipops for HpaII sites, filled where CG-methylated and empty where unmethylated), extent of probes (CMT1 and Evel) and PCR primers used in Figure 5A [5′-GCAACTGCAAGGAGAAGCTGAAGGAG-3′ (Le) and 5′-TTGTGAACTAAAACACCAGGCATGTCGC-3′, opposing arrows]. Blot hybridization analysis is based on HpaII digests. The CMT1 probe used (left) spanned the full coding region of Col-0 genomic DNA. Absence of the downstream ~2-kb product in Ler and No-0 reveals that the HpaII site 21 bp downstream of Evelknievel (striped lollipop) is methylated, though not in Col-0, which also shows the expected 0.6-kb band. The Evel probe reveals CG-methylation blockage of all HpaII sites in all copies of Evelknievel, which migrate at limiting mobility. PCR, HpaII digest of Evel PCR product from No-0.
DISCUSSION
Chromodomains have been detected in several proteins known to be involved in chromatin structure in some way, but just how these proteins act is poorly understood. Recently, the 3-D structure of a chromodomain from mouse MOD1 was reported (Ball et al. 1997); however, as is the case for all previously reported chromodomains, it lies within a much larger protein whose structure and biochemical function remain unknown. In contrast, high-resolution structures are known for cytosine DNA methyltransferases binding to their natural DNA substrates (Klimasauskas et al. 1994; Reinisch et al. 1995). The discovery of a chromodomain integral to a putative DNA methyltransferase is the first example of a chromodomain in a protein with known function and structure, thus potentially setting the stage for a detailed biochemical understanding of how a chromodomain functions in a natural context. The in vivo behavior of a chimeric protein consisting of the chromodomain from Drosophila Polycomb protein in the context of Heterochromatin Protein 1 suggests that the domain mediates interactions between specific chromatin components (Platero et al. 1995). A comparable function in the chromomethylase might be to guide the enzyme to its genomic target. However, the nature of this target remains to be elucidated.
Our study depended on the presence in GenBank of a cosmid sequence that spanned the CMT1 gene, which was determined as part of an Arabidopsis genome sequencing effort (G. Church, personal communication). Owing to its low-level expression, CMT1 cDNA is absent from popular cDNA libraries and from the set of ~30,000 A. thaliana partial cDNAs in databanks. Chromomethylase homologs may exist in vertebrates based on divergence from other higher eukaryotic DNA methyltransferases prior to the separation of plants and animals. Such homologs might be absent from vertebrate cDNA libraries because of low abundance. It is doubtful that a chromomethylase gene could have been detected in large vertebrate genomes by hybridization or PCR methods based on other eukaryotic DNA methyltransferases because sequence divergence is relatively high. The CMT1 gene should provide better probes and PCR primers. Nevertheless, it may be that only genome sequencing will uncover a CMT1 homolog in vertebrates. This might be the case for a large fraction of higher eukaryotic genes: any gene that is of exceptionally low abundance and is entirely nonessential is practically invisible to methods other than genomic sequencing and may be difficult to detect even then.
CMT1 is dispensable in A. thaliana but is conserved between species: Previously characterized DNA methyltransferases from higher eukaryotes are expressed almost ubiquitously, and it has been speculated that they play housekeeping roles, such as maintenance of methylation patterns (Holliday 1996). Housekeeping functions are consistent with findings of pleiotropic effects in methylase-deficient individuals, as mouse gene knockout mutations are embryonic lethal (Lei et al. 1996). In contrast, CMT1 expression is confined to inflorescences, the first example of tissue-specific expression of a DNA methyltransferase gene. One possibility is that CMT1 mediates condensation of chromatin in generative nuclei by hypermethylation (Mascarenhas 1975; Oakeley and Jost 1996). However, CMT1 is dispensable in A. thaliana wild populations. In this species, homozygosity is the rule owing to the preponderance of self-fertilization, which is estimated to exceed 99% of fertilization events. Dead or partially defective CMT1 loci are found in the most popular ecotypes used in laboratory research, including Ler, No-0, RLD, and Col-0. No obvious common trait distinguishes ecotypes with CMT1 defective from ecotypes with CMT1 intact. For instance, Metz flowers late, whereas Ler, No-0, and RLD flower early. Therefore CMT1 deficiency is unlikely to affect plant development or physiology.
Mutations in Evelknievel-containing CMT1 alleles and Care-4
An occasional null mutation in a nonessential gene is by no means unique in wild populations. However, such deleterious mutations are not expected to persist. Persistence of the CMT1::Evelknievel allele is remarkable, because the insertion event must have occurred prior to the dispersal of at least three otherwise unrelated ecotypes and accumulation of several base substitutions. Such persistence may result from positive selection. One possibility is that mutations in CMT1 alleviate the known detrimental effects of cytosine DNA methyltransferases, which can lead to C-to-T mutations (Laird and Jaenisch 1996; Yang et al. 1996). It is also possible that truncation results in a folded protein that has a new function, perhaps one that depends on the chromodomain. A nearly identical truncated protein is encoded by one of the alternatively processed forms in Col-0, caused by a mutation that occurred prior to the dispersal of Col-0 and Metz. Positive selection for these truncated proteins may account for persistence of these aberrant alleles. We note that a similarly truncated protein exists in bacteria: the AquI cytosine DNA methyltransferase is encoded in two parts from overlapping ORFs on a single operon (Karreman and de Waard 1990). The N-terminal portion contains the same four conserved regions as are found in the truncated chromomethylase. Although both parts of the protein are inactive for methylase activity when separately expressed, mixtures of extracts are active.
Another possibility is that CMT1 is an inactive pseudogene in A. thaliana. There is precedent for this in Schizosaccharomyces pombe, which encodes a defective DNA cytosine methyltransferase homolog (Wilkinson et al. 1995). In that case, the defect is obvious: the invariant Pro-Cys active site motif is Pro-Ser-Cys in the S. pombe sequence, and removal of the Ser results in a cytosine methyltransferase with CCmeWGG activity (Pinarbasi et al. 1996). It may be that CMT1 is also defective; however, there is no obvious flaw in the conserved regions of the chromomethylase. Direct activity measurements are needed to rule out this possibility, but such measurements have thus far been impeded by the lack of solubility or expression in heterologous hosts (C. McCallum, personal communication) and by the high excess of MET1 (assuming that MET1 is an active DNA methyltransferase, which has yet to be demonstrated in vitro) or related DNA methyltransferases in planta. Nevertheless, conservation of CMT1 indicates that the gene has evolved to encode an active methylase: An ~800 aa ORF and 38 splice junctions have been maintained since the divergence between Arabidopsis and Cardaminopsis. Moreover, the cytosine DNA methyltransferase blocks have been conserved at a significantly higher level than has the protein as a whole, as if methyltransferase function has been selected for since these species diverged from a common ancestor. So although CMT1 may be nonfunctional in present-day A. thaliana, its sequence conservation implies ancestral functionality.
Yet another possibility is that CMT1 function is redundant with that of another DNA methyltransferase. No chromomethylase similar enough to CMT1 was detected by hybridization or PCR, although a more distantly related chromomethylase would have escaped detection. Redundancy of MET1 has been suggested by the existence of a closely related MET1 homolog in A. thaliana (GenBank accession number Z97335). Furthermore, MET1 antisense lines show striking de novo methylation implicated in silencing of the SUPERMAN locus (Jacobsen and Meyerowitz 1997). This result supports the possibility that DNA methyltransferases other than MET1 and its close relatives are responsible for de novo methylation.
If CMT1 encodes an active DNA methyltransferase, then perhaps it supplements or backs up MET1 or other constitutive DNA methyltransferases. Unless CMT1 expression can be induced under certain unknown conditions, a supplementing role seems unlikely, because even in flowers, levels of CMT1 mRNA are ~100-fold lower than those of MET1. The chromomethylase is also unlikely to be the yet uncharacterized CWG maintenance DNA methyltransferase (Adams et al. 1996), because these sites are methylated in homozygous CMT1::Evelknievel ecotypes.
A role in genome defense? An alternative explanation for apparent dispensability is that CMT1 is useful on occasion, but such occasions may not occur during the lifetime of the average plant. For example, A. thaliana ADH is a nonessential gene that is thought to be subject to balancing selection for two haplotypes (Hanfstingl et al. 1994). Wild populations of plants consist of mixtures of ecotypes, which are equally divided between these two haplotypes species-wide. Occasional selective events for either haplotype are thought to be sufficient to maintain both forms over long evolutionary periods. Likewise, active and truncated forms of CMT1 might have been maintained by occasional selective events favoring either form, and alternative processing seen in Col-0 would allow for both forms to coexist in a homozygote. In the case of ADH, the selective event is undoubtedly environmental, given the known role of the enzyme in protection from anoxia. In the case of CMT1, genome-wide mutagenicity might be subject to selection, with alternative alleles in balance because of alternative types of mutation: C-to-T transitions increase when CMT1 is wild type and transposon insertions increase when CMT1 is defective.
Transposons are relatively rare in A. thaliana: a PCR-based search detected 10 families of copia-like retrotransposons, most present as just a single copy per family, and even these may be defective (Konieczny et al. 1991). Arabidopsis appears to have an active defense system against retrotransposon expression, because when an active tobacco copia-like element was introduced into A. thaliana, no transcription or transposition could be detected from integrated copies (Lucas et al. 1995). Prevention of retrotransposition is thought to result from silencing of expression by methylation, which is the fate of gene duplications (Bender and Fink 1995) and transgene repeats (Mittelsten Scheid 1994) in the A. thaliana genome. An effective genome defense against retrotransposons can help account for the small size of the Arabidopsis genome, the smallest of any higher plant. In contrast, 50% of the maize genome consists of retrotransposons clustered between genes (SanMiguel et al. 1996). The extraordinary rarity of CMT1 mRNAs might reflect restricted expression of the gene within inflorescences, consistent with a function during sexual reproduction. Such restricted expression of CMT1 may be sufficient to prevent retrotransposition, because in maize, active copia-like retrotransposons are expressed exclusively in generative cell precursors (Turcich et al. 1996). Thus, retrotransposons and other parasitic DNA elements may be targets for chromomethylase action.
A. thaliana may have evolved from an obligate outcrossing species similar to C. arenosa, which is self-incompatible and has showy and scented flowers and larger nectaries. Transposons would be better able to spread to new genomes in an outcrosser than in a selfer, and so CMT1 might be under stronger positive selection and be more highly expressed in C. arenosa. Moreover, in an outcrosser, a null mutation in a genome defense gene might have little consequence, because heterozygosity assures that there will be an active copy of the gene. For a selfer, the potential mutational burden of a cytosine DNA methyltransferase might counterbalance the occasional need for genome defense. This balance might be affected by reduced transposon numbers in A. thaliana, which could further reduce selection for intact CMT1.
Like all proposed roles for eukaryotic DNA methyltransferases (Martienssen and Richards 1995; Holliday 1996; Laird and Jaenisch 1996; Yoder et al. 1997), a role for the chromomethylase in genome defense must be considered speculative. Nevertheless, the novel features of the chromomethylase and the availability of naturally truncated alleles in an experimentally tractable species provide an unparalleled opportunity to understand how DNA methyltransferases can benefit an organism.
Acknowledgments
We thank Amy Csink and Claire McCallum for insightful discussions during the course of this work, and Rachel Holmes-Davis for DNA. This work was supported by a grant to S.H. from the National Institutes of Health (GM-29009).
Footnotes
- Communicating editor: V. Sundaresan
- Received October 31, 1997.
- Accepted December 29, 1997.
- Copyright © 1998 by the Genetics Society of America