Eleven sequenced BACs were annotated and localized via FISH to tomato pachytene chromosomes, providing the first global insights into the compositional differences of euchromatin and pericentromeric heterochromatin in this model dicot species. The results indicate that tomato euchromatin has a gene density (6.7 kb/gene) similar to that of Arabidopsis and rice. Thus, while the euchromatin comprises only 25% of the tomato nuclear DNA, it is sufficient to account for ∼90% of the estimated 38,000 nontransposon genes that compose the tomato genome. Moreover, euchromatic BACs were largely devoid of transposons or other repetitive elements. In contrast, BACs assigned to the pericentromeric heterochromatin had a gene density 10–100 times lower than that of the euchromatin and were heavily populated by retrotransposons preferential to the heterochromatin, the most abundant belonging to the Jinling Ty3/gypsy-like retrotransposon family. Jinling elements are highly methylated and rarely transcribed. Nonetheless, they have spread throughout the pericentromeric heterochromatin in tomato and wild tomato species fairly recently, well after tomato diverged from potato and other related solanaceous species. The implications of these findings for evolution and for sequencing the genomes of tomato and other solanaceous species are discussed.
Many plant and animal species possess chromosomes differentiated into highly condensed, pericentromeric heterochromatin and largely decondensed euchromatic arms. In animals, heterochromatin is often associated with transcriptional inactivity and suppressed genetic recombination and contains a large number of repetitive sequences (Dean and Schmidt 1995; Renauld and Gasser 1997; Myster et al. 2004). In plants, more is known about the gene-rich euchromatin, but less about heterochromatin (De Jong 1998; Cheng and Murata 2003; Koornneef et al. 2003). Studies thus far suggest that plant heterochromatin, while gene poor compared with euchromatin, still contains transcriptionally active genes, at least at certain times during the life cycle (Martin et al. 1993; Li et al. 2004; Lippman et al. 2004; Martienssen et al. 2004; Nagaki et al. 2004).
Advances in the genomic sequencing of model plants and animals have provided great insight into the composition and organization of genes in euchromatin as well as the structure and function of the heterochromatin. The Ty3/Gypsy class retrotransposons and their relatives have been found in pericentromeric regions, especially in grasses (Cheng et al. 2002; Jiang et al. 2003; Mroczek and Dawe 2003; Wu et al. 2004). Tandem repeats and retroelements are also key to centromere recognition by kinetochore proteins. Repeats in the pericentromeric regions also play important roles in initiating the recruitment of histone modification enzymes and in promoting the formation and maintenance of heterochromatin by the RNAi machinery (Cheng et al. 2002; Hall et al. 2002; Volpe et al. 2002; Zhong et al. 2002; Bender 2004; Sun et al. 2004). This still leaves open several important questions: What are the global differences between euchromatin and heterochromatin? Which of these differences are consistent among species and which are species specific? What evolutionary forces have molded the composition and function of heterochromatin vs. euchromatin? In hopes of shedding light on these questions, we have conducted a series of molecular cytogenetic experiments to characterize the nature of heterochromatin vs. euchromatin in the model crop plant tomato.
The tomato (Solanum lycopersicum) is a diploid species with a genome composed of 12 chromosomes (2n = 2x = 24) totaling 950 Mb of DNA (Arumuganathan et al. 1991). Along with maize, it was an early model system for genetics and cytogenetics studies in plants (for review, see Rick 1971). Like many other plant species, such as Arabidopsis and Medicago truncatula, tomato chromosomes contain long, contiguous stretches of euchromatin at the distal ends of most chromosomes and heterochromatic regions flanking the centromeres (De Jong 1998; Kulikova et al. 2001; Fransz et al. 2003). Approximately 25% of the tomato genome is contained in the euchromatin and 75% is contained in the pericentromeric heterochromatin (Peterson et al. 1998). Although much work has been done on Arabidopsis, which has relatively little heterochromatin, less is known about the global organization of euchromatin and heterochromatin in the majority of plants, especially crop plants that have large genomes and more heterochromatin; tomato is more representative of these species. We report herein the annotation of 11 sequenced BAC clones assigned, via in situ hybridization, to both euchromatin and heterochromatin. From a comparative analysis of these BACs emerges a general picture of the global organization and evolution of euchromatin vs. heterochromatin in this model dicot plant species.
MATERIALS AND METHODS
BAC clones and annotation:
Eleven sequenced tomato nuclear BAC clones were included in the analyses described herein. The sequences of all BACs have been submitted to GenBank (see Table 1 for accession numbers). Three BACs (181K1, 181O9, and 181C9) were randomly selected from a tomato BAC library, whereas the other 8 were isolated from the same library by virtue of screening with known genes or single-copy probes (Budiman et al. 2000; van der Hoeven et al. 2002; Y. Wang, unpublished data). All BACs were subjected to annotation according to guidelines utilized for the rice and Arabidopsis genomes (Mao et al. 2001; Goff et al. 2002; Yu et al. 2002; International Rice Genome Sequencing Project 2005). Genes were predicted with four computational gene-finder programs: FGENESH (Salamov and Solovyev 2000), GenemarkHMM (Borodovsky and McIninch 1993), Genscan+ (Burge and Karlin 1997; http://genes.mit.edu/GENSCAN.html), and GlimmerM (Salzberg et al. 1998), using an Arabidopsis training data set. In addition, a BAC segment was considered to represent a coding region if it showed a significant match with the Arabidopsis proteome (E-value <10⁻¹⁰ for tblastx) and/or plant ESTs (E-value <10⁻¹⁰ for blastn). Genes thus identified that showed strong homology to transposon-related genes (e.g., reverse transcriptase) were not considered to be part of the tomato gene repertoire.
Tomato BAC library screening:
High-density BAC filters were prepared using a tomato HindIII BAC library with 129,024 BAC clones (Budiman et al. 2000). These filters were screened with probes randomly labeled with 32P-dCTP as described previously (Feinberg and Vogelstein 1983). Hybridization was carried out overnight in plastic boxes at 65°. Filters were washed at 65° progressively with 2× SSC + 0.1% SDS for 30 min, 1× SSC + 0.1% SDS for 20 min, and 0.5× SSC + 0.1% SDS for 10 min. Phosphorimaging screens were exposed overnight and were scanned on a Storm PhosphorImager (Molecular Dynamics, Sunnyvale, CA).
Identifying highly repetitive sequences in BACs:
The 11 sequenced BAC clones were computationally screened for repetitive elements in two ways. First, each was compared against two repeat databases: the Solanaceae repeat database, which contains all known repeats from solanaceous species, and Repbase, which is a comprehensive database of repetitive element consensus sequences in eukaryotic genomes, including transposable elements (TEs) and tandem repeats of diverse origins (Jurka 2000; Ouyang and Buell 2004). If a gene model was homologous with known TEs (TIGR_Sol_repeat, Repbase repeats, or Arabidopsis TEs), then it was annotated as a TE-related gene. Second, the BACs were computationally screened against each other (blastn; E-value <10⁻²⁰ and bit score >100) in an effort to identify repetitive sequence motifs shared by two or more BACs. Segments of BACs containing putative repetitive elements were aligned and manually examined in an effort to determine the length and boundaries of each repetitive element.
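The second screen can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes the all-against-all blastn results are available as tab-separated lines in the common 12-column tabular layout (query ID first, subject ID second, E-value in column 11, bit score in column 12). The function name and the example hits are hypothetical and are not data from this study.

```python
from collections import defaultdict

E_VALUE_MAX = 1e-20   # E-value threshold stated in the text
BIT_SCORE_MIN = 100   # bit-score threshold stated in the text


def shared_repeat_hits(tabular_lines):
    """Return {query BAC: set of other BACs} for hits passing both thresholds.

    Assumes 12-column tabular BLAST output: column 1 is the query ID,
    column 2 the subject ID, column 11 the E-value, column 12 the bit score.
    """
    partners = defaultdict(set)
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        e_value, bit_score = float(fields[10]), float(fields[11])
        if query == subject:
            continue  # a self-hit says nothing about sharing between BACs
        if e_value < E_VALUE_MAX and bit_score > BIT_SCORE_MIN:
            partners[query].add(subject)
    return dict(partners)


# Hypothetical hits, for illustration only
example = [
    "40B13\t181K1\t92.5\t1800\t135\t0\t1\t1800\t5\t1804\t0.0\t2500",
    "40B13\t40B13\t100.0\t1800\t0\t0\t1\t1800\t1\t1800\t0.0\t3300",
    "2O7\t181C9\t85.0\t60\t9\t0\t1\t60\t1\t60\t1e-08\t52.0",
]
print(shared_repeat_hits(example))
```

Only the first example hit survives: the second is a self-hit and the third fails both thresholds. Candidate segments found this way would still need the manual alignment described above to define element boundaries.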
A subset of the repetitive elements identified in this study was screened against the genomes of other solanaceous species to determine the taxonomic distribution of these elements. For these studies, repeats were PCR amplified, labeled with 32P, and used as probes on genomic Southern blots containing restriction-digested genomic DNA from S. pimpinellifolium L. (LA1589), S. chmielewskii (C. M. Rick, E. Kesicki, J. Fobes, M. Holle, D. M. Spooner, G. J. Anderson, and R. K. Jansen) (LA1316), S. peruvianum L. (LA1708), S. chilense (Dunal) Reiche (LA1959), S. pennellii Correll (TA56), S. neorickii (D. M. Spooner, G. J. Anderson, and R. K. Jansen) (LA2133), S. habrochaites (S. Knapp and D. M. Spooner) (LA1777), S. lycopersicum L. (TA209), S. tuberosum L., Capsicum annuum L. (garden pepper), Petunia × hybrida hort. ex E. Vilm., and S. melongena L. (eggplant) (Spooner et al. 2005). Blots were hybridized at 60° and washed in 2× SSC for 20 min and in 1× SSC for 10 min. The tomato 45S rDNA (R45S) ribosomal gene probe was used as a positive control for estimating the relative strength of hybridization signals (Ganal et al. 1988).
For estimating the methylation status of cytosine in the retrotransposons, a Southern blot was made using tomato (TA56 and TA209) genomic DNA digested with the methylation-sensitive restriction enzymes HpaII and EcoRII and their methylation-insensitive isoschizomers MspI and BstNI (Messeguer et al. 1991). The LTR and polyprotein regions of the retrotransposon were labeled with 32P-dCTP and used as probes on the genomic Southern blots. Since cytosines in the tomato chloroplast are not methylated, PCR fragments of tomato chloroplast genes (AY216521, AF397080, and AF263101) were used as the methylation negative control (Fojtova et al. 2001).
DNA fragments were amplified from each BAC clone using primers designed from annotated exons or BAC ends and then mapped as cleaved amplified polymorphic sequence or RFLP markers onto the high-density tomato map on the basis of a population of 80 F2 individuals from the cross S. lycopersicum LA925 × S. pennellii LA716 (Fulton et al. 2002; http://www.sgn.cornell.edu/cgi-bin/mapviewer) (supplemental Table 1 at http://www.genetics.org/supplemental/).
Copy number reconstruction experiments:
To estimate the copy numbers of the tomato repetitive sequences, PCR products representing the repeats were used in the reconstruction experiments. Denatured tomato genomic DNA extracted from S. lycopersicum cv. TA209 (400 and 1000 ng) was spotted onto Hybond N+ membrane (Amersham, Arlington Heights, IL) according to Ganal et al. (1988). Different quantities of PCR product, representing different copy numbers of the repeat in the tomato genome, were also spotted on the same membrane. Hybridization of the 32P-dCTP-labeled repeats, analysis of the autoradiographs, and estimation of the copy number, from a plot of densities vs. copy numbers, were carried out as described by Bernatzky and Tanksley (1986). The mean results of three independent reconstruction estimates were reported for each repeat.
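The arithmetic behind the spotted standards can be illustrated as follows. This is a sketch of the general proportionality only, not the procedure of Bernatzky and Tanksley (1986): it assumes that DNA mass scales with sequence length, and the 1000-bp product length in the example is hypothetical. The 950-Mb genome size and the 1000-ng genomic DNA amount come from the text.

```python
GENOME_SIZE_BP = 950e6  # tomato genome size given in the text


def product_mass_ng(copies, product_len_bp, genomic_dna_ng,
                    genome_bp=GENOME_SIZE_BP):
    """Mass of PCR product carrying as many target copies as the genomic spot.

    A repeat present in `copies` copies occupies copies * product_len_bp of
    every genome_bp of genomic DNA, so that fraction of the genomic mass is
    the matching amount of pure product.
    """
    return genomic_dna_ng * copies * product_len_bp / genome_bp


def copies_from_mass(product_ng, product_len_bp, genomic_dna_ng,
                     genome_bp=GENOME_SIZE_BP):
    """Inverse calculation: copy number represented by a product standard."""
    return product_ng * genome_bp / (genomic_dna_ng * product_len_bp)


# Hypothetical 1000-bp product, 1000 ng genomic DNA, 2000 copies per genome
mass = product_mass_ng(2000, 1000, 1000)
print(f"{mass:.2f} ng of product matches 2000 copies in 1000 ng genomic DNA")
```

Under these assumptions, roughly 2 ng of a 1-kb product corresponds to 2000 genomic copies in a 1000-ng spot, which gives a sense of the dilution series needed for the standards.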
Pachytene chromosome preparation and fluorescence in situ hybridization:
Immature tomato flower buds, ∼3 mm in length, were harvested and fixed in Carnoy's solution (3:1 ethanol:glacial acetic acid). Anthers containing pachytene stage microsporocytes were squashed in acetocarmine solution. Slides were frozen in liquid nitrogen, the coverslips were removed, and the slides were dehydrated through a series of ethanol washes (70, 90, and 100%) and subjected to fluorescence in situ hybridization (FISH) using the method of Jiang et al. (1995). DNA of individual BACs was isolated using a standard alkaline extraction and labeled by nick translation with digoxigenin-16-dUTP (Roche Diagnostics, Indianapolis). Slides with meiotic chromosomes were denatured at 80° for 2 min in a buffer containing 70% formamide in 2× SSC and immediately placed in a series of precooled ethanol baths (70, 90, and 100%, for 5 min each). Denatured probe mixture (20 μl containing 50–100 ng labeled probe, 2× SSC, 50% deionized formamide, and 10% dextran sulfate) was applied to each slide and covered with a coverslip. Hybridization was carried out at 37° overnight in a humid chamber. After removing the coverslips, slides were washed in 2× SSC at 42° for 10 min and 2× SSC at room temperature for 5 min. Digoxigenin-labeled probes were detected by fluorescein isothiocyanate (FITC)-conjugated sheep-antidigoxigenin antibody (Roche Diagnostics). Chromosomes were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) in an antifade solution (Vector Laboratories, Burlingame, CA). Chromosomes and FISH signal images were captured using an Olympus BX61 fluorescence microscope with a micro-CCD camera. Grayscale images were captured for each color channel and then merged using Image-Pro Plus software.
RESULTS AND DISCUSSION
Genetic and physical localization of BACs in tomato chromosomes:
Eleven sequenced tomato nuclear BAC clones were analyzed: 3 BACs (181K1, 181O9, and 181C9) were randomly selected from a tomato BAC library and the other 8 were isolated by screening with known genes or single-copy probes (van der Hoeven et al. 2002; Y. Wang, unpublished data). These 11 sequenced tomato BAC clones were genetically mapped to their corresponding positions in 7 of the 12 tomato chromosomes in the tomato high-density genetic map (Figure 1) (Fulton et al. 2002, http://www.sgn.cornell.edu). Five of these BACs mapped to regions distal to the genetically defined centromeres and hence are likely to be in euchromatic regions, which was confirmed by FISH on tomato (S. lycopersicum) pachytene chromosomes (Figure 1, Table 1). The remaining 6 BACs all mapped to regions proximal to centromeres (Figure 1). Of these, 2 BACs (181O9 and 47I13) hybridized to single loci clearly contained within the heterochromatin of chromosomes 8 and 9, respectively (Figure 1, Table 1). A third BAC (2O7) also hybridized to the heterochromatin, but near the heterochromatin–euchromatin boundary of chromosome 7 (Figure 1). The remaining 3 BACs (181K1, 40B13, and 181C9) hybridized to multiple sites throughout the chromosomes with most of the signals concentrated in heterochromatic regions, suggesting that these BACs contain elements common to heterochromatin on all chromosomes (Figure 1, Table 1). In addition, the hybridization signals of these 3 BACs were not evenly distributed, with some regions stained more intensely than nearby regions, which indicates an uneven distribution of repeat sequences or different condensation patterns in tomato chromosomes.
In an effort to determine compositional differences between the euchromatin and heterochromatin, each of the 11 BACs was subjected to computational and manual annotation. Putative genes were identified by virtue of ab initio gene prediction programs, significant matches to ESTs from tomato or other related solanaceous species, and significant matches to predicted proteins from Arabidopsis or rice (Figure 2B). Gene models homologous with known transposable elements in solanaceous species, rice, or Arabidopsis were annotated as TE-related genes. Thus the BACs could be assigned into three categories: (1) those containing only genes associated with transposons, (2) those containing transposon genes as well as nontransposon genes, and (3) those containing only nontransposon genes. The 6 BACs derived from heterochromatin (as determined by FISH, see previous section) fell into categories 1 and 2 (Figure 2B). Moreover, the 3 random BACs (181K1, 181C9, and 181O9) from tomato heterochromatic regions (i.e., not selected using a genic or single-copy probe) all fell into category 1—containing no genes other than those related to transposons (Figure 2B). Of the 5 BACs assigned to euchromatin, 4 fell into the third category (only nontransposon genes). The remaining BAC fell into category 2—containing both transposons and nontransposon genes (Table 1, Figure 2).
Gene composition of heterochromatin vs. euchromatin:
The six BACs assigned to heterochromatin together comprise 678 kb with a gene density (not counting transposon-related genes) corresponding to approximately one gene every 56 kb. However, this estimate for the gene density of heterochromatin has to be tempered with the fact that only three of the heterochromatic BACs were randomly drawn from the BAC library. The rest were selected with genic or other single-copy probes, hence biasing for BACs containing genes. The three truly randomly sampled heterochromatic BACs comprise 370 kb and do not contain any nontransposon genes. Thus, the estimate of one gene every 56 kb in the heterochromatin is most likely an overestimate with the actual gene density in the heterochromatin likely being much lower than this value. The situation is dramatically different for the euchromatin. The five BACs derived from euchromatin together comprised 518 kb and contain 77 nontransposon genes. Therefore, we estimate that the euchromatin contains, on average, one gene every 6.7 kb.
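The density figures in this paragraph follow from simple division, sketched below with the kilobase totals and gene counts quoted in the text. The helper name is hypothetical, and the gene count of about 12 for the six heterochromatic BACs is only what the quoted 678 kb and 56 kb/gene imply; it is not a number stated in the source.

```python
def kb_per_gene(total_kb, n_genes):
    """Average spacing between genes; infinite when no genes were found."""
    if n_genes == 0:
        return float("inf")
    return total_kb / n_genes


# Euchromatin: 518 kb containing 77 nontransposon genes (from the text)
euchromatin_density = kb_per_gene(518, 77)

# Heterochromatin: 678 kb at one gene per 56 kb implies about a dozen genes
implied_het_genes = 678 / 56

# The three randomly sampled heterochromatic BACs: 370 kb, no nontransposon genes
random_het_density = kb_per_gene(370, 0)

print(f"euchromatin: {euchromatin_density:.1f} kb/gene")
print(f"implied heterochromatic genes in 678 kb: {implied_het_genes:.1f}")
print(f"random heterochromatic BACs: {random_het_density} kb/gene")
```

The infinite value for the random BACs makes the sampling problem explicit: with zero genes observed in 370 kb, the unbiased sample can only bound the spacing from below.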
The fully sequenced genomes of Arabidopsis and rice provide reference points with which to compare the organization of tomato euchromatin and heterochromatin. For all three species, the estimated nontransposon (non-TE) gene densities for euchromatin are remarkably similar: tomato, 6.7 kb/gene; Arabidopsis, 4.5 kb/gene; and rice, 6.9 kb/gene (Copenhaver et al. 1999; Kumekawa et al. 2001; Jiao et al. 2005). However, striking differences are revealed in the heterochromatin. Rice heterochromatin has gene density only slightly lower than the gene density in euchromatin (11 kb/gene, Jiao et al. 2005). In contrast, Arabidopsis and tomato heterochromatin have non-TE gene densities dramatically lower than that in euchromatin: Arabidopsis, 256 kb/gene on chromosomes 2 and 4; and tomato, >56 kb/gene (Copenhaver et al. 1999). As already noted, the higher gene density estimate for tomato heterochromatin (in comparison with Arabidopsis) may reflect a bias in the tomato estimate due to the nonrandom sampling of heterochromatic BACs (see previous section).
The tomato genome is composed of ∼950 Mb of DNA, 25% of which is euchromatin (Arumuganathan et al. 1991; De Jong 1998; Peterson et al. 1998). Thus, at a gene density of 6.7 kb/gene, we estimate that the euchromatin contains ∼35,440 genes. On the basis of a large EST database, the entire tomato genome has been previously estimated to encode ∼35,000 genes (van der Hoeven et al. 2002). Thus the euchromatin is sufficient to encode most of tomato non-TE genes. The remaining 75% (∼712 Mb) of the tomato genome is composed of heterochromatin. If the heterochromatin contains 1 gene every 56 kb, one would estimate that the heterochromatin contains 12,700 genes and that the tomato genome contains a total of 48,000 genes, which is dramatically higher than the estimate from the EST data. This discrepancy is likely due to the biased estimate of the heterochromatic gene density (see previous section). If the unbiased heterochromatin non-TE gene density from Arabidopsis (256 kb/gene) is used for the estimation, the tomato heterochromatin would be predicted to contain only 2800 genes. This would also lead to a predicted gene content of 38,240 genes in the tomato genome (35,440 in the euchromatin + 2800 in the heterochromatin)—a value much closer to that predicted from EST data (van der Hoeven et al. 2002). If these estimates are valid, then the tomato heterochromatin, which comprises 75% of the tomato nuclear genome, likely accounts for <10% of the non-TE genes—an important observation to be taken into account when contemplating sequencing of the genome of tomato or other solanaceous species (see next section).
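The gene-count estimates above can be reproduced with a short calculation. The sketch below uses only values quoted in the text (950 Mb, 25% euchromatin, and densities of 6.7, 56, and 256 kb/gene); the function and variable names are hypothetical, and the printed values differ from the quoted figures only by rounding.

```python
GENOME_MB = 950.0            # genome size from the text
EUCHROMATIN_FRACTION = 0.25  # euchromatic share of the genome from the text


def genes_in_region(region_mb, kb_per_gene):
    """Number of genes expected in a region at a given average spacing."""
    return region_mb * 1000.0 / kb_per_gene


eu_mb = GENOME_MB * EUCHROMATIN_FRACTION
het_mb = GENOME_MB - eu_mb

eu_genes = genes_in_region(eu_mb, 6.7)                 # tomato euchromatin
het_genes_biased = genes_in_region(het_mb, 56.0)       # biased tomato estimate
het_genes_arabidopsis = genes_in_region(het_mb, 256.0) # Arabidopsis density

total_biased = eu_genes + het_genes_biased
total_arabidopsis = eu_genes + het_genes_arabidopsis
het_share = het_genes_arabidopsis / total_arabidopsis

print(f"euchromatin genes:            {eu_genes:,.0f}")
print(f"heterochromatin (56 kb/gene): {het_genes_biased:,.0f}")
print(f"heterochromatin (256 kb/gene): {het_genes_arabidopsis:,.0f}")
print(f"total with 56 kb/gene:        {total_biased:,.0f}")
print(f"total with 256 kb/gene:       {total_arabidopsis:,.0f}")
print(f"heterochromatic share:        {het_share:.1%}")
```

With the Arabidopsis density the heterochromatic share comes out near 7%, which is the basis for the statement that heterochromatin accounts for less than 10% of the non-TE genes.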
Repeat composition of heterochromatin vs. euchromatin:
In an effort to identify repeat elements in the tomato genome, all BAC sequences were compared with each other using blastn. A sequence element was classified as a “repeat” if it was shared by two or more BAC clones. Using this criterion, a total of three distinct repeat families were identified, Jinling, LARD1, and LARD2, involving five different BACs—all assigned to the heterochromatin (Table 1). No shared repeats were identified in the euchromatic BACs (Table 1).
Tom-LARD1 and Tom-LARD2:
Two repeat families associated with retrotransposons (deemed large retrotransposon derivatives, LARDs) (Kalendar et al. 2004) were identified due to the hallmarks of long terminal repeats (LTRs), ranging from 750 to 1100 bp. LARD1 and LARD2 had two- and three-member elements, respectively. The sizes of the internal sequences of LARD1 and LARD2 elements varied from 2 to 5 kb, but none of them contained any coding sequences, and hence they are not likely to be functional elements. Elements in the same family only share sequence similarity in the LTRs, but not in the internal noncoding sequences, suggesting that LARDs are nonautonomous retroelements.
LTRs at opposite ends of each Tom-LARD1 and Tom-LARD2 element shared 90–96% identity (sequence alignments are in the supplemental material at http://www.genetics.org/supplemental/). A comparison of orthologous, noncoding intergenic nuclear DNA segments from tomato and potato has revealed a sequence identity of 86% (Y. Wang, unpublished data for BAC comparison). Using this value and assuming that the transposon elements had identical LTRs upon insertion, we estimate that these LARDs integrated into their existing sites after the divergence of tomato and potato—an event estimated to have occurred 10 MYA (Chaw et al. 1997; Wikstrom et al. 2001).
Jinling—a high-copy retrotransposon family preferential to tomato pericentromeric heterochromatin:
A third repeat family was identified that contains 15 elements (Table 1). The elements were found in multiple copies in five of the six BACs assigned to heterochromatin and hence may represent an abundant and heterochromatin-preferential repeat. On the basis of alignments and annotation of the 15 repeat members, this element is deduced to be a member of the Ty3/gypsy retrotransposon family and to share homology (84% identity) with the LTRs of two pericentromeric retrotransposon (PCRT)1a elements reported by Yang et al. (2005) (Figure 2A; Su and Brown 1997; Marin and Llorens 2000) (data not shown). Eight of the 15 members appear to be complete, containing all of the structural and coding elements predicted for the Ty3 retrotransposon family. However, none of these elements appear to be capable of autonomous transposition since one or more of the coding regions always contained premature stop codons. A complete element from BAC 40B13 is annotated to comprise 8.8 kb, including 2-kb LTRs, two reverse transcriptase (RT) domains, and one integrase (IN) domain (Figure 2A, supplemental Table 1 at http://www.genetics.org/supplemental/). All but one member in BAC 2O7 also lack the GAG domain found in many other Ty3 retrotransposons. The LTRs of this retrotransposon family also show homology (80% identity) with a previously described tomato repeat—tomato genome repeat II (TGRII) (Ganal et al. 1988). The TGRII repeat family was determined to consist of >4000 member repeats, making it the highest-copy, interspersed element in the tomato genome (Ganal et al. 1988). On the basis of the high level of homology, we propose a family name of Jinling (meaning “golden bell” in Chinese and indicating the heterochromatin vicinity) to represent all the related elements reported herein as well as the previously reported PCRT1a and TGRII elements (Ganal et al. 1988; Yang et al. 2005).
Copy number of Jinling:
The copy number of Jinling in the tomato genome was estimated via a genomic reconstruction experiment using various components of the Jinling element from BAC 40B13 as probes (supplemental Table 1 at http://www.genetics.org/supplemental/). The results indicate that the tomato genome contains ∼2080, 1800, 1120, and 1450 copies of the LTR, RT, GAG, and IN domains, respectively. Thus, on the basis of these hybridization results, we estimate a minimum of ∼2000 Jinling elements in the tomato genome. Previous reports had estimated that TGRII is present in ∼4000 copies (Ganal et al. 1988). The discrepancy in these estimates may be due to the fact that TGRII shows only 80% identity with the canonical Jinling element. This level of sequence divergence is nearing the limits of detection by hybridization. Thus, it is likely that additional, more diverged, copies of the Jinling element were not detected by the reconstruction experiments described herein. Using 2000 and 4000 as the lower and upper copy-number limits and 8.8 kb as the upper size limit, we estimate that Jinling comprises ∼17.5–35 Mb of genomic DNA in the tomato genome (or ∼2.5–5% of the heterochromatin). Thus, Jinling is the largest family of retrotransposons thus far identified in the tomato genome and is apparently specific to the pericentric heterochromatin.
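The size range quoted above is the product of copy number and element length, and it can be checked directly. The sketch below uses the copy-number limits (2000 and 4000) and the 8.8-kb element length from the text, together with a heterochromatin size of about 712 Mb (75% of 950 Mb); the function name is hypothetical.

```python
ELEMENT_KB = 8.8           # length of the complete element from BAC 40B13
HETEROCHROMATIN_MB = 712.5 # 75% of the 950-Mb genome


def family_size_mb(copies, element_kb=ELEMENT_KB):
    """Total DNA occupied by a repeat family, in megabases."""
    return copies * element_kb / 1000.0


for copies in (2000, 4000):
    size_mb = family_size_mb(copies)
    share = size_mb / HETEROCHROMATIN_MB
    print(f"{copies} copies: {size_mb:.1f} Mb, {share:.1%} of heterochromatin")
```

Because 8.8 kb is the length of a complete element and several family members are truncated, these values are best read as upper bounds for each copy-number assumption.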
Jinling is nonrandomly distributed in pericentromeric heterochromatin:
While Jinling is apparently specific to the heterochromatin, it is not necessarily uniformly distributed throughout the heterochromatin. To address this issue, various domains of the Jinling element (LTR, RT, IN, and GAG) as well as TGRII (Ganal et al. 1988) were used as probes for FISH on tomato pachytene chromosomes. The results show that all probes hybridized almost entirely to pericentromeric heterochromatin, a result consistent with FISH experiments conducted with complete BACs containing Jinling (Figures 1 and 3). However, nonuniform hybridization signals suggest that Jinling may not be randomly distributed in the heterochromatin (Figure 3). To further investigate this possibility, the Jinling LTR and the entire TGRII element were used as probes on tomato BAC filters of the HindIII library (Budiman et al. 2000). If evenly distributed in the heterochromatin, one Jinling element would be predicted every ∼240 kb (using a 3000-copy-number estimate and 710 Mb of pericentric heterochromatin; see previous section). Considering an average BAC insert size of 120 kb, one would predict that ∼37% of heterochromatic BAC clones (28% of all BACs) would contain at least one Jinling element. The actual hybridization results, however, indicated that only 13% of the BACs contain Jinling elements, a result consistent with Jinling occurring in nonrandom clusters in the heterochromatin. We cannot rule out the possibility that the BAC library is biased against heterochromatic clones, which are more likely to be rich in repeat elements. Therefore, to further test the hypothesis that Jinling is not randomly distributed in the heterochromatin, the frequency and distribution of Jinling in the six heterochromatic BACs were tested against a uniform distribution model using the Kolmogorov–Smirnov test. Using this test, we could reject the hypothesis that Jinling is randomly distributed across the six sequenced heterochromatic BACs (P < 0.001).
This clustering effect may be caused in part by localized movement of Jinling (see next section).
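One way to reason about the expected fraction of BACs carrying Jinling is a Poisson model of random placement, sketched below. The model is an assumption of this sketch: the source does not state how its ∼37% figure was derived, and the Poisson calculation gives somewhat higher values (about 40% of heterochromatic BACs and about 30% of all BACs). The copy number, heterochromatin size, BAC insert size, and the 75% heterochromatic fraction are taken from the text.

```python
import math

COPIES = 3000              # copy-number estimate used in the text
HETEROCHROMATIN_KB = 710e3 # pericentric heterochromatin, 710 Mb
BAC_INSERT_KB = 120        # average BAC insert size
HET_FRACTION = 0.75        # share of the genome that is heterochromatin

# Average spacing if elements were spread evenly (about 240 kb)
spacing_kb = HETEROCHROMATIN_KB / COPIES

# Expected number of elements per BAC under random placement
expected_per_bac = BAC_INSERT_KB / spacing_kb

# Poisson probability that a heterochromatic BAC holds at least one element
p_het_bac = 1.0 - math.exp(-expected_per_bac)

# Assuming the library samples the genome evenly and euchromatin has none
p_any_bac = p_het_bac * HET_FRACTION

print(f"spacing: {spacing_kb:.0f} kb")
print(f"heterochromatic BACs with Jinling: {p_het_bac:.0%}")
print(f"all BACs with Jinling: {p_any_bac:.0%}")
```

Either set of expected values is well above the 13% observed, so the conclusion that Jinling copies are clustered does not depend on the exact model chosen.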
Estimating the timing of Jinling transposition during evolution of the Solanaceae:
Estimating the time at which a retrotransposon inserted from the divergence of its terminal repeats rests on four conditions: (1) the inserted element must have had terminal repeats that were identical in sequence at the time of insertion; (2) random mutations begin to accumulate in the terminal repeats after insertion, and this mutational drift in sequence is not affected by selection, i.e., the principle of random genetic drift applies; (3) the terminal repeats at both ends of the element have been retained and can be aligned, allowing generation of sequence divergence statistics; and (4) neutral nucleotide divergence rates for noncoding regions are available for the organism under study, allowing molecular clock estimates to be used in calculating the amount of time that has passed since the element inserted and the terminal repeats began to mutate.
LTRs at the opposite ends of Jinling members share sequence similarities ranging from 90 to 97%. Considering the intergenic sequence identity of 86% between tomato and potato (Y. Wang, unpublished data), these results indicate that Jinling spread throughout the tomato heterochromatin subsequent to the divergence of tomato and potato from their last common ancestor—estimated to be 10 MYA (Chaw et al. 1997; Wikstrom et al. 2001). Moreover, an unrooted tree suggests that the Jinling LTRs began diverging from each other at a very similar time (Figure 4). Using neutral substitution rates reported by Nesbitt and Tanksley (2002), we estimate that Jinling began spreading throughout the tomato genome from ∼5 MYA—a period corresponding to the radiation of the Solanum tomato clade (formerly Lycopersicon) containing tomato and its closely related wild species (Miller and Tanksley 1990; Nesbitt and Tanksley 2002). Thus, Jinling elements are young TEs in the tomato genome, similar to the Dasheng elements found in the rice centromeric regions (Jiang et al. 2002).
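The dating logic can be written out as a small calculation. This is a sketch under stated assumptions and not the authors' exact procedure: it applies the Jukes–Cantor correction for multiple substitutions (an assumption, since the source does not name a correction) and divides the divergence by twice the substitution rate because both LTRs accumulate changes independently after insertion. The rate used in the example is a placeholder for illustration only; it is not the value reported by Nesbitt and Tanksley (2002).

```python
import math


def jukes_cantor_distance(identity):
    """Substitutions per site between two LTRs, corrected for multiple hits."""
    p = 1.0 - identity  # observed proportion of differing sites
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def insertion_age_years(identity, subs_per_site_per_year):
    """Time since insertion, T = K / (2r).

    The factor of 2 reflects that the two LTRs were identical at insertion
    and have each been diverging for the full time T.
    """
    k = jukes_cantor_distance(identity)
    return k / (2.0 * subs_per_site_per_year)


PLACEHOLDER_RATE = 1e-8  # hypothetical rate, substitutions/site/year

for identity in (0.90, 0.97):  # range of LTR identities given in the text
    age_my = insertion_age_years(identity, PLACEHOLDER_RATE) / 1e6
    print(f"LTR identity {identity:.0%}: about {age_my:.1f} million years "
          f"at the placeholder rate")
```

The resulting ages scale inversely with the rate, so substituting a lineage-specific neutral rate changes the absolute dates but not the ordering of insertions.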
Occurrence of Jinling in the genomes of other solanaceous species:
Southern hybridization using a Jinling LTR probe showed strong signals in tomato and the closely related wild tomato species (S. pimpinellifolium, S. chmielewskii, S. peruvianum, S. chilense, S. pennellii, and S. habrochaites), but no hybridization signals were detected in potato, eggplant, pepper, or petunia (Figure 5). The confinement of Jinling to tomato and its closest Solanum relatives is consistent with the estimated transposition date and the time that this Solanum tomato clade diverged from its last common ancestor (see previous section).
Jinling elements are highly methylated and rarely transcribed:
The methylation status of Jinling elements was tested via digestion with methylation-sensitive and -insensitive isoschizomers and Southern hybridization using Jinling fragments as probes. Chloroplast probes were used as negative controls because cpDNA is normally unmethylated (Fojtova et al. 2001). The results indicate that Jinling elements are highly methylated at both CNG and CG sites (Figure 6). DNA methylation is often associated with reduced gene expression, and previous studies in maize and Arabidopsis have also shown that repetitive sequences in heterochromatin are highly methylated (Bennetzen et al. 1994; Lippman et al. 2004; Rabinowicz et al. 2005). A blastn search against >180,000 tomato ESTs (www.sgn.cornell.edu) revealed no matches to Jinling. Hence, despite its high copy number, the Jinling element appears to be rarely, if ever, transcribed.
Sequence composition of euchromatin and pericentromeric heterochromatin:
The results presented herein paint a picture of the tomato genome in which at least 90% of the nontransposon genes are sequestered in the contiguous stretches of euchromatin found at the distal portions of most chromosomes and comprising only 25% of the total DNA in the tomato genome. These euchromatic regions are largely devoid of repetitive elements (e.g., retrotransposons) and have a gene density similar to that of the euchromatin of species with smaller genomes such as rice and Arabidopsis. In contrast, the pericentromeric heterochromatin has a gene density 10–100 times lower than that of euchromatin and is largely occupied by retrotransposons, the largest family being the Ty3/gypsy-type Jinling family. LTR divergence data suggest that Jinling originated and spread rapidly in the pericentromeric heterochromatin of tomato and closely related wild tomato species ∼5 MYA, well after the divergence from potato and other solanaceous species that lack this element. However, the fact that tomato and other solanaceous species have very similar chromosome architecture with respect to euchromatin and pericentromeric heterochromatin raises the possibility that heterochromatic regions in these species are occupied by other retrotransposon families, which are common to all Solanaceae and might have radiated at different times during Solanaceae evolution (Gottschalk 1954). The fact that pericentromeric heterochromatin appears to be deficient in genes and is evolving rather rapidly with respect to repeat composition may explain why chromosome pairing and meiotic recombination are often repressed (up to 1000-fold) in heterochromatin vs. euchromatin, especially in interspecific hybrids (Tanksley et al. 1992).
Implications of this study for the sequencing of the tomato genome and the genomes of other solanaceous species:
The results from this study indicate that as much as 90% of the estimated 38,000 nonretrotransposon genes in tomato can be discovered by sequencing the tomato euchromatin, which accounts for only 25% of the total DNA. Moreover, the high gene density and the lack of repetitive sequences should make computational assembly of euchromatin sequences relatively straightforward. Currently, an international project to sequence the tomato euchromatin on a BAC-by-BAC basis is being conducted by a consortium of 10 countries (http://www.sgn.cornell.edu/). Because of the low gene density, sequencing the 75% of the genome comprising heterochromatin would be expensive and may result in discovery of a relatively small fraction of genes. The small portion of genes embedded in the heterochromatin could be more efficiently discovered through EST databases (currently >500,000 ESTs exist for tomato and related solanaceous species; http://www.sgn.cornell.edu/) and/or methyl filtration sequencing (Palmer et al. 2003; Bedell et al. 2005). Finally, because of the high level of conserved chromosomal macrosynteny and microsynteny among the genomes of solanaceous species, the sequence of the tomato euchromatin could be utilized in genetic/genomic studies across the entire family (Doganlar et al. 2002; Paran et al. 2004; Y. Wang, unpublished data).
We thank Wojtek Pawlowski, Zach Lippman, Amy Frary, and Steve M. Stack for critical reading of the manuscript. This work was partially supported by National Science Foundation grant DBI-0116076, “exploitation of tomato as a model for comparative and functional genomics,” and 0421634, “sequence and annotation of the euchromatin of tomato,” and by a Binational Agricultural Research and Development grant, IS-3337-02, “saturation mutagenesis in tomato: an integrated infrastructure for functional genomics.”
1 Present address: Wuhan Botanical Garden, Chinese Academy of Science, Wuhan, Hubei, People's Republic of China, 430074.
Communicating editor: J. Tamkun
- Received October 18, 2005.
- Accepted February 6, 2006.
- Copyright © 2006 by the Genetics Society of America