Two Classes of Genes in Plants
Nicolas Carels a,b and Giorgio Bernardi a,b
a Laboratoire de Génétique Moléculaire, Institut Jacques Monod, F-75005 Paris, France
b Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, I-80121 Napoli, Italy
Corresponding author: Giorgio Bernardi, Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Villa Comunale, I-80121 Napoli, Italy. E-mail: bernardi{at}alpha.szn.it
Communicating editor: S. YOKOYAMA
| ABSTRACT |
|---|
Two classes of genes were identified in three Gramineae (maize, rice, barley) and six dicots (Arabidopsis, soybean, pea, tobacco, tomato, potato). One class, the GC-rich class, contained genes with no, or few, short introns. In contrast, the GC-poor class contained genes with numerous, long introns. The similarity of the properties of each class, as present in the genomes of maize and Arabidopsis, is particularly remarkable in view of the fact that these plants exhibit large differences in genome size, average intron size, and DNA base composition. The functional relevance of the two classes of genes is stressed by (1) the conservation in homologous genes from maize and Arabidopsis not only of the number of introns and of their positions, but also of the relative size of concatenated introns; and (2) the existence of two similar classes of genes in vertebrates; interestingly, the differences in intron sizes and numbers in genes from the GC-poor and GC-rich classes are much more striking in plants than in vertebrates.
EUKARYOTIC genomes cover a large spectrum of haploid sizes [or C values; for angiosperms see ![]()
In the case of vertebrates, this was demonstrated by showing that the very small genome (400 Mb) of Arothron diadematus, a fish belonging to the order Tetraodontiformes, is characterized by very small amounts of highly and moderately repetitive sequences (![]()
As far as introns are concerned, it was shown (![]()
In the case of plants, phenomena of genome contraction/expansion are also known to occur. To cite just one example, a wide range of genome sizes is known in Gramineae, C values ranging from 415 Mb in the case of Oryza sativa to 12,600 Mb in the case of Avena sativa. However, no investigation has been reported so far on intron sizes as related to GC richness.
In this work, we explored plant genomes from three Gramineae (maize, rice, barley) and six dicots (Arabidopsis, soybean, pea, tobacco, tomato, potato) to see whether intron size and GC richness are correlated with each other in these genomes, as is the case for the genomes of vertebrates. The plants studied were chosen so as to explore genomes characterized by two different compositional situations. Indeed, the genomes of Gramineae are GC rich and their coding sequences cover a broad compositional range, whereas the genomes of the dicots studied are GC poor and their coding sequences cover a narrow GC range (![]()
| MATERIALS AND METHODS |
|---|
Using the Infobiogen server (see http://www.infobiogen.fr), we extracted genomic DNA sequences encompassing complete genes from angiosperms (excluding seed-storage protein genes) from release 108 (August 1998) of GenBank with the ACNUC/QUERY retrieval system (![]()
Homologous gene pairs were obtained as previously described (![]()
Genes available for each organism were ordered according to GC levels and divided into two classes by taking as the cutting point the mode of their distribution (unless two classes were already obvious, as in the case of Gramineae).
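The classification step can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the mode was estimated, so the 1% GC bin width, the use of the modal bin's midpoint as the cutting point, and the names `gc_level` and `split_at_mode` are all hypothetical choices.

```python
from collections import Counter


def gc_level(seq):
    """Return the GC level of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def split_at_mode(gc_by_gene, bin_width=1.0):
    """Split genes into GC-poor and GC-rich classes at the modal GC bin.

    gc_by_gene maps a gene name to its GC level (%). The bin width and
    the use of the bin midpoint as cut are assumptions of this sketch.
    """
    bins = Counter(int(gc // bin_width) for gc in gc_by_gene.values())
    top = max(bins.values())
    # Ties between equally populated bins are resolved toward the lowest bin.
    modal_bin = min(b for b, n in bins.items() if n == top)
    cut = (modal_bin + 0.5) * bin_width
    poor = sorted(g for g, gc in gc_by_gene.items() if gc < cut)
    rich = sorted(g for g, gc in gc_by_gene.items() if gc >= cut)
    return cut, poor, rich
```

For genomes where two classes are already obvious, as stated for Gramineae, the cut would instead be placed between the two modes rather than at a single mode.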
All GC-poor and GC-rich genes were analyzed for size, number, and GC level of exons, introns, and coding sequences. The statistical significance of the differences was analyzed using Student's t-test (STUDENT 1908, 1925) at a significance level of 0.05.
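As a rough illustration of this comparison, the sketch below computes a two-sample Student's t statistic with pooled variance. The source does not specify the variant of the test, so the pooled-variance, two-sided form is an assumption, as are the function names `student_t` and `is_significant`; the critical value must be supplied by the caller from a t table.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    df = n_a + n_b - 2
    # Pooled estimate of the common variance of the two samples.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    if pooled == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


def is_significant(t, critical_value):
    """Two-sided decision: |t| must exceed the tabulated critical value."""
    return abs(t) > critical_value
```

For example, with 8 degrees of freedom the two-sided critical value at the 0.05 level is 2.306, so a statistic of t = -1 would not be reported as a significant difference.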
| RESULTS |
|---|
The compositional distribution of genes from maize and other Gramineae:
The compositional distribution of coding sequences of maize is very broad (45–75% GC) and at least bimodal (Fig 1; see also ![]()
The two classes of maize genes are characterized by distinct features, which are presented in Fig 3 and Table 1 and can be summarized as follows. GC-rich genes contain exons that are short relative to exons of vertebrates, no introns in 43% of the cases, and very few, short introns in those genes that contain them. In contrast, GC-poor genes contain even shorter exons and more numerous, longer introns. Interestingly, the difference in average exon size between the two classes of genes is not accompanied by a significant difference in average size of whole coding sequences (see below). Needless to say, the features just described for GC-rich and GC-poor genes account for the bimodality of the compositional distribution of maize genes. Moreover, in GC-rich genes, GC levels of exons and introns are 67 and 48%, respectively, whereas in GC-poor genes, exons and introns exhibit GC levels of 56 and 40%, respectively. The GC levels of GC-rich genes were also found to be significantly higher than those of GC-poor genes by as much as 5, 9, and 20% for first, second, and third codon positions, respectively (see Table 2). These large compositional differences of introns and exons are typical of plants, the corresponding differences in vertebrate genes being much smaller (![]()
The compositional distribution of genes from Arabidopsis and other dicots:
In contrast with maize, coding sequences, introns, and genes from Arabidopsis are characterized (Fig 1) by unimodal distributions and by smaller compositional differences of exons (45 and 49%) and introns (31 and 33%) in GC-poor and GC-rich genes, respectively (Table 1; see also ![]()
The other distinguishing features between the two classes are similar to those described above for maize (see Table 1 and Fig 3). Indeed, the proportion of intron-less genes in GC-rich genes of Arabidopsis (38%) is close to that of maize (43%), and intron numbers are also similar in both classes of genes in the two genomes. The smaller intron sizes in GC-poor genes (but not in GC-rich genes) were to be expected because of the much smaller genome size of Arabidopsis (~120 Mb) compared to maize (~2500 Mb).
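The per-class quantities compared here (proportion of intron-less genes, intron number, and size of concatenated introns) can be derived from exon coordinates. The sketch below assumes 1-based, inclusive exon coordinates as in GenBank feature tables; the function names and the dictionary layout are hypothetical and not taken from the original analysis.

```python
def intron_sizes(exons):
    """Return intron lengths for a gene given (start, end) exon coordinates.

    Coordinates are assumed to be 1-based and inclusive, so the intron
    between two consecutive exons spans prev_end + 1 .. next_start - 1.
    """
    exons = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


def summarize_class(genes):
    """Summarize intron features for one class of genes.

    genes maps a gene name to its list of exon coordinates.
    """
    if not genes:
        raise ValueError("no genes supplied")
    introns = [intron_sizes(exons) for exons in genes.values()]
    n = len(introns)
    return {
        "percent_intronless": 100.0 * sum(1 for s in introns if not s) / n,
        "mean_intron_number": sum(len(s) for s in introns) / n,
        # Concatenated intron size is the summed length of all introns of a gene.
        "mean_concatenated_intron_size": sum(sum(s) for s in introns) / n,
    }
```

Running `summarize_class` separately on the GC-poor and GC-rich gene sets of one species would give the kind of side-by-side values that the text compares between maize and Arabidopsis.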
The features described for Arabidopsis genes were also found in the genes of other dicots (soybean, tobacco, tomato, and potato; see Table 1), with minor differences that are probably due to differences in the gene samples.
As far as homologous genes from Arabidopsis and maize are concerned, intron number and the size of concatenated introns are correlated with coefficients of 0.96 and 0.68, respectively (Fig 4). In contrast, the GC levels of concatenated introns of these homologous genes are not correlated (data not shown). This is not surprising in view of the very narrow compositional distribution of introns in Arabidopsis genes (![]()
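A correlation coefficient of the kind quoted for homologous gene pairs could be computed as sketched below. The source does not name the coefficient, so the use of Pearson's product-moment coefficient is an assumption, and `pearson_r` is a hypothetical helper; the inputs would be paired values such as the intron number of each Arabidopsis gene and of its maize homolog.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples of at least two pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)
```

A coefficient is undefined when one of the two variables is constant across gene pairs, which is why the sketch raises an error in that case rather than returning zero.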
Statistical analysis:
All data of Table 1 were found to be significantly different between the two classes of genes, except for coding sequence size and GC level of concatenated introns. In the case of coding sequence size, Student's t-test indicated no significant difference in any of the species considered. Moreover, exons were significantly shorter, on average, in GC-poor genes than in GC-rich genes in all angiosperms tested (Table 1). Differences in GC levels between GC-poor and GC-rich genes were also found to be statistically significant for maize as well as Arabidopsis when individual codon positions were compared (see Table 2). The GC levels of concatenated introns were found to be different between the two classes of genes from Gramineae, but the situation was less clear in dicots. In tobacco and tomato, GC levels of concatenated introns were found to be different on average, but this was not the case in the other dicots. This situation is probably due to several factors, such as the low levels of genome heterogeneity and rather small sample sizes.
| DISCUSSION |
|---|
The main finding of this work is the identification in plants of two classes of genes, which were first observed in maize (![]()
The fact that genes are, on the average, interrupted by either a large number of long introns or a small number of short introns and that GC levels are different in the two classes of genes is far from trivial for two reasons. First, these properties are found not only in angiosperms, but also in vertebrates (![]()
Moreover, GC-rich genes were observed to be, on the average, richer in GC in all codon positions in all species tested, but especially in Gramineae, compared to GC-poor genes (Table 2). GC differences in second codon positions obviously have repercussions on the amino acid composition of the encoded proteins, a subject currently under investigation.
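GC levels at the three codon positions, as compared in Table 2, can be computed from an in-frame coding sequence as sketched below. This is a minimal illustration: the function name `codon_position_gc` is hypothetical, and counting ambiguous bases in the denominator is a simplifying assumption.

```python
def codon_position_gc(cds):
    """Return GC percentages at codon positions 1, 2, and 3.

    cds must be an in-frame coding sequence whose length is a multiple
    of three. Ambiguous bases count toward the total but not toward GC.
    """
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    levels = []
    for offset in range(3):
        # Every third base starting at this offset shares a codon position.
        bases = cds[offset::3]
        levels.append(100.0 * sum(b in "GC" for b in bases) / len(bases))
    return tuple(levels)
```

Averaging these three values over the genes of each class, and comparing the class means, would reproduce the type of per-position comparison described in the text.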
As far as the functional meaning of the two classes of genes is concerned, we would like to speculate that since housekeeping genes were found to be associated with GC-rich genes not only in Arabidopsis and maize (![]()
In the case of GC-poor genes, which are largely tissue specific in vertebrates (![]()
The phenomenon of intron depletion described here, which accompanies the increase in GC content of genes, was already reported in vertebrates (![]()
| ACKNOWLEDGMENTS |
|---|
We thank O. Clay for technical help and useful discussions.
Manuscript received August 30, 1999; Accepted for publication December 6, 1999.
| LITERATURE CITED |
|---|
BARAKAT, A., G. MATASSI, and G. BERNARDI, 1998 Distribution of genes in the genome of Arabidopsis thaliana and its implications for the genome organization of plants. Proc. Natl. Acad. Sci. USA 95:10044-10049.
BELL, M. V., A. E. COWPER, M. P. LEFRANC, J. I. BELL, and G. R. SCREATON, 1998 Influence of intron length on alternative splicing of CD44. Mol. Cell. Biol. 18:5930-5941.
BENNETT, M. D. and J. B. SMITH, 1991 Nuclear DNA amounts in angiosperms. Philos. Trans. R. Soc. Lond. B Biol. Sci. 334:309-345.
BERNARDI, G., 1995 The human genome: organization and evolutionary history. Annu. Rev. Genet. 29:445-476.
BERNARDI, G. and G. BERNARDI, 1990 Compositional patterns in the nuclear genome of cold-blooded vertebrates. J. Mol. Evol. 31:265-281.
BICKMORE, W., and J. CRAIG, 1997 Chromosome Bands: Patterns in the Genome. Springer, New York.
BRENNER, S., G. ELGAR, R. SANDFORD, A. MACRAE, and B. VENKATESH et al., 1993 Characterization of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature 366:265-268.
CARELS, N., A. BARAKAT, and G. BERNARDI, 1995 The gene distribution of the maize genome. Proc. Natl. Acad. Sci. USA 92:11057-11060.
CARELS, N., P. HATEY, K. JABBARI, and G. BERNARDI, 1998 Compositional properties of homologous coding sequences from plants. J. Mol. Evol. 46:45-53.
CAVALIER-SMITH, T., 1985 Eukaryote gene numbers, non-coding DNA and genome size, pp. 69–103 in The Evolution of Genome Size, edited by T. CAVALIER-SMITH. Wiley, London.
CHIAPELLO, H., F. LISACEK, M. CABOCHE, and A. HÉNAUT, 1998 Codon usage and gene function are related in sequences of Arabidopsis thaliana. Gene 209:GC1-GC38.
DURET, L., D. MOUCHIROUD, and C. GAUTIER, 1995 Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J. Mol. Evol. 40:308-317.
GAUTIER, C., and M. JACOBZONE, 1989 http://biom3.univ-lyon1.fr:8080/doclogi/docanals/manuel.html, Publication interne, UMR CNRS 5558 Biometrie, Genetique et Biologie des Populations, Universite Claude Bernard, Lyon I, France.
GOUY, M., C. GAUTIER, N. ATTIMONELLI, C. LANAVE, and G. DI PAOLA, 1985 ACNUC, a portable retrieval system for nucleic acid sequence databases: logical and physical designs and usage. Comput. Appl. Biosci. 1:167-172.
HUGHES, A. L. and M. K. HUGHES, 1995 Small genomes for better flyers. Nature 377:391.
MASON, P. J., D. J. STEVENS, L. LUZZATTO, S. BRENNER, and S. APARICIO, 1995 Genomic structure and sequence of the Fugu rubripes glucose-6-phosphate dehydrogenase gene (G6PD). Genomics 26:587-591.
MATASSI, G., L. M. MONTERO, J. SALINAS, and G. BERNARDI, 1989 The isochores organisation and compositional distribution of homologous coding sequences in the nuclear genome of plants. Nucleic Acids Res. 17:5273-5290.
PEARSON, W. R., T. WOOD, Z. ZHANG, and W. MILLER, 1997 Comparison of DNA sequences with protein sequences. Genomics 46:24-36.
PIZON, V., G. CUNY, and G. BERNARDI, 1984 Nucleotide sequence organization in the very small genome of a tetraodontid fish, Arothron diadematus. Eur. J. Biochem. 140:25-30.
SALINAS, J., G. MATASSI, L. M. MONTERO, and G. BERNARDI, 1988 Compositional compartmentalization and compositional patterns in the nuclear genomes of plants. Nucleic Acids Res. 16:4269-4285.
STUDENT, 1908 The probable error of a mean. Biometrika 6:1-25.
STUDENT, 1925 New tables for testing the significance of observations. Metron 5:105-120.
VILLARD, L., F. TASSONE, T. CRNOGORAC-JURCEVIC, K. CLANCY, and K. GARDINER, 1998 Analysis of pufferfish homologues of the AT-rich human APP gene. Gene 210:17-24.