Abstract
Rho GTPases regulate a number of important cellular functions in eukaryotes, such as organization of the cytoskeleton, stress-induced signal transduction, cell death, cell growth, and differentiation. We have conducted an extensive screening, characterization, and analysis of genes belonging to the Ras superfamily of GTPases in land plants (embryophyta) and found that the Rho family is composed mainly of proteins with homology to RAC-like proteins in terrestrial plants. Here we present the genomic and cDNA sequences of the RAC gene family from the plant Arabidopsis thaliana. On the basis of amino acid alignments and genomic structure comparison of the corresponding genes, the 11 encoded AtRAC proteins can be divided into two distinct groups, of which one group apparently has evolved only in vascular plants. Our phylogenetic analysis suggests that the plant RAC genes underwent a rapid evolution and diversification prior to the emergence of the embryophyta, creating a group that is distinct from rac/cdc42 genes in other eukaryotes. In embryophyta, RAC genes have later undergone an expansion through numerous large gene duplications. Five of these RAC duplications in Arabidopsis thaliana are reported here. We also present a hypothesis suggesting that the characteristic RAC proteins in higher plants have evolved to compensate for the loss of RAS proteins.
The Rho family of GTPases, Rho, Rac, and Cdc42, is a diverse group of proteins with an evolutionary history dating back to the first unicellular eukaryotic cells. Proteins of the Rho family are found in protists as well as in fungi, plants, and animals (Madaule and Axel 1985; Yang and Watson 1993; Lohia and Samuelson 1996). When activated by external (or internal) signals the Rho family GTPases are converted to a GTP-binding form that interacts with cellular target proteins or effectors and produces a variety of cellular responses (Hall 1998). These include organization of the actin cytoskeleton, regulation of programmed cell death, stress-induced signal transduction, and cell growth and differentiation by mediating signals from membrane-bound receptors. The Rac proteins are important for selection of the location of actin polymerization during membrane ruffling and lamellipodia formation in mammalian cells (Ridley et al. 1992). In the budding yeast Saccharomyces cerevisiae, which does not have Rac proteins, the Cdc42 protein plays an important role coordinating actin-dependent morphogenetic processes such as bud emergence, the formation of mating projections, and pseudohyphal growth (Johnson 1999). Recent reports indicate that Rac proteins in plants have related functions to their counterparts in yeast and animals (Xia et al. 1996; Lin and Yang 1997). In addition, plant RAC proteins have been suggested to regulate pollen tube tip growth through targeted secretions of vesicles (Kost et al. 1999; Li et al. 1999). Another function of Rac proteins is their ability to regulate the activation of a multicomponent plasma membrane-bound NADPH-dependent oxidase that triggers the “oxidative burst” (Abo et al. 1991). There is now also compelling evidence that RAC proteins in plants have a similar regulatory function and that they have a key role in the production of reactive oxygen species that is associated with plant defense against pathogens (Kawasaki et al. 1999; Potikha et al. 1999).
In mammalian cells an extended family of 14 Rho GTPases has been identified that can be further divided into seven distinct subgroups (Nobes and Hall 1994; Aspenstrom 1999). In contrast, only two distinct groups of the Rho family GTPases have been found in higher plants. Only one of these proteins (GenBank accession no. U88402; Newman et al. 1994) can be described as being a RHO-like homologue while the others represent RAC proteins (Yang and Watson 1993; Delmer et al. 1995; Winge et al. 1997). Little is known about the genomic structure of the members of the Rho GTPase family in plants and how they have evolved through time. With the now almost complete sequence of Arabidopsis thaliana plus the genomes from Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster, a large genetic resource is available to compare and analyze the evolution of complete gene families.
In this article we determine and analyze the genomic sequences of 11 RAC-like genes from A. thaliana (AtRAC1-11), including 1 gene sequenced by the genome project. The RAC genes analyzed probably constitute the complete set of RAC genes from A. thaliana. All these genes have also recently been sequenced by the Arabidopsis genome project, thereby providing information about their chromosome location. Nine of the AtRAC gene sequences from A. thaliana (L.) Heynh. cv. Landsberg erecta and cv. Columbia have been compared, providing a rare opportunity to study the short-term evolution within a gene family. We also present results that show that the RAC proteins found in angiosperms can be divided into two distinct groups, a division that must have taken place >200 million years ago. Finally, our analysis of the plant RAC proteins from higher and lower plants suggests that the ancestral RAC genes underwent a rapid evolution before land plants appeared.
MATERIALS AND METHODS
Isolation of genomic DNA: DNA was isolated from four A. thaliana ecotypes, Columbia, Landsberg erecta, Wassilewskaja, and Cape Verde, with a modified minipreparation procedure adapted from Dellaporta et al. (1983). After ethanol precipitation, the pellet was dissolved in TE-buffer with 10 μg/ml RNase A (Sigma-Aldrich, St. Louis) and incubated 1 hr at 37°.
mRNA isolation and cDNA synthesis: mRNA was isolated from 5-week-old A. thaliana (var. Columbia) plants using the Dynabeads mRNA DIRECT kit according to the manufacturer’s instructions (Dynal, Oslo). Random-primed first-strand cDNA was made from ∼50 ng mRNA with a first-strand cDNA synthesis kit (Pharmacia Biotech, Piscataway, NJ). After first-strand cDNA synthesis the samples were heat inactivated at 85° for 5 min and stored at -20° until used as a template for PCR (see below).
Amplification of AtRAC cDNA and genomic probes: Specific AtRAC probes were amplified by PCR from both cDNA and genomic DNA. PCR primers were designed from known AtRAC cDNA sequences (Winge et al. 1997). The AtRAC1 probe was amplified with the rac1f and rac1r primers (see below). The AtRAC2 probe was amplified with the rac1f and Arac2C primers. The AtRAC9 probe was amplified with the A9f and A9r primers. The probes from AtRAC7, AtRAC8, and AtRAC10 were amplified with the rac2f and rac1R primers. The following primers were used to amplify the AtRAC probes: Arac1F, 5′ TTG TTT CCT CAG GTT TTG TAG 3′; rac1f, 5′ AGG TTY ATH AAG TGT GTS ACY GT 3′; rac1r, 5′ TCA AAI ACT SCY TTC ACR TTC T 3′; Arac2C, 5′ TGA TCT CTT AGT CTT CAA TGG T 3′; rac2f, 5′ GGK AAR ACI TGY ATG CTY ATY TG 3′; A9f, 5′ GCA ACA TCA ACA TCA TCA GCA 3′; A9r, 5′ CTG GGA AGA TTG TGC AAG CA 3′ (Y, C/T; R, G/A; S, G/C; K, G/T; H, not G; I, inosine). Buffers and reagents for PCR were utilized according to standard procedures (Saiki et al. 1988). AtRAC cDNA probes were amplified from ∼5 ng first-strand cDNA, while genomic AtRAC probes were amplified from ∼50 ng genomic DNA. In general, the PCR was performed at 94° for 1 min, 50° for 1 min, and 70° for 1-2 min for a total of 40 cycles, using 1.5 units Amplitaq DNA polymerase (Perkin-Elmer, Norwalk, CT). PCR products were separated in low melt agarose gels (SeaPlaque GTG; FMC, Rockland, ME), and excised DNA fragments were treated with 1 unit/50 μl agarase (Sigma) for 1 hr at 37°. The isolated AtRAC PCR fragments, cDNA and genomic, were verified by direct sequencing before they were used as probes.
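As an illustration of what the degenerate positions in the primer list imply, the following minimal Python sketch counts how many distinct oligonucleotides each primer represents, using the code key given above (Y, C/T; R, G/A; S, G/C; K, G/T; H, not G). The function name `degeneracy` and the choice to count inosine as a single, non-degenerate position are assumptions made for this example, not part of the original protocol.

```python
from math import prod

# Degeneracy codes as defined in the primer list. Inosine (I) is counted as
# one synthesized base, so it contributes a factor of 1 (an assumption of
# this sketch).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "R": "AG", "S": "CG", "K": "GT", "H": "ACT",
    "I": "I",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    bases = primer.replace(" ", "").upper()
    return prod(len(IUPAC[base]) for base in bases)

# Primer sequences copied from the list above (5' to 3').
PRIMERS = {
    "rac1f": "AGG TTY ATH AAG TGT GTS ACY GT",
    "rac1r": "TCA AAI ACT SCY TTC ACR TTC T",
    "rac2f": "GGK AAR ACI TGY ATG CTY ATY TG",
    "Arac2C": "TGA TCT CTT AGT CTT CAA TGG T",
}

for name, sequence in PRIMERS.items():
    print(name, degeneracy(sequence))
```

Under these assumptions the gene-specific primer Arac2C has a degeneracy of 1, while the degenerate primers rac1f, rac1r, and rac2f represent pools of 24, 8, and 32 sequences, respectively.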
Isolation of genomic AtRAC clones: A genomic lambda FIX library from A. thaliana (L.) Heynh. cv. Landsberg erecta was obtained from the Arabidopsis Biological Resource Center (ABRC), Columbus, OH (Voytas et al. 1990). Plaque lifts and phage work were conducted according to standard procedures (Sambrook et al. 1989), using Hybond N membranes (Amersham, Uppsala, Sweden). The membranes were screened with 32P-labeled AtRAC PCR probes, produced with the Megaprime kit from Amersham/Pharmacia, Uppsala, Sweden. Approximately 300,000 plaque-forming units were screened with PCR probes from AtRAC1, -2, and -7-10. Both cDNA and genomic probes were used in four separate hybridizations, where one or more probes were used in combination. Hybridizations were done in 6× SSC, 0.5% SDS at 56° for 24 hr. Washes were done with 1× SSC, 0.1% SDS at 56°. Lambda DNA from positive clones was prepared according to Sambrook et al. (1989) and subcloned into pBluescript vectors (Stratagene, La Jolla, CA).
DNA sequencing: DNA sequencing was performed manually by dideoxy cycle sequencing (Murray 1989), with 33P-labeled dideoxynucleotides and Thermo Sequenase (Amersham). The AtRAC genomic clones were sequenced from derived subclones and from PCR fragments amplified with degenerate PCR primers matching conserved DNA motifs in the exons (primers are not shown). Both DNA strands were sequenced and gaps in the DNA sequences closed by primer walking.
Phylogenetic analyses: A RAC protein alignment was made with the Clustal X program (Thompson et al. 1994, 1997) that in addition to the plant RAC proteins also included sequences of RAC proteins from various eukaryotes. This alignment was manually refined with the GeneDoc program version 2.4.017 (http://www.cris.com/~ketchup/current.html) and the multiple sequence file was reimported to the Clustal X program. Protein weight matrices of the PAM series were used to calculate the distances and an unrooted neighbor-joining (N-J) tree was created using the neighbor-joining algorithm (Saitou and Nei 1987). Bootstrapping of the N-J tree was done with 1000 bootstrap trials and the resulting tree file was imported into the Treeview program (Page 1996).
An alignment of plant RAC cDNA sequences (including 17 ESTs) was produced with the GeneDoc program and an N-J tree was produced from the multiple sequence file with the Clustal X program. For comparison, a parsimony analysis of the DNA alignment was performed with the Seqboot, Dnapars, and Consense programs from the Phylip package version 3.5c (Felsenstein 1984).
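The bootstrap trials mentioned above rest on resampling alignment columns with replacement and rebuilding the tree from each resampled alignment. The following Python sketch shows only the resampling step and the conversion of a replicate count into a support value; it is not the Clustal X or Phylip implementation, and the toy alignment and function names are hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns drawn with replacement."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # The same column indices are applied to every sequence, so each
    # resampled column is an intact column of the original alignment.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def bootstrap_support(replicates_with_clade, n_replicates=1000):
    """Fraction of bootstrap replicates in which a clade was recovered."""
    return replicates_with_clade / n_replicates

# Made-up toy alignment, for illustration only (not AtRAC sequences).
TOY_ALIGNMENT = ["MSASRFIKCV", "MSASRFIKCV", "MSTARFIKCV"]

rng = random.Random(0)
replicate = bootstrap_alignment(TOY_ALIGNMENT, rng)
print(replicate)
print(bootstrap_support(500))
```

With 1000 replicates, a clade recovered in more than 500 of them has a support above 0.5, which corresponds to the threshold used when reporting bootstrap values in the trees.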
RESULTS
Cloning of AtRAC genes from A. thaliana: In a previous screen for cDNAs encoding RAC- and RHO-like proteins, 10 A. thaliana cDNAs with high homologies to human rac genes were found (Winge et al. 1997). To find the corresponding genomic clones, an A. thaliana Landsberg erecta genomic library was screened with probes derived from AtRAC1, -2, and -7-10. Both cDNA and genomic probes were used. After four separate library screens, a total of 63 genomic clones were isolated and further characterized. All of the lambda clones were verified by sequencing to contain AtRAC genes. Ten different AtRAC genes were isolated in this process and 9 of them contained full-length AtRAC genes. The deduced protein sequences of the AtRAC genes are shown in Figure 1. The AtRAC9 gene, which we had found previously as a cDNA clone, was not found in these screens. The AtRAC9 gene was, however, recently sequenced by the Arabidopsis genome initiative, GenBank accession no. AC003672. As the AtRAC7 from Landsberg was partial, the full-length sequence was determined from several overlapping PCR fragments amplified from A. thaliana ecotype Columbia genomic DNA.
Figure 1. A protein alignment of the AtRAC proteins. The two closest related pairs of AtRAC proteins, AtRAC1 and AtRAC6 and AtRAC4 and AtRAC5, are 98 and 97% identical, respectively. AtRAC9, the most divergent protein, is at most 71% identical with the other AtRAC proteins. GenBank accession nos. AtRAC1, AF115466; AtRAC2, AF115469; AtRAC3, AF115470; AtRAC4, AF115471; AtRAC5, AF115472; AtRAC6, AF115473; AtRAC7, AF115474; AtRAC8, AF115475; AtRAC9, AF156896; AtRAC10, AF115467; AtRAC11, AF085480.
The AtRAC gene structure: The size of the AtRAC coding regions ranged from 585 (AtRAC4) to 645 bp (AtRAC10). The coding regions were interrupted by five to seven introns and the first six splice sites were 100% conserved for all but one gene, AtRAC11, which has lost intron 5. Three of the genes, AtRAC7, -8, and -10, have an extra exon at the 3′ end, which is probably the result of the insertion of an intron in exon 7 of an ancestral plant RAC gene (Figure 2, a and b). The intron sizes are generally small, ranging from 71 to 474 bp, with an average size of 140 bp. The first intron is usually the largest, but in AtRAC1 and -6, intron 4 is the longest (474 and 428 bp, respectively). The introns show little or no sequence homology, except close to the splice site junctions. One exception is the first intron of the AtRAC8 and AtRAC10 genes, which contains a cryptic exon of unknown function, GenBank accession no. AF115468. The majority of the AtRAC introns do not contain repetitive DNA, but some introns have small stretches of AT repeats. A study of the splice sites shows that most introns have common splice donor and acceptor sites. The exceptions are intron 2 in AtRAC3 and intron 6 in AtRAC1, which both have a 5′ GC splice donor. The 5′ GC splice donor sites are found only in ∼1% of the A. thaliana introns (Brown and Simpson 1998). In most introns it is possible to identify motifs with similarity to branch point sequences that are located 15-60 bp upstream of the 3′ splice sites (results not shown). Together with AT-rich introns the branch point sequences are important guide signals that ensure the proper splicing of plant introns (Brown et al. 1996).
Figure 2. (a) Splice sites between exons 7 and 8 in AtRAC7, -8, and -10 genes. The position of the splice donor site is identical for AtRAC8 and AtRAC10 but is shifted 3 bp for AtRAC7. Just a part of intron 7, which lies in phase two, is shown. As discussed in the text, we think the RAC genes in group II evolved after insertion of an intron in the 3′ end of an ancient RAC-like gene. (b) Alignment of the C-terminal part of the AtRAC proteins with homologues from Oryza sativa (OsRAC1, 2 and 3) and Zea mays (ZmRACA and RACC). The arrow indicates the position in the CaaL box (a, aliphatic amino acid) where the corresponding AtRAC7, -8, and -10 genes have an intron inserted.
Comparison of AtRAC genes from Columbia and Landsberg: Comparison of the AtRAC1 promoter from the Columbia and Landsberg erecta ecotypes revealed a 155-bp insertion/deletion (indel) 590-745 bp upstream of the start codon. This 155-bp indel, which contains regions with repeated DNA, probably represents an insertion in the AtRAC1 Landsberg erecta promoter. To verify the indel, PCR primers flanking the indel were used to amplify this promoter region from four ecotypes, Columbia, Landsberg erecta, Wassilewskaja, and Cape Verde. The 155-bp insert was found in both Landsberg erecta and Cape Verde while Columbia and Wassilewskaja lacked the insert (results not shown).
The AtRAC8 gene has a repeated element inserted ∼2.4 kb upstream of the start codon in Landsberg erecta that is missing in the AtRAC8 promoter of Columbia (results not shown). Repeated elements of this type are found throughout the Arabidopsis genome (P. Winge, personal observations). The AtRAC8 promoter has a unique large repeat element ∼1290 bp in size that begins 331 bp upstream of the start codon. This repeat element, registered in GenBank (accession no. AC003952), has been found in several intergenic regions on all five chromosomes and it is found in the AtRAC8 promoter of both Landsberg erecta and Columbia.
To analyze the short-term evolution of the AtRAC genes, pairwise comparisons of nine AtRAC genes from Columbia and Landsberg erecta, AtRAC1-6, -8, -10, and -11, were conducted. Seven of the genes, AtRAC3, -4, -5, -6, -8, -10, and -11, were highly conserved between these two ecotypes and few single nucleotide polymorphisms (SNPs) and indels were found. The AtRAC3 gene had, for instance, just one SNP in the 1792 bp analyzed, and no polymorphisms were found in 1564 bp from the AtRAC5 gene. In contrast, a 2091-bp sequence of the AtRAC2 gene revealed 41 SNPs; 7 of these were located in coding regions, but none resulted in amino acid changes (Figure 3). Just 5 SNPs in AtRAC2 were located upstream of intron 3. Assuming a Poisson distribution of the SNPs, the chance of having ≤5 SNPs located upstream of intron 3 is small [P(X ≤ 5) = 0.0014 (X = number of SNPs; λ (the mean) = 16)]. A similar biased distribution of SNPs was found in AtRAC1, but there the majority of SNPs were located in the 5′ end of the gene and no polymorphisms were identified downstream of exon 4. For the other AtRAC genes the SNPs are more evenly distributed throughout the genes. In noncoding regions a SNP was found every ∼211 bp while in coding regions a SNP was found every ∼416 bp. Coding regions represent 29% of the analyzed gene sequences. When both coding and noncoding regions are included, the SNP frequency is one change every ∼246 bp. This is in good agreement with previously published data on polymorphism levels between Landsberg erecta and Columbia (Konieczny and Ausubel 1993). The results from these analyses are summarized in Table 1. The SNPs had an almost equal proportion of transitions and transversions. In addition to the SNPs the AtRAC1 and AtRAC2 genes have several indels, four of them >34 bp. Intron 4 of AtRAC2 is the most variable intron. This is partly due to a number of indels, two of which are relatively large, 67 and 342 bp in size.
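The two numerical statements in the preceding paragraph can be checked with a short calculation. The Python sketch below evaluates the Poisson probability P(X ≤ 5) for the stated mean λ = 16 and combines the coding and noncoding SNP intervals using the stated 29% coding fraction. The value λ = 16 is taken as given in the text (its derivation is not shown there), and the function names are introduced here for illustration only.

```python
from math import exp, factorial

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for a Poisson-distributed count X with mean lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

def combined_snp_interval(coding_fraction: float,
                          coding_interval: float,
                          noncoding_interval: float) -> float:
    """Average bp per SNP when coding and noncoding regions are pooled.

    SNP rates (SNPs per bp) are averaged, weighted by the share of
    sequence in each class, and then inverted back to bp per SNP.
    """
    rate = (coding_fraction / coding_interval
            + (1.0 - coding_fraction) / noncoding_interval)
    return 1.0 / rate

# Values as stated in the text.
p_at_most_5 = poisson_cdf(5, 16.0)
pooled_interval = combined_snp_interval(0.29, 416.0, 211.0)

print(f"P(X <= 5 | lambda = 16) = {p_at_most_5:.4f}")
print(f"pooled SNP interval = {pooled_interval:.0f} bp")
```

Both results agree with the reported figures: the Poisson probability rounds to 0.0014, and the pooled interval rounds to 246 bp per SNP.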
The AtRAC2 gene also has a TA repeat in intron 6 that was found in Columbia and Wassilewskaja but not in Landsberg erecta and Cape Verde (results not shown). Thus, the indel in the AtRAC1 promoter and the TA repeat in AtRAC2 intron 6 show that there is a close genetic relationship between the Landsberg erecta and the Cape Verde ecotypes.
Figure 3. A graphical description of the SNPs found in the AtRAC2 gene when comparing the Landsberg erecta and the Columbia ecotypes. The boxes indicate the exons and the vertical lines mark the positions of the SNPs. There appears to be a biased distribution of SNPs with the majority localized in the 3′ end of the gene.
Genomic gene structure and evolution of rac genes in higher plants: To clarify the evolution and ancestry of RAC genes in plants, the AtRAC gene structure was compared to other genes of the Rho family. Until now few complete genomic Rho family genes have been characterized, but in recent years sequences from Schizosaccharomyces pombe, D. melanogaster, C. elegans, and Homo sapiens have appeared in databases. Several of these rho genes, especially from unicellular organisms such as yeasts and amoebas, are without introns. The 11 AtRAC genes reported in this study are the first complete set of genomic RAC genes characterized from a plant. From other organisms >30 Rho family genomic genes are known, but so far only 17 have introns. These 17 genes are shown in Table 2.
The rac and cdc42 genes from yeast and animals differ in gene structure when compared with the AtRAC genes. In general, the rac and cdc42 genes have fewer splice sites and the splice junction between exons 3 and 4 is often the only splice site conserved with the AtRAC genes. The amino acids flanking this splice site, Lys and Trp, are nearly 100% conserved among the Rho family members. Among the rac and cdc42 genes the human cdc42 gene has the highest structural similarity to the AtRAC genes and two of the splice sites, between exons 2 and 3 and 3 and 4, are 100% conserved.
The human rho7 gene and a RHO-like gene from S. pombe (GenBank accession no. Z97185) have a genomic structure that is surprisingly similar to the AtRAC genes. Four of the AtRAC introns, introns 1, 3, 5, and 6, lie in phase 0 (the splice site does not split a codon), while introns 2 and 4 lie in phase 1 (the splicing occurs between nucleotides 1 and 2 in a codon). In both the human rho7 and the S. pombe RHO-like gene, the intron positions are 100% conserved and the introns lie in the same phases as the corresponding AtRAC introns. It is unlikely that the AtRAC, human rho7, and S. pombe rho genes had their introns inserted at exactly the same positions independently of each other at a later stage in evolution. These introns may therefore represent ancient introns inherited from a primordial rac/rho gene. Comparisons of AtRAC genes with rho genes, such as rhoA from C. elegans (EMBL accession no. AL031823), show that they have a clearly different gene structure (Figure 4) and that they also lack the conserved splice site found between exons 3 and 4 in most rac and cdc42 genes.
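The intron phases used in this comparison follow directly from the number of coding nucleotides that precede each intron. The Python sketch below derives the phases from a list of exon coding lengths and reproduces the AtRAC pattern stated above (introns 1, 3, 5, and 6 in phase 0; introns 2 and 4 in phase 1). The exon lengths are invented purely so that the example yields this pattern; they are not the measured AtRAC exon sizes, and the function name is hypothetical.

```python
from itertools import accumulate

def intron_phases(exon_coding_lengths):
    """Phase of the intron that follows each exon except the last.

    Lengths count coding nucleotides only, starting at the start codon.
    Phase 0: the intron lies between two codons.
    Phase 1: the intron lies between nucleotides 1 and 2 of a codon.
    Phase 2: the intron lies between nucleotides 2 and 3 of a codon.
    """
    coding_bases_before_intron = accumulate(exon_coding_lengths[:-1])
    return [total % 3 for total in coding_bases_before_intron]

# Phase pattern of AtRAC introns 1-6 as described in the text.
ATRAC_PHASES = [0, 1, 0, 1, 0, 0]

# Illustrative exon lengths only (NOT the real AtRAC exon sizes).
HYPOTHETICAL_EXONS = [36, 70, 62, 64, 65, 66, 120]

phases = intron_phases(HYPOTHETICAL_EXONS)
print(phases, phases == ATRAC_PHASES)
```

Two genes share an intron in the sense used here only when both the position in the aligned coding sequence and the phase agree, which is why matching phases across AtRAC, human rho7, and the S. pombe gene are informative.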
Recently we have also characterized a partial RAC gene from the moss Physcomitrella patens, GenBank accession no. AF146341, which had an exon/intron structure almost identical to that of the AtRAC genes (P. Winge, R. Kristensen and A. M. Bones, unpublished results). This demonstrates that the genomic structure of the plant RAC genes has remained more or less unchanged since the divergence of vascular and nonvascular plants >400 million years ago.
Phylogenetic analysis: The plant RAC proteins were aligned with Rac and Cdc42 proteins from S. pombe, H. sapiens, C. elegans, D. melanogaster, Dictyostelium discoideum, and Entamoeba histolytica. Figure 5 shows a bootstrapped neighbor-joining tree created from this alignment, where an Arabidopsis Rho-like protein was selected as an outgroup. The phylogram revealed a major division between the Rac and Cdc42 proteins and it appears that the plant RAC proteins are a sister group to the Rac/Cdc42 proteins. The plant RAC proteins have the highest identity with human Rac1, up to 61% identity, and slightly lower identity with the Cdc42 proteins, up to 53% identical with S. pombe CDC42. Due to several unique amino acid substitutions and an N-terminal extension not found in other plant Rac proteins (Figure 1), the AtRAC9 protein is singled out into a separate group. A parsimony analysis of the same protein alignment using the Protpars program (Felsenstein 1984) produces a tree with a slightly different topology, but the major divisions are the same and they were also supported by significant bootstrap values (results not shown). Thus, the topology of the tree, whatever method used, suggests that the AtRAC proteins can be divided into distinct groups and that they have a deep branch linking them to the Rac/Cdc42 proteins. The placement of RAC group II near the base of the plant RAC branch is probably an artifact due to the rapid evolution of this group.
Table 1. Single nucleotide polymorphisms (SNPs) found in nine AtRAC genes by comparison of A. thaliana Landsberg erecta and Columbia ecotypes
Table 2. An overview of genomic sequences of the Rho GTPase gene family registered in GenBank
A DNA alignment was made, where the coding regions of 17 plant RAC expressed sequence tags (ESTs) were included together with 34 full-length plant RAC cDNAs registered in GenBank. All ESTs were >230 bp with an average size of 488 bp. Regions with low quality sequences were excluded and in a few sequences some minor changes had to be made to avoid gaps. The DNA alignment was analyzed with both parsimony and distance matrix methods. Figure 6 shows an unrooted neighbor-joining tree where the divisions of plant RAC genes into defined groups are evident. In this tree the RAC genes from mosses, conifers, and genes related to AtRAC2 appeared to form a clade that was more closely related to AtRAC9 than to the RAC genes within group II. This is in slight contrast to the results from the analysis of the plant RAC proteins shown in Figure 5, where AtRAC9 appeared to be more closely related to the RAC proteins from group II. A parsimony analysis of the same DNA alignment produced a tree with an almost identical topology, even though the bootstrap values for some clades are slightly lower (results not shown).
Figure 4. Gene structure of rho GTPases. The exon and intron structure of the AtRAC group I and II genes is compared with the gene structure of various rho, rac, and cdc42 genes (Hs, Homo sapiens; Sp, Schizosaccharomyces pombe; Dd, Dictyostelium discoideum; Ce, Caenorhabditis elegans; Yl, Yarrowia lipolytica). The numbers between the exons indicate the splice phase of the intron (in phase zero the splicing occurs between codons). The sizes of introns are not drawn to scale. All genes are registered in GenBank under the given name in the figure. Sp rho*, GenBank accession no. Z97185.
Chromosomal location of the AtRAC genes: All 11 AtRAC genes identified and characterized here have also recently been sequenced by the Arabidopsis Genome Initiative. The AtRAC genes are scattered throughout the A. thaliana genome and are located on all five chromosomes. There is no obvious clustering of the genes, but the AtRAC3 and AtRAC6 genes are separated by just 250 kb on chromosome IV. In Table 3, the genomic locations of the AtRAC genes are shown together with their nearest genetic markers. A comparison of the AtRAC gene neighbors revealed the dynamic nature of the Arabidopsis genome and showed that duplications, insertions, and deletions have been shaping the genome through evolution. Figure 7 shows the nearest gene neighbors to AtRAC1, -4-6, and -11, and clearly shows that these AtRAC genes were created through a number of large duplications. One of these duplications, encompassing 4.6 Mb of the middle part of chromosome II and the tip of the large arm of chromosome IV (Lin et al. 1999; Mayer et al. 1999), resulted in the creation of AtRAC1 and AtRAC6. The high similarity between the AtRAC1 and AtRAC6 proteins suggests that this is a relatively recent duplication, but comparison of the noncoding regions of the AtRAC1 and AtRAC6 genes and the detection of several gene rearrangements suggest that the duplication must have occurred tens of millions of years ago. This is further supported by our studies of RAC genes from other plants within Brassicaceae (Cheiranthus cheiri, GenBank accession no. AF161017; Lepidium sativum, and others). These results indicated that all duplications leading to the AtRAC genes detected in Arabidopsis today must have taken place before the split between Arabidopsis and the other members of the mustard family, which is estimated to have occurred 10-35 million years ago (Lagercrantz 1998).
Other interesting observations seen from Figure 7 are the recent inversion of the AtRAC1 gene and the loss of the VPS35-like genes next to AtRAC4 and AtRAC6. In addition, an ascorbate peroxidase and a histidine kinase have been inserted upstream and downstream of AtRAC6 and AtRAC1, respectively. Several genes encoding proteins involved in the regulation of the cytoskeleton and plant defense responses are found in close proximity to the AtRAC genes, but this could be a mere coincidence.
The two closely related genes, AtRAC4 and AtRAC5, are located in two large duplicated regions on chromosome I. A computer-based search shows that the duplicated regions extend from ∼24-39 cM [bacterial artificial chromosome (BAC) clone F9L1 → F21J9] to 114-131 cM (BAC clone F20P5 → F23A5). The low sequence homology of the noncoding regions in AtRAC4 and AtRAC5 and the many gene rearrangements show that this is an old duplication. As far as we know this is the first time this large duplication has been reported.
Even though the AtRAC3 gene is located close to AtRAC6 and the region flanking AtRAC3 is found duplicated on chromosome II, there is no indication that it has been duplicated together with the AtRAC1 and AtRAC6 genes, as no close homologues to AtRAC3 exist. This is also supported by results that show the genes flanking AtRAC3 have no homologous genes located near other AtRAC genes. This lack of synteny suggests that the AtRAC3 gene has been transposed to its current position after the large duplication, but a deletion of an allele on chromosome II cannot be excluded.
Figure 5. A neighbor-joining tree of Rac proteins from various eukaryotes was created with the Clustal X program. An Arabidopsis Rho-like protein, AtRHO (GenBank accession no. U88402), was selected as an outgroup. To evaluate the confidence limits of the internal branches of the tree, a bootstrap analysis with 1000 replications was performed on the data set. Bootstrap values >500 are shown to the right of each branch point. The scale bar indicates the number of amino acid substitutions per site. Three full-length EST clones are included: EST1, AI759954; EST2, AW039993; and EST3, AI937960. Abbreviations: Br, Brassica rapa; Bv, Beta vulgaris; Ca, Cicer arietinum; Ce, Caenorhabditis elegans; Dd, Dictyostelium discoideum; Dm, Drosophila melanogaster; Eh, Entamoeba histolytica; Gh, Gossypium hirsutum; Gm, Glycine max; Hs, Homo sapiens; Le, Lycopersicon esculentum; Lj, Lotus japonicus; Ms, Medicago sativa; Nt, Nicotiana tabacum; Os, Oryza sativa; Ph, Physcomitrella patens; Pm, Picea mariana; Ps, Pisum sativum; Sp, Schizosaccharomyces pombe; Zm, Zea mays.
The duplicated region that includes the AtRAC11 gene appears to be more confined and extends from ∼74 to 76 cM on chromosome III (BAC clone T3A5 → F26O13). This region, which spans ∼200 kb, is found duplicated on chromosomes II and IV together with the AtRAC1 and AtRAC6 genes. The AtRAC11 gene and the AtRAC5 gene on chromosome I represent duplication events prior to the AtRAC1 and AtRAC6 duplication. A computer-based search indicates that these old duplications probably span several hundred kilobases.
The relicts of an ancient duplication can also be found when scrutinizing the BAC clones with the AtRAC11 and AtRAC2 genes. Both these genes share a common neighbor, a homologue of the D. melanogaster crooked neck gene (Crn), which is located ∼60 kb upstream of AtRAC11 and 6 kb downstream of AtRAC2. So far these are the only Crn homologues found in Arabidopsis and they probably arose from the same duplication event. Located next to AtRAC2 (BAC clone K15I22) is a cluster of genes encoding anther-specific proline-rich proteins (APGs). A similar cluster of APG genes is found next to AtRAC4 and AtRAC5 (Figure 7), providing more support for this ancient gene duplication.
A study of the BAC clones flanking AtRAC8 and AtRAC10 genes reveals a duplication that spans >1 Mb on the long arms of chromosomes III and V. Currently available data indicate that the duplication covers the regions 65-73 cM from chromosome III (BAC F14D17 → T8P19) and 121-128 cM from chromosome V (BAC K19M22 → MBK5). As far as we know this apparently old duplication has not been reported previously. More than 40 genes are found duplicated in these regions and the actin genes ACT4 and ACT12 are among the duplicated genes.
Figure 6. Phylogram of RAC genes from embryophytes. An unrooted neighbor-joining tree of RAC genes from various embryophytes was created with the Clustal X program. Expressed sequence tags from 17 genes were included in the data set; four of them are full-length genes. Some of the sequences were compiled from two or more ESTs and sequences of low quality were excluded. Branches with bootstrap values >500 are indicated as thick lines (1000 replications were analyzed). The scale bar indicates the number of nucleotide substitutions per site. Abbreviations: As, Populus tremuloides (Aspen); Br, Brassica rapa; Bv, Beta vulgaris; Ca, Cicer arietinum; Gh, Gossypium hirsutum; Gm, Glycine max; Le, Lycopersicon esculentum; Lj, Lotus japonicus; Ms, Medicago sativa; Nt, Nicotiana tabacum; Os, Oryza sativa; Ph, Physcomitrella patens; Pm, Picea mariana; Ps, Pisum sativum; Pt, Pinus taeda; Zm, Zea mays. EST1, AI730323; EST2, AI727570; EST3, AI731040; EST4, AI775563; EST5, D41104 and C26233; EST6, AW039993; EST7, AI937960; EST8, AW218480; EST9, AI759954; EST10, AW102025 and AI900160; EST11, AI162198; EST12, AU029919 and C73805; EST13, AI164960 and AI161509; EST14, AI901151; EST15, AW225989; EST16, AW056772; and EST17, AI812534.
DISCUSSION
After an extensive screen for RAC- and RHO-like genes in A. thaliana, we have identified a large family of RAC genes, AtRAC1 → AtRAC11. Given the large number of clones analyzed in this screen and that a previous cDNA screen identified 10 of the corresponding cDNAs (Winge et al. 1997), it is likely that all the AtRAC genes in A. thaliana have now been found. Probes and hybridization conditions used in these screens were also carefully selected to include the more divergent AtRAC genes such as AtRAC2, -7, -9, and -10. Results from various EST projects show that all RAC genes found in higher plants are highly similar to the Arabidopsis RAC genes (Figures 5 and 6). Even though distinct subgroups of RAC genes exist in mosses and monocotyledons, there are no indications that other distantly related groups of RAC genes exist in embryophyta.
A new class of RAC genes identified in vascular plants: On the basis of their genomic structure and a number of distinct amino acid differences (Winge et al. 1997), the 11 AtRAC genes can be divided into two distinct groups. The encoded proteins of group I AtRACs (AtRAC1-6, -9, and -11) have a C-terminal motif, CaaL (a, aliphatic amino acid), which indicates that they are geranylgeranylated (Trainin et al. 1996). AtRAC genes belonging to group II have an additional exon at the 3′ end, which is most likely the result of the insertion of an intron in the extreme 3′ end of an ancestral RAC gene. One effect of the insertion of this intron is that the group II-encoded proteins lack the typical C-terminal CaaL motif but they have nevertheless retained a cysteine-containing motif, suggesting that they have a different C-terminal modification. The AtRAC7 protein, for instance, has a C-terminal motif (CTAA), which indicates it is farnesylated (Nambara and McCourt 1999). Comparison of the C-terminal parts of the AtRAC proteins suggests that an AtRAC3-like gene may have been the ancestor to the AtRAC genes belonging to RAC group II.
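The C-terminal rule described above can be sketched as a small classifier. This is only an illustration of the reasoning, not the procedure used in this study: the function name, the returned labels, and the example sequences are hypothetical, and only the two rules stated in the text are encoded (a terminal CaaL motif suggests geranylgeranylation; a cysteine-containing motif such as CTAA suggests a different modification, possibly farnesylation). The aliphatic character of the two middle residues is deliberately not checked, because the CTAA motif cited for AtRAC7 would not pass a strict check.

```python
def classify_c_terminus(protein_seq: str) -> str:
    """Classify the last four residues of a protein by the rules in the text.

    Hypothetical helper for illustration; labels are not standard terms.
    """
    tail = protein_seq.upper()[-4:]
    # A CaaX-type motif needs four residues beginning with cysteine.
    if len(tail) < 4 or tail[0] != "C":
        return "no C-terminal cysteine motif"
    # Terminal leucine (CaaL): geranylgeranylation is predicted (group I).
    if tail[3] == "L":
        return "CaaL: geranylgeranylation predicted"
    # Cysteine retained but no terminal leucine (group II-like, e.g. CTAA).
    return "non-CaaL cysteine motif: other modification (possibly farnesylation)"


# Invented toy sequences; only the final four residues matter here.
print(classify_c_terminus("MKKSRGCSIL"))  # CaaL-type tail (hypothetical)
print(classify_c_terminus("MKKSRGCTAA"))  # CTAA tail, as reported for AtRAC7
print(classify_c_terminus("MKKSRGASTV"))  # no cysteine at position -4
```

A real analysis would also need to handle cysteine motifs located slightly upstream of the last four residues, which this sketch ignores.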
The genomic location of A. thaliana AtRAC1-11
Phylogenetic analysis shows that the members of the RAC group II can be divided into two defined subgroups, which are present in both monocotyledons and dicotyledons. This suggests that the creation of group II and the following duplication of this ancestral gene must have happened before the split between monocotyledonous and dicotyledonous plants that occurred ∼200 million years ago (Yang et al. 1999). Our analyses of the RAC genes from the moss P. patens and other bryophyta suggest that the group II RAC genes have evolved only in vascular plants (P. Winge, R. Kristensen and A. M. Bones, unpublished results). This indicates that the insertion of the intron, which resulted in the creation of group II RAC genes, must have occurred 200-400 million years ago.
A physical map of a 50-kb region flanking the AtRAC1, -4-6, and -11 genes indicating numerous duplications, deletions, and rearrangements. The labels above the genes flanking AtRAC6 are registered BAC gene numbers. An inversion of the AtRAC1 gene can be observed. Three of the genes, AtRAC1, -5, and -11, have a VPS35 homologue as their closest neighbor, while in AtRAC4 and AtRAC6 it has been deleted. The structural similarities of the genes flanking AtRAC1 and AtRAC6 indicate that this is a recent gene duplication. The complete sequence of the BAC clones can be found in GenBank: AtRAC1, AF024504 and AC003952; AtRAC4, AC022472; AtRAC5, AC007396; AtRAC6, AL022373; and AtRAC11, AL132980.
The involvement of RAC proteins in the regulation of the actin cytoskeleton in eukaryotes is firmly established (Hall 1998). It is therefore interesting to note that the evolution and diversity of the AtRAC genes show many similarities with the Arabidopsis actin gene family (McDowell et al. 1996; An et al. 1999; Meagher et al. 1999). They are both multigene families that diversified and split into two distinct groups approximately at the same time and both encode proteins that have distinct spatial and temporal expression patterns in vegetative and reproductive tissues (Delmer et al. 1995; Li et al. 1998; Kost et al. 1999; P. Winge, unpublished results). Four of the actin genes, ACT3, ACT4, ACT12, and T6D20.1 (an actin-like gene), have a chromosomal localization close to AtRAC genes. Whether there has been a coevolution of these two gene families remains an open question.
Organization and evolution of RAC genes in higher plants: Our results show that most of the AtRAC genes were created through large duplications encompassing several genes instead of being created through tandem duplications as is often seen in Arabidopsis. One of these recent duplications involving >4 Mb from chromosomes II and IV (Lin et al. 1999; Mayer et al. 1999) resulted in the generation of the AtRAC1 and AtRAC6 genes. The AtRAC1 and AtRAC6 homologues may be restricted to the Brassicaceae family or possibly the Capparales order, but genes related to the AtRAC1, -6, and -11 subgroup are present in both rosids and asterids. This shows that the gene duplication, which resulted in the creation of this subgroup, happened before the split between asterids and rosids ∼90 million years ago.
We also report the findings of two other large duplications that have created new AtRAC homologues and report evidence for three additional ancient duplications. The duplicated regions on chromosome I, which include the AtRAC4 and AtRAC5 genes, probably span 3 Mb or more. Our detection of an AtRAC4 homologue in L. sativum also shows that this duplication predates the split between Arabidopsis and other members of the Brassicaceae. The number and sizes of these ancient duplicated regions also raise the question of whether they are due to real duplication events or whether the early ancestor was a tetraploid, as has also been suggested by others (Grant et al. 2000).
Phylogenetic analyses: The phylogenetic trees show that the AtRAC proteins/genes can be divided into several distinct groups that can be partly explained by the ancient gene duplications we report here. One interesting observation is that a large group of RAC homologues within group I, which also includes the genes AtRAC1, -3-6, and -11, appears to be restricted to dicotyledons (see Figure 6). This suggests that a number of RAC gene duplications have occurred in dicotyledons after the split with the monocotyledons. There is no indication that a similar gene expansion within this group has taken place in monocotyledons. On the other hand, additional RAC gene duplications within group II appear to have occurred in monocotyledons, and the majority of RAC genes from monocotyledons and magnoliids registered in GenBank at this date belong to RAC group II.
The phylogenetic study also shows that the AtRAC2 gene is one of the most divergent members within RAC group I, and the existence of AtRAC2-like genes in conifers suggests that a similar gene existed well before the development of flowering plants. This is also supported by our studies of AtRAC gene duplications, which show that an AtRAC2-like gene probably was a progenitor to the AtRAC5 and AtRAC11 genes. Furthermore, our expression analysis of AtRAC2 shows that it is primarily expressed in roots and vascular tissues (Winge et al. 1997), an expression pattern it may have inherited from a functionally developed AtRAC2 gene that existed in early nonflowering plants.
AtRAC polymorphisms: Comparison of the AtRAC genes sequenced from the Columbia and Landsberg erecta ecotypes revealed large differences in the number of polymorphisms and indels. For instance, AtRAC2 had approximately one SNP every 50 bp, which is five times higher than the average. Several factors are known to influence the mutation rates of individual genes. In both eukaryotic and prokaryotic organisms it has been shown that expression levels influence DNA repair rates, through so-called transcription-coupled DNA repair (Hanawalt 1989). Compared with the other AtRAC genes, AtRAC2 has a distinct expression pattern and a relatively low expression level (Winge et al. 1997). The high number of polymorphisms found in AtRAC2 may therefore be due to its low expression and the resulting lack of transcription-coupled DNA repair, which in turn would lead to the observed accumulation of SNPs.
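The SNP densities quoted above imply an average spacing that is not stated explicitly. The short calculation below makes that arithmetic visible, under the assumption that "five times higher than the average" means a five-fold higher SNP rate per base pair; the variable names are illustrative only.

```python
from fractions import Fraction

# Stated in the text: AtRAC2 has roughly one SNP every 50 bp.
atrac2_snp_rate = Fraction(1, 50)   # SNPs per bp

# Assumption: "five times higher" is read as a five-fold rate.
fold_difference = 5

# Implied average SNP rate and the corresponding spacing between SNPs.
average_snp_rate = atrac2_snp_rate / fold_difference
average_spacing_bp = 1 / average_snp_rate

print(f"Implied average: one SNP every {average_spacing_bp} bp")
```

Exact fractions are used instead of floating-point numbers so that the implied spacing comes out as a whole number of base pairs.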
Evolutionary considerations: Phylogenetic comparison of the AtRAC proteins with Rho family members from other eukaryotes suggests that the RAC proteins in plants developed into a distinct group at an early stage during eukaryotic evolution. An alternative explanation is that the RAC proteins in higher plants have gone through a rapid evolution at a later stage. Support for this latter view comes from studies of algae within the paraphyletic Prasinophyte lineage, which show that their RAC proteins are more similar to RAC proteins found in animals and amoebas (P. Winge, T. Brembu and A. M. Bones, unpublished results). Furthermore, the studies of RAC proteins from bryophyta show that they have remained virtually unchanged in land plants. This suggests that RAC proteins in early plants underwent a rapid evolution before terrestrial plants appeared 480 million years ago (Kenrick and Crane 1997). Whether this occurred when the first multicellular plants evolved or if the transition took place even earlier is still unresolved.
The existence of Rac and Rho proteins in eukaryotes is universal. In vertebrates and invertebrates all subgroups of the Rho family (Rac, Rho, and Cdc42) are present. In contrast, some of the lower eukaryotes seem to lack certain subgroups of the Rho family. For instance, “true” Rac proteins do not exist in S. cerevisiae and S. pombe, but are found in the dimorphic yeast Yarrowia lipolytica (Hurtado et al. 2000), while the amoebas and slime molds appear to lack the Cdc42 proteins. The absence of Cdc42-like proteins in amoebas, slime molds, and higher plants could therefore indicate that the Cdc42 group evolved later during evolution, but it cannot be ruled out that they have been selectively lost in some of these kingdoms at a later stage. Because RAC proteins in embryophyta appear to have diverged and evolved faster than most Rac/Rho proteins from other organisms, the phylogenetic analysis may give the impression that they have evolved early and even predate the Rac/Cdc42 proteins. However, the most likely scenario is that the RAC proteins in embryophyta have evolved from a RAC-like ancestor.
Conservation of gene structure and specifically the conservation of splice site positions are additional factors that can be used to deduce the phylogenetic relationship between genes (Kloek et al. 1993; de Souza et al. 1998). Comparison of the AtRAC gene structure with rac and rho genes from other eukaryotes shows that AtRAC genes are more closely related to rac/cdc42 genes than to rho genes, the exception being human rho7 and a rho-like gene from S. pombe (GenBank accession no. Z97185).
The evolution of the RAC proteins in higher plants leaves some tantalizing questions. Why did the multicellular plants evolve such a distinct group of RAC proteins in the first place and what type of selection pressure could have been involved? One clue to these questions may come from our and others’ observations that higher plants do not have true Ras proteins (Winge et al. 1997; Meyerowitz 1999). Ras and Rap-like proteins have been found in various lower eukaryotes, including Trypanosoma brucei, E. histolytica, Dictyostelium discoideum, and various fungi (Reymond et al. 1984; Lohia and Samuelson 1996; Sowa et al. 1999). The evolution of specific types of RAC proteins in higher plants may have been an adaptation to the loss of Ras proteins, such that the RAC proteins in plants now have dual functions. This may also explain why there has been a selection pressure to evolve a RAC multigene family in higher plants. If the RAC proteins in embryophyta have both Rac and Ras-like functions, then these proteins may play the role of master regulators in plants.
Acknowledgments
This work was supported by The Norwegian Research Council grant 100370/410.
Footnotes
- Communicating editor: C. S. Gasser
- Received February 29, 2000.
- Accepted August 4, 2000.
- Copyright © 2000 by the Genetics Society of America